SiamAGN: Siamese attention-guided network for visual tracking

2022 
Most Siamese-based trackers use cross-correlation to compute similarity scores between the target template and the search image, which can lose spatial information and lead to inaccurate target estimation. To address this issue, we propose an attention-guided model under the Siamese framework for object tracking, named SiamAGN. Specifically, we combine the template and the search image through the proposed feature fusion module, which contains a self-context interaction (SCI) module, a cross-context interaction (CCI) module, and a target location module (TLM). SCI, based on self-attention, learns global context by emphasizing channel-wise complementary features. CCI, based on cross-attention, explores rich dynamic context via channel interaction between the template and the search image. TLM, also based on cross-attention, reformulates the template according to the pixel-level similarity scores between the template and the search image, which preserves as much spatial information as possible and enables our model to predict more precise bounding boxes. Extensive experimental results on the GOT10k, OTB100, VOT2016, VOT2018, and LaSOT benchmarks indicate that the proposed tracker SiamAGN achieves competitive performance compared with state-of-the-art trackers.
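The idea of reformulating the template from pixel-level similarity scores can be illustrated with a generic cross-attention step. The sketch below is not the SiamAGN implementation: the function name `pixel_cross_attention`, the scaled dot-product scoring, the softmax normalization, and the tensor shapes are all assumptions chosen for illustration, and the learned projections a real tracker would use are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_cross_attention(template, search):
    """Hypothetical sketch: rebuild the template at search resolution.

    template: array of shape (C, Hz, Wz), features of the target template.
    search:   array of shape (C, Hx, Wx), features of the search image.
    Returns the reformulated template of shape (C, Hx, Wx) and the
    attention weights of shape (Hx, Wx, Hz * Wz).
    """
    c, hz, wz = template.shape
    c_search, hx, wx = search.shape
    if c != c_search:
        raise ValueError("template and search must share the channel dimension")

    z = template.reshape(c, hz * wz)  # (C, Nz): one column per template pixel
    x = search.reshape(c, hx * wx)    # (C, Nx): one column per search pixel

    # Pixel-level similarity between every search pixel and every template
    # pixel (scaled dot product is an assumption, not taken from the paper).
    scores = x.T @ z / np.sqrt(c)     # (Nx, Nz)
    attn = softmax(scores, axis=-1)   # each search pixel's weights sum to 1

    # Each search location receives a weighted mix of template features,
    # so the output keeps the spatial layout of the search image.
    reformulated = attn @ z.T         # (Nx, C)
    return reformulated.T.reshape(c, hx, wx), attn.reshape(hx, wx, hz * wz)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template_feat = rng.standard_normal((8, 4, 4))
    search_feat = rng.standard_normal((8, 10, 10))
    fused, weights = pixel_cross_attention(template_feat, search_feat)
    print(fused.shape, weights.shape)
```

Unlike cross-correlation, which slides the whole template over the search features and collapses it into a single response per location, this formulation produces a full feature vector at every search position, which is one way spatial detail could be retained for box regression.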