Channel and spatial attention-based Siamese network for visual object tracking

2021 
Visual object tracking, which aims to automatically estimate the position of an arbitrary target in a video sequence, has drawn great attention in recent years, and many efforts have been made on this topic. The Siamese network, which balances accuracy and speed, has achieved great success. It consists of two branches: one for the target image and the other for the search image. The position with the maximum score in the similarity map between the target and search images indicates the location of the target in the search image. Current Siamese trackers treat the features of different channels and spatial locations equally, although these features may represent different semantic information. We propose a channel and spatial (CS) attention-based Siamese network for visual object tracking. A CS attention mechanism is inserted into the feature extractor to enhance semantic feature learning. The experimental results show that the proposed network significantly improves the performance of the baseline tracker and is one of the top-ranked trackers among all tested state-of-the-art trackers on the most widely used visual object tracking datasets.
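The two ideas in the abstract, reweighting features per channel and per spatial location and then reading the target position off the peak of a similarity map, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not give the attention formulas, so the parameter-free sigmoid gates below are an assumption standing in for learned attention layers, and the function names (`channel_attention`, `spatial_attention`, `similarity_map`, `locate`) are hypothetical.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Reweight each channel of a (C, H, W) feature map.

    Assumption: the gate is a sigmoid of the global average per channel;
    a real module would learn this mapping.
    """
    pooled = feat.mean(axis=(1, 2))              # shape (C,)
    return feat * _sigmoid(pooled)[:, None, None]

def spatial_attention(feat):
    """Reweight each spatial location of a (C, H, W) feature map.

    Assumption: the gate is a sigmoid of the mean across channels.
    """
    pooled = feat.mean(axis=0)                   # shape (H, W)
    return feat * _sigmoid(pooled)[None, :, :]

def similarity_map(target_feat, search_feat):
    """Slide the target features over the search features (cross-correlation)."""
    _, h, w = target_feat.shape
    _, H, W = search_feat.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search_feat[:, i:i + h, j:j + w]
            scores[i, j] = np.sum(window * target_feat)
    return scores

def locate(target_feat, search_feat):
    """Return the (row, col) of the maximum score in the similarity map."""
    scores = similarity_map(target_feat, search_feat)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col)
```

In a tracker along these lines, the attention functions would be applied to the output of each branch's feature extractor before `similarity_map` is computed; here they are kept separate so that each step can be inspected on its own.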