Channel and spatial attention-based Siamese network for visual object tracking

Shishun Tian, Zixi Chen, Bolin Chen, Wenbin Zou, Xia Li

2021

Visual object tracking, which aims to automatically estimate the position of an arbitrary target in a video sequence, has drawn great attention in recent years, and many approaches have been proposed. Among them, the Siamese network, which balances accuracy and speed, has achieved great success. A Siamese network consists of two branches: one for the target image and the other for the search image. The position with the maximum score in the similarity map between the target and search images indicates the location of the target in the search image. Current Siamese trackers treat the features of different channels and spatial locations equally, even though these features may represent different semantic information. We propose a channel and spatial (CS) attention-based Siamese network for visual object tracking, in which a CS attention mechanism is inserted into the feature extractor to enhance semantic feature learning. The experimental results show that the proposed network significantly improves the performance of the baseline tracker and is one of the top-ranked trackers among all tested state-of-the-art trackers on the most widely used visual object tracking datasets.
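
The localization step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`cs_attention`, `similarity_map`, `locate`) are hypothetical, the feature maps are assumed to be arrays of shape (channels, height, width) produced by some feature extractor, and the attention function is a parameter-free stand-in for the learned CS attention module, whose exact design the abstract does not specify.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cs_attention(feat):
    """Reweight a (C, H, W) feature map by channel, then by spatial location.

    ASSUMPTION: a parameter-free stand-in. A learned module would compute
    these weights with trainable layers instead of plain averages.
    """
    # One weight per channel, from global average pooling.
    channel_w = sigmoid(feat.mean(axis=(1, 2)))        # shape (C,)
    feat = feat * channel_w[:, None, None]
    # One weight per spatial location, from the mean over channels.
    spatial_w = sigmoid(feat.mean(axis=0))             # shape (H, W)
    return feat * spatial_w[None, :, :]


def similarity_map(target_feat, search_feat):
    """Slide the target features over the search features (cross-correlation)."""
    _, th, tw = target_feat.shape
    _, sh, sw = search_feat.shape
    scores = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search_feat[:, i:i + th, j:j + tw]
            scores[i, j] = np.sum(target_feat * window)
    return scores


def locate(target_feat, search_feat):
    """Return the (row, col) offset with the maximum similarity score."""
    scores = similarity_map(target_feat, search_feat)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col)
```

In a tracker following this scheme, something like `cs_attention` would be applied inside both branches before `locate` is called, so that channels and locations no longer contribute equally to the similarity map. The explicit double loop is kept for readability; a practical implementation would use a batched convolution instead.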
