SiamAtt: Siamese attention network for visual tracking

2020 
Abstract Visual attention has recently achieved great success and found wide application in deep neural networks. Existing Siamese-network-based methods achieve a good accuracy–efficiency trade-off in visual tracking. However, the training time of Siamese trackers grows with deeper networks and larger training datasets. Moreover, Siamese trackers struggle to predict the target location under fast motion, full occlusion, camera motion, and in the presence of similar objects. To address these difficulties, we develop an end-to-end Siamese attention network for visual tracking. Our approach introduces an attention branch into the region proposal network, which contains a classification branch and a regression branch. We perform foreground–background classification by combining the scores of the classification branch and the attention branch, and the regression branch predicts the bounding boxes of the candidate regions based on the classification results. The proposed tracker achieves results comparable to state-of-the-art trackers on six tracking benchmarks. In particular, it achieves an AUC score of 0.503 on LaSOT while running at 40 frames per second (FPS).
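The abstract describes fusing the classification-branch and attention-branch scores before selecting a candidate region for regression. A minimal sketch of that fusion step is shown below; the element-wise sum, the softmax normalization, and all names are illustrative assumptions, since the abstract does not specify the exact combination rule.

```python
import numpy as np

def fuse_scores(cls_scores, att_scores):
    """Combine classification and attention scores per anchor.
    Illustrative assumption: element-wise sum, then a numerically
    stable softmax over the candidate anchors."""
    fused = cls_scores + att_scores
    e = np.exp(fused - fused.max())
    return e / e.sum()

def select_best_anchor(fused_probs, boxes):
    """Return the index and bounding box of the highest-scoring
    candidate region (the regression branch would then refine it)."""
    idx = int(np.argmax(fused_probs))
    return idx, boxes[idx]

# Toy example: four candidate anchors with (x, y, w, h) boxes.
cls = np.array([0.2, 0.9, 0.4, 0.1])
att = np.array([0.1, 0.8, 0.9, 0.0])
boxes = np.array([[10, 10, 40, 40],
                  [12, 11, 42, 41],
                  [50, 50, 30, 30],
                  [0, 0, 20, 20]])
probs = fuse_scores(cls, att)
idx, box = select_best_anchor(probs, boxes)
```

In this toy case the second anchor wins because it scores highly in both branches, mirroring how the combined score is meant to suppress distractors that only one branch responds to.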