Siamese Visual Tracking With Deep Features and Robust Feature Fusion

2020 
Trackers based on fully-convolutional Siamese networks treat tracking as the problem of learning a similarity function. By using shallow networks and off-line training, Siamese trackers achieve high tracking speed and perform well in simple scenes. However, because their features carry little semantic information and their template is never updated, Siamese trackers still lag behind state-of-the-art methods in complex scenes and under challenges such as occlusion and deformation. In this paper, we propose a Siamese tracking algorithm with deep features and robust feature fusion (SiamDF). An improved ResNet-18 network replaces the traditional shallow network and extracts deep features with more semantic information. To eliminate the negative effect of padding and make better use of the deep network, the proposed algorithm adopts a spatial-aware sampling strategy to overcome the strict translation-invariance constraint. Meanwhile, multi-layer feature fusion yields a high-quality final response map, which significantly reduces the impact of distractors in complex scenes. In addition, adaptive feature information fusion is used to update the template, so that the algorithm can adapt to changes in the target's appearance. Objective evaluation on the OTB100 dataset shows that the precision and the overlap success reach 0.852 and 0.658, respectively. Moreover, the EAO value on the VOT2016 dataset reaches 0.336. These results demonstrate that our algorithm effectively improves tracking performance and performs favorably in both robustness and accuracy.
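
The three mechanisms named in the abstract (similarity by cross-correlation, fusion of response maps from several layers, and template updating) can be illustrated with a minimal sketch. This is not the SiamDF implementation: the abstract gives no formulas, so the single-channel feature maps, the function names, the fixed fusion weights, and the linear-interpolation update rule with a constant rate are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical simplification: a feature map is a single-channel 2-D grid.
Map = List[List[float]]


def cross_correlate(search: Map, template: Map) -> Map:
    """Slide the template over the search map; each output cell is a similarity score."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [
        [
            sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            for x in range(sw - tw + 1)
        ]
        for y in range(sh - th + 1)
    ]


def fuse_responses(responses: Sequence[Map], weights: Sequence[float]) -> Map:
    """Weighted sum of same-sized response maps (the weights are assumed, not from the paper)."""
    h, w = len(responses[0]), len(responses[0][0])
    return [
        [sum(wt * r[y][x] for r, wt in zip(responses, weights)) for x in range(w)]
        for y in range(h)
    ]


def update_template(old: Map, new: Map, rate: float) -> Map:
    """Blend old and new template features by linear interpolation (assumed update rule)."""
    return [
        [(1.0 - rate) * o + rate * n for o, n in zip(row_old, row_new)]
        for row_old, row_new in zip(old, new)
    ]


def peak(response: Map) -> Tuple[int, int]:
    """Return the (row, col) offset with the highest response."""
    cells = ((y, x) for y in range(len(response)) for x in range(len(response[0])))
    return max(cells, key=lambda p: response[p[0]][p[1]])


# Toy example: the template pattern sits at offset (1, 2) in the search map.
template = [[1.0, 2.0], [3.0, 4.0]]
search = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
]
response = cross_correlate(search, template)
fused = fuse_responses([response, response], [0.5, 0.5])
print(peak(fused))
```

In a real tracker the maps would be multi-channel deep features from different network layers, and the abstract describes the template update as adaptive, so the constant `rate` used here only stands in for whatever adaptive weighting the paper actually defines.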