A New Siamese Co-attention Network for Unsupervised Video Object Segmentation

2021 
Unsupervised Video Object Segmentation (UVOS) aims to generate accurate pixel-level masks for moving objects without any prior knowledge. Many UVOS methods process frames independently with an image segmentation model, ignoring the temporal information between consecutive frames. Other works rely on RNNs or motion cues to find the objects to be tracked; these models learn only short-term temporal dependencies and thus tend to accumulate errors over time. We propose a new Siamese Co-attention Network, built on SOLOv2, to tackle the UVOS task. The co-attention module in our Siamese network captures global correspondences between a reference frame and the current frame of the same video. Because it can learn pairwise correlations between frames at any temporal distance, it helps the current frame correctly distinguish primary objects from a global view. We evaluate the proposed method on the TianChi VOS Challenge and DAVIS2017, and the results indicate that it exhibits superior performance.
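To make the co-attention idea concrete, the following is a minimal NumPy sketch of a generic co-attention step between a reference-frame feature map and a current-frame feature map. It is not the authors' implementation: the function name `co_attention`, the bilinear `weight` matrix, the softmax normalization, and the feature shapes are all assumptions chosen for illustration, and the abstract does not specify how the module is wired into SOLOv2.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(ref_feat, cur_feat, weight):
    """Generic co-attention between two feature maps (illustrative only).

    ref_feat, cur_feat: arrays of shape (C, H, W) from the two Siamese branches.
    weight: hypothetical (C, C) bilinear matrix (learned in a real model).
    Returns the attended reference features, the attended current features,
    and the raw affinity matrix.
    """
    c, h, w = cur_feat.shape
    ref = ref_feat.reshape(c, -1)  # (C, N), N = H * W
    cur = cur_feat.reshape(c, -1)  # (C, N)

    # affinity[i, j] scores reference position i against current position j,
    # so every pair of positions is compared regardless of spatial distance.
    affinity = ref.T @ weight @ cur  # (N, N)

    # For each current position, a distribution over reference positions.
    attn_cur = softmax(affinity, axis=0)
    # For each reference position, a distribution over current positions.
    attn_ref = softmax(affinity, axis=1)

    cur_attended = ref @ attn_cur    # reference content gathered per current position
    ref_attended = cur @ attn_ref.T  # current content gathered per reference position

    return (
        ref_attended.reshape(c, h, w),
        cur_attended.reshape(c, h, w),
        affinity,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((4, 3, 5))
    cur = rng.standard_normal((4, 3, 5))
    ref_att, cur_att, aff = co_attention(ref, cur, np.eye(4))
    print(ref_att.shape, cur_att.shape, aff.shape)
```

Because the affinity matrix compares every position in one frame with every position in the other, the attended features do not depend on how far apart the two frames are in the video, which is the property the abstract contrasts with short-term RNN or motion-based approaches.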