Video salient object detection using dual-stream spatiotemporal attention

2021 
Abstract: Video salient object detection plays an important role in many applications across different areas. However, existing deep learning-based video salient object detection methods still struggle in scenes with large salient-object variability and great background diversity between and within frames. In this paper, we propose a dual-stream spatiotemporal attention network (DSSANet) for saliency detection in videos. It introduces a multiplex attention mechanism to effectively extract and fuse spatiotemporal features of the salient object across frames, thereby improving saliency detection performance. DSSANet consists of three parts: (1) a context feature path that leverages a novel attention-augmented convolutional LSTM to model the long-range dependencies arising from the large temporal variation of the salient object across frames; (2) a content feature path that leverages an attention-based 1D dilated convolution to model the local correlation structure between each pixel of the salient object and its surrounding objects; and (3) a refinement fusion module that fuses the features from the two paths and further refines the fused feature through attention-based feature selection. By integrating these three parts, DSSANet accurately detects the salient object in a video. Extensive experiments on four public datasets demonstrate the effectiveness of DSSANet and its superiority over five state-of-the-art video salient object detection methods.
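The refinement fusion module described above combines two feature streams with attention-based selection. As a rough illustration of that idea (not the paper's exact formulation), the following NumPy sketch gates a context stream and a content stream with a per-pixel softmax weight and sums them; all function names and the scoring rule are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(context_feat, content_feat):
    """Fuse two (H, W, C) feature maps with a per-pixel attention gate.

    Illustrative stand-in for an attention-based fusion module: each
    stream is scored by its channel mean, the scores are normalized
    with a softmax over the two streams, and the streams are combined
    as a convex sum. In a real network the scores would be learned.
    """
    stacked = np.stack([context_feat, content_feat], axis=0)  # (2, H, W, C)
    scores = stacked.mean(axis=-1, keepdims=True)             # (2, H, W, 1)
    weights = softmax(scores, axis=0)                         # sum to 1 per pixel
    return (weights * stacked).sum(axis=0)                    # (H, W, C)

rng = np.random.default_rng(0)
ctx = rng.standard_normal((8, 8, 16))   # toy "context path" output
cnt = rng.standard_normal((8, 8, 16))   # toy "content path" output
fused = attention_fuse(ctx, cnt)
print(fused.shape)  # (8, 8, 16)
```

Because the gate is a convex combination, every fused value lies between the two stream values at that position, which is the selection behavior an attention gate provides.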