STA3D: spatiotemporally attentive 3D network for video saliency prediction

2021 
Abstract 3D fully convolutional networks (FCN), which jointly leverage the spatial and temporal cues, have achieved great success in video saliency prediction. However, they still have limitations in some challenging cases, eg. fixation shift. To address this issue, we propose a SpatioTemporally Attentive 3D Network (STA3D) to selectively propagate the significant temporal features and refine the spatial features in 3D FCN for video saliency prediction. Extensive experiments on three standard datasets demonstrate the superiority of the proposed model against the state-of-the-art.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    42
    References
    2
    Citations
    NaN
    KQI
    []