Cross-modality Fusion and Progressive Integration Network for Saliency Prediction on Stereoscopic 3D Images

2021 
Traditional 2D image-based saliency prediction models suffer from unsatisfactory performance when dealing with stereoscopic 3D (S3D) images because eye movements in the case of freely viewing S3D images are demonstrated to be guided by both RGB and depth features. This paper studies the problem of saliency prediction on S3D images, where the interactions between RGB and depth modalities are both taken into account. Specifically, we design a novel deep neural network named Cross-modality Fusion and Progressive Integration Network (CFPI-Net) to address this problem. It consists of a Multi-level Cross-modality Feature Fusion (MCFF) module and a Multi-stage Progressive Feature Integration (MPFI) module. The MCFF module first captures hierarchical contexture features from each modality and then effectively fuses the hierarchical contexture features from different modalities at each level. The MPFI module involves multiple cascaded deeply supervised feature integration (DSFI) blocks in which the low-level and high-level cross-modality features are progressively integrated using the integrated features in the previous stage as a guidance. Our proposed CFPI-Net benefits from the advantages of multi-level feature representation, cross-modality feature fusion, and multi-stage progressive feature integration, which hereby fully boost the performance. Experimental results on two benchmark datasets demonstrate that CFPI-Net outperforms state-of-the-art saliency prediction methods both quantitatively and qualitatively. All the results and relevant codes will be made available to the public.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    1
    Citations
    NaN
    KQI
    []