Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers

2022 
RGB-D co-salient object detection aims to segment the co-occurring salient objects in a group of relevant images and their depth maps. Previous methods often adopt separate pipelines and hand-crafted features, which makes it hard to capture the patterns of co-occurring salient objects and leads to unsatisfactory results. End-to-end CNN models are a straightforward alternative, but they are less effective at exploiting global cues because of the intrinsically limited receptive field of convolutions. In this paper, we therefore propose an end-to-end transformer-based model, denoted CTNet, which uses class tokens to explicitly capture implicit class knowledge for RGB-D co-salient object detection. Specifically, we first design adaptive class tokens for individual images to explore intra-saliency cues, and then develop common class tokens for the whole group to explore inter-saliency cues. We also leverage the complementary cues between RGB images and depth maps to promote the learning of both types of class tokens. In addition, to facilitate model evaluation, we construct a challenging large-scale benchmark dataset, named RGBD CoSal1k, which collects 1000 pairs of RGB-D images in 106 groups with complex scenarios and diverse appearances. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
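
The abstract describes two kinds of class tokens attended jointly with patch tokens: per-image adaptive tokens for intra-saliency and a group-shared common token for inter-saliency, with RGB and depth features fused as complements. The following PyTorch sketch illustrates that token layout only; it is not the authors' CTNet code, and every name in it (CoSalTokenSketch, adaptive_proj, common_token, the additive RGB-depth fusion, and the token-to-patch similarity readout) is a hypothetical simplification.

```python
# Hypothetical sketch of the class-token idea, assuming a ViT-style
# backbone that already yields (N, P, C) patch tokens per image.
import torch
import torch.nn as nn

class CoSalTokenSketch(nn.Module):
    def __init__(self, dim=256, heads=4, depth=2):
        super().__init__()
        # One shared "common" class token for the whole image group
        # (inter-saliency cues); a single learned parameter.
        self.common_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Adaptive class tokens are produced per image from its own
        # features (intra-saliency cues) rather than learned globally.
        self.adaptive_proj = nn.Linear(dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, rgb_tokens, depth_tokens):
        # rgb_tokens / depth_tokens: (N, P, C) patch tokens for the N
        # images in one group; depth is used as a simple additive
        # complement here, standing in for a learned fusion module.
        fused = rgb_tokens + depth_tokens
        # Per-image adaptive token from mean-pooled fused features.
        adaptive = self.adaptive_proj(fused.mean(dim=1, keepdim=True))
        common = self.common_token.expand(fused.size(0), -1, -1)
        # Prepend both class tokens so self-attention lets them gather
        # image-specific and group-consensus saliency cues.
        x = self.encoder(torch.cat([adaptive, common, fused], dim=1))
        adaptive_out, common_out, patches = x[:, 0], x[:, 1], x[:, 2:]
        # Patch-level co-saliency logits: similarity between each patch
        # and the combined class tokens; reshape to a map downstream.
        return (patches * (adaptive_out + common_out).unsqueeze(1)).sum(-1)
```

A toy forward pass, e.g. CoSalTokenSketch()(torch.randn(5, 196, 256), torch.randn(5, 196, 256)), returns a (5, 196) tensor of patch logits for a group of five images; the key design point is that the common token is shared across the group while the adaptive token varies per image.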