A joint framework for mining discriminative and frequent visual representation

2022 
Discovering the visual representation of an image category is challenging because the representation must be not only discriminative but also frequent across the images. Previous studies have proposed many solutions; however, all of them optimize discrimination and frequency separately, which makes the solutions sub-optimal. To address this issue, we propose a method, named JDFR, that discovers jointly discriminative and frequent visual representations. To ensure discrimination, JDFR employs the cross-entropy loss. To achieve frequency, we design a novel similarity concentration (SC) loss that concentrates on samples with the same representation, pulls them closer in the feature space, and then mines the frequent visual representations. Moreover, we utilize an attention module to locate the representative region in each image. Extensive experiments on five benchmark datasets (Place365-20, Travel, VOC2012-10, ImageNet-100, and iNaturalist-100) show that the discovered visual representations have better discrimination and frequency than those mined by the state-of-the-art (SOTA) method, with average improvements of 5.37% in accuracy and 3.06% in frequency.
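The abstract does not give the exact form of the SC loss, but the stated goal, pulling samples that share a representation closer in feature space, resembles a supervised-contrastive-style pull term. The sketch below is a hypothetical NumPy illustration of such a loss under that assumption: features are L2-normalized, each sample's same-group neighbors are treated as positives, and the loss is the average negative log-probability of the positives under a temperature-scaled softmax over all other samples. The function name `sc_loss`, the grouping input, and the temperature `tau` are all illustrative, not from the paper.

```python
import numpy as np

def sc_loss(features, groups, tau=0.1):
    """Hypothetical similarity-concentration-style loss (illustrative only).

    features: (n, d) array of sample embeddings.
    groups:   (n,) array of ids; samples sharing an id share a representation.
    tau:      softmax temperature.
    """
    # L2-normalize so that dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau                      # temperature-scaled similarities
    n = len(groups)
    # Positive pairs: same group, excluding each sample itself.
    positives = (groups[:, None] == groups[None, :]) & ~np.eye(n, dtype=bool)
    # Mask self-similarity out of the softmax denominator.
    logits = sim - np.eye(n) * 1e9
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-likelihood over all positive pairs: minimizing it
    # pulls same-group samples together relative to the rest of the batch.
    return -log_prob[positives].mean()
```

Minimizing this term concentrates same-group samples, so frequently co-occurring features end up in tight clusters that can then be mined; a batch whose same-group samples are already close yields a smaller loss than one where they are far apart.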