Coarse2Fine: a two-stage training method for fine-grained visual classification

2021 
Small inter-class and large intra-class variations are the key challenges in fine-grained visual classification: objects from different classes share visually similar structures, while objects within the same class can appear in very different poses and viewpoints. Proper extraction of discriminative local features (e.g., a bird's beak or a car's headlight) is therefore crucial. Most recent successes on this problem build on attention models that can localize and attend to discriminative local object parts. In this work, we propose a training method for visual attention networks, Coarse2Fine, which creates a differentiable path from the attended feature maps to the input space. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions of the raw image, guiding the attention maps to better attend to fine-grained features. In addition, we propose an initialization method for the attention weights. Our experiments show that Coarse2Fine reduces the classification error by up to 5.1% on common fine-grained datasets.
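
The abstract does not spell out the implementation, but the core idea of a differentiable path from attended feature maps back to the input can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions, not the paper's exact design: the backbone, the 1x1-convolution attention head, the sigmoid gating, and the bilinear upsampling used as the "inverse mapping" are all generic placeholders, and the names (Coarse2FineSketch, attn, head) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Coarse2FineSketch(nn.Module):
    """Sketch of a two-stage (coarse -> fine) attention pipeline.

    Stage 1 extracts coarse feature maps and a spatial attention map.
    The map is bilinearly upsampled back to the input resolution and used
    to re-weight the raw image, so gradients flow from the attended
    features to the image regions they cover (a differentiable path from
    feature space to input space). All module choices are illustrative.
    """

    def __init__(self, backbone: nn.Module, feat_channels: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                    # any CNN feature extractor
        self.attn = nn.Conv2d(feat_channels, 1, 1)  # 1x1 conv -> spatial attention
        self.head = nn.Linear(feat_channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                    # (B, C, h, w) coarse features
        a = torch.sigmoid(self.attn(feats))         # (B, 1, h, w) attention map
        # Generic stand-in for the inverse mapping: upsample the attention
        # map to the input resolution and mask the raw image with it.
        a_full = F.interpolate(a, size=x.shape[-2:], mode="bilinear",
                               align_corners=False)
        fine_feats = self.backbone(x * a_full)      # second, fine-grained pass
        pooled = fine_feats.mean(dim=(2, 3))        # global average pooling
        return self.head(pooled)
```

In this sketch the fine-grained pass sees the image weighted by the upsampled attention, so the classification loss pushes the attention map toward the informative regions of the raw image, which is the behavior the abstract attributes to Coarse2Fine.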