Dynamic Perception Framework for Fine-grained Recognition

2021 
Fine-grained recognition requires discriminating between categories that differ only in subtle visual details, which are easily overwhelmed by the diverse appearance within each category. Conventional approaches generally locate discriminative parts and then recognize the part-based features. However, we find that tuning the effective receptive field (ERF) of the network to the task plays the key role, because it lets significant regions contribute more to the output. Inspired by the receptive field stimulation mechanism of the visual cortex, we propose a Dynamic Perception framework as a solution. Our framework adapts the ERF in the image space and the kernel space simultaneously. In the image space, a Spatial Selective Sampling module locally enlarges informative regions. In the kernel space, Spatial Selective Kernel convolution embeds spatial attention in a multi-path convolution, so that regions of interest and backgrounds are processed with different kernel sizes. Extensive experiments on challenging benchmarks, including CUB-200-2011, FGVC-Aircraft, and Stanford Cars, demonstrate that our method improves on state-of-the-art methods.
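The kernel-space idea, a multi-path convolution whose paths are blended by spatial attention, can be illustrated with a toy sketch. This is not the authors' implementation: the mean filters standing in for learned convolutions, the function names `box_filter` and `spatial_selective_mix`, and the hand-set attention scores are all assumptions made for illustration. Each pixel takes a softmax over the paths, so different locations can favor different kernel sizes.

```python
import math


def box_filter(img, k):
    """Mean filter with a k x k window (k odd) and zero padding.

    Stand-in for one convolution path; a real network would use learned kernels.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
            out[y][x] = total / (k * k)
    return out


def spatial_selective_mix(img, attention_logits, kernel_sizes=(3, 5)):
    """Blend the path outputs with a per-pixel softmax over paths.

    attention_logits[p][y][x] is a hypothetical score for path p at pixel
    (y, x); in a trained model it would come from an attention branch.
    """
    paths = [box_filter(img, k) for k in kernel_sizes]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            logits = [attention_logits[p][y][x] for p in range(len(paths))]
            m = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in logits]
            z = sum(exps)
            out[y][x] = sum(e / z * paths[p][y][x] for p, e in enumerate(exps))
    return out


# Toy input: a single bright pixel in the middle of a 5 x 5 image.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0

# Equal scores everywhere: every pixel averages the 3 x 3 and 5 x 5 paths.
equal_logits = [[[0.0] * 5 for _ in range(5)] for _ in range(2)]
mixed = spatial_selective_mix(image, equal_logits)
```

Raising the score of one path at selected pixels shifts those locations toward that kernel size while the rest of the image keeps its own blend, which is the sense in which the selection is spatial.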