A multi-level convolution pyramid semantic fusion framework for high-resolution remote sensing image scene classification and annotation

2021 
High spatial resolution (HSR) imagery scene classification has become a hot research topic in remote sensing. Scene classification methods based on handcrafted features, such as the bag-of-visual-words (BoVW) model, describe an image by extracting local features of the scene and mapping them to a dictionary space, but they usually use a shallow structure and lose the spatial distribution characteristics of the scene. Methods based on deep learning extract hierarchical features to describe the scene, which preserves spatial position information well. However, deep features at different levels are limited in the scales of ground objects they can recognize, and so cannot fully capture complex scenes. In this paper, the multi-level convolutional pyramid semantic fusion (MCPSF) framework is proposed for HSR imagery scene classification. Unlike previous scene classification methods, which fuse features from different levels directly even though those features differ greatly in both sparsity and eigenvalue magnitude, MCPSF integrates multi-level semantic features extracted by the BoVW model and the convolutional neural network (CNN) model. In MCPSF, two convolution pyramid feature expression strategies are proposed to enhance the ability to capture multi-scale land objects: the local and convolutional pyramid based BoVW (LCPB) model and the local and convolutional pyramid based pooling-stretched (LCPP) model. The effectiveness of the proposed method is verified on the 21-class UC Merced (UCM) dataset and the 30-class Aerial Image Dataset (AID). The framework was also transferred to a case study of scene annotation in Wuhan. The proposed framework significantly improves performance compared with other state-of-the-art methods.
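The BoVW encoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `bovw_histogram`, the array shapes, and the random codebook are all assumptions made for illustration (a real codebook would typically be learned, for example by clustering training descriptors). The sketch assigns each local descriptor to its nearest visual word and counts the assignments, and it shows why the resulting histogram carries no spatial layout: reordering the descriptors leaves it unchanged.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Encode local descriptors (n, d) against a codebook (k, d)
    as an L1-normalized histogram of visual-word counts.
    Hypothetical helper for illustration only."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest visual word for each descriptor.
    words = d2.argmin(axis=1)
    # Count how often each word occurs, then normalize to sum to 1.
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))       # assumed: 8 visual words, 16-D features
descriptors = rng.normal(size=(100, 16))  # assumed: 100 local descriptors from one image

hist = bovw_histogram(descriptors, codebook)
# Shuffling the descriptor order (a stand-in for moving patches around the image)
# produces the same histogram, so spatial arrangement is not encoded.
shuffled = bovw_histogram(rng.permutation(descriptors), codebook)
```

Because the histogram only counts word occurrences, two scenes with the same local content in different arrangements would map to the same vector, which is the limitation the abstract attributes to BoVW-style methods.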