Multimodal Semantic Consistency-Based Fusion Architecture Search for Land Cover Classification

2022 
Multimodal land cover classification (MLCC) using the optical and synthetic aperture radar (SAR) modalities has achieved outstanding performance compared with unimodal approaches, owing to the complementary information the two modalities provide on land properties. Previous multimodal deep learning (MDL) methods have relied on handcrafted multibranch convolutional neural networks (CNNs) to extract the features of different modalities and merge them for land cover classification. However, CNN models handcrafted for natural images may not be the optimal choice for remote sensing (RS) image interpretation, owing to the large differences in imaging geometry and imaging mechanisms. Furthermore, few MDL methods have analyzed the optimal combinations of hierarchical features from different modalities. In this article, we propose an efficient multimodal architecture search framework, namely, multimodal semantic consistency-based fusion architecture search ($\text{M}^{2}$SC-FAS), operating in a continuous search space with a gradient-based optimization method. It not only discovers optimal optical- and SAR-specific architectures tailored to the different characteristics of optical and SAR images, respectively, but also searches for the optimal multimodal dense fusion architecture. Specifically, a semantic consistency constraint is introduced to guarantee dense fusion between hierarchical optical and SAR features with high semantic consistency, thus capturing their complementary information on land properties. Finally, a curriculum learning strategy is adopted to train $\text{M}^{2}$SC-FAS. Extensive experiments show the superior performance of our work on three coregistered optical and SAR datasets.
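The abstract describes two key ingredients: a continuous, gradient-based architecture search (in the style of differentiable NAS) and a semantic consistency constraint between hierarchical optical and SAR features. The sketch below illustrates these ideas only in a minimal form; the candidate operation set, module names, and the cosine-similarity form of the consistency term are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch (assumed formulation): a DARTS-style mixed operation with
# learnable architecture weights, plus a semantic consistency term between
# optical and SAR feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operation set; the paper's search space may differ.
CANDIDATE_OPS = {
    "conv3x3": lambda c: nn.Conv2d(c, c, 3, padding=1),
    "conv5x5": lambda c: nn.Conv2d(c, c, 5, padding=2),
    "skip":    lambda c: nn.Identity(),
}

class MixedOp(nn.Module):
    """Continuous relaxation: output is a softmax-weighted sum of candidates."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([f(channels) for f in CANDIDATE_OPS.values()])
        # Architecture parameters (alpha), optimized by gradient descent
        # jointly with or alternately to the network weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

def semantic_consistency_loss(opt_feat, sar_feat):
    """Encourage paired optical/SAR features to agree semantically,
    here measured by cosine similarity of globally pooled descriptors."""
    opt_vec = F.adaptive_avg_pool2d(opt_feat, 1).flatten(1)
    sar_vec = F.adaptive_avg_pool2d(sar_feat, 1).flatten(1)
    return (1.0 - F.cosine_similarity(opt_vec, sar_vec, dim=1)).mean()

# Usage example: one searchable stage per modality at a single fusion level.
opt = torch.randn(2, 16, 32, 32)   # optical feature maps
sar = torch.randn(2, 16, 32, 32)   # SAR feature maps
opt_op, sar_op = MixedOp(16), MixedOp(16)
opt_out, sar_out = opt_op(opt), sar_op(sar)
loss = semantic_consistency_loss(opt_out, sar_out)
loss.backward()  # gradients flow to both the op weights and alpha
```

In a full dense-fusion search, a consistency term of this kind would gate which pairs of hierarchical optical and SAR features are fused, so that only semantically aligned levels are connected.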