High Resolution SAR Image Classification Using Global-Local Network Structure Based on Vision Transformer and CNN

2022 
High-resolution (HR) synthetic aperture radar (SAR) image classification is a challenging task due to complex semantic scenes and coherent speckle. Convolutional neural networks (CNNs) have demonstrated superior capability for representing the local spatial features of SAR images; however, convolutions struggle to capture global image information. To address these issues, this letter proposes an end-to-end network, the global–local network structure (GLNS), for HR SAR classification. In the GLNS framework, a lightweight CNN and a compact vision transformer (ViT) learn local and global features, respectively, and the two types of features are fused to mine complementary information through a fusion net. A twofold loss function is then developed to reduce the intra-class distance of SAR images, which yields more compact classification features and less interference from coherent speckle. Experimental results on real HR SAR images indicate that the proposed method has stronger feature extraction capability and better noise resistance, achieving the highest classification accuracy on both datasets compared with other CNN-based approaches.
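The abstract describes two ideas that can be sketched concretely: concatenation-style fusion of a local (CNN) feature vector with a global (ViT) feature vector, and a twofold loss combining softmax cross-entropy with a compactness (center-loss-style) term. The sketch below is a minimal NumPy illustration under assumed dimensions; the feature extractors, the fusion scheme, the center initialization, and the weighting `lam` are all hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def compactness_loss(features, labels, centers):
    # Center-loss-style term: mean squared distance of each fused feature
    # to its class center, encouraging intra-class compactness.
    return 0.5 * ((features - centers[labels]) ** 2).sum(axis=1).mean()

# Hypothetical branch outputs: local features (CNN) and global features (ViT).
batch, d_local, d_global, n_classes = 8, 64, 64, 5
local_feat = rng.standard_normal((batch, d_local))
global_feat = rng.standard_normal((batch, d_global))

# Simple fusion sketch: concatenate the two feature types before classification.
fused = np.concatenate([local_feat, global_feat], axis=1)

labels = rng.integers(0, n_classes, size=batch)
W = rng.standard_normal((d_local + d_global, n_classes)) * 0.01  # linear classifier
logits = fused @ W
centers = np.zeros((n_classes, d_local + d_global))  # learnable in practice

lam = 0.01  # assumed weighting between the two loss terms
total_loss = softmax_cross_entropy(logits, labels) + lam * compactness_loss(fused, labels, centers)
```

In a real training loop the class centers would be updated alongside the network weights, so that the second term progressively pulls fused features of the same class together.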