A 3-D-Swin Transformer-Based Hierarchical Contrastive Learning Method for Hyperspectral Image Classification

2022 
Deep convolutional neural networks have dominated the field of hyperspectral image (HSI) classification. However, a single convolutional kernel limits the receptive field and fails to capture the sequential properties of the data. The self-attention-based Transformer can model global sequence information; among its variants, the Swin Transformer (SwinT) combines sequence modeling capability with prior information about visual signals (e.g., locality and translation invariance). Building on SwinT, we propose a 3-D SwinT (3DSwinT) to accommodate the 3-D nature of HSI and capture its rich spatial–spectral information. Supervised learning is still the most commonly used approach for remote sensing image interpretation, but pixel-by-pixel HSI classification demands a large number of high-quality labeled samples that are time-consuming and costly to collect. As a form of unsupervised learning, self-supervised learning (SSL), and contrastive learning in particular, can learn semantic representations from unlabeled data and is therefore emerging as a promising alternative to supervised learning. However, current contrastive learning methods operate at a single level or a single scale and do not account for the complex, variable multiscale features of objects. This article therefore proposes a novel 3DSwinT-based hierarchical contrastive learning (3DSwinT-HCL) method, which fully exploits multiscale semantic representations of images. In addition, we propose a multiscale local contrastive learning (MS-LCL) module that mines pixel-level representations to suit downstream dense prediction tasks. A series of experiments verifies the great potential and superiority of 3DSwinT-HCL.
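The abstract does not spell out the exact formulation, but the two standard building blocks it alludes to — partitioning a 3-D spatial–spectral cube into local attention windows (the 3-D analogue of SwinT's window partition) and a contrastive objective over paired views — can be sketched as follows. This is a minimal illustration under assumed shapes and the common InfoNCE loss, not the authors' implementation; the function names and window sizes are hypothetical.

```python
import numpy as np

def window_partition_3d(x, window_size):
    """Split a 3-D spatial-spectral feature cube into non-overlapping
    3-D windows (the 3-D analogue of SwinT's 2-D window partition).

    x: (D, H, W, C) array; window_size: (wd, wh, ww), each evenly
    dividing the corresponding axis of x.
    """
    D, H, W, C = x.shape
    wd, wh, ww = window_size
    x = x.reshape(D // wd, wd, H // wh, wh, W // ww, ww, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)          # group window axes together
    return x.reshape(-1, wd, wh, ww, C)           # (num_windows, wd, wh, ww, C)

def info_nce_loss(z1, z2, temperature=0.5):
    """Standard InfoNCE contrastive loss: row i of z1 and row i of z2
    form a positive pair; all other rows in the batch act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives lie on the diagonal
```

A hierarchical scheme in the spirit of 3DSwinT-HCL would apply such a contrastive loss to feature maps at several scales of the backbone, rather than only to the final global embedding.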