Learning Deep Cross-Modal Embedding Networks for Zero-Shot Remote Sensing Image Scene Classification

Yansheng Li,Zhihui Zhu,Jin-Gang Yu,Yongjun Zhang

Learning Deep Cross-Modal Embedding Networks for Zero-Shot Remote Sensing Image Scene Classification

2021

Due to its wide applications, remote sensing (RS) image scene classification has attracted increasing research interest. When each category has a sufficient number of labeled samples, RS image scene classification can be well addressed by deep learning. However, in the RS big data era, it is extremely difficult or even impossible to annotate RS scene samples for all the categories in one time as the RS scene classification often needs to be extended along with the emergence of new applications that inevitably involve a new class of RS images. Hence, the RS big data era fairly requires a zero-shot RS scene classification (ZSRSSC) paradigm in which the classification model learned from training RS scene categories obeys the inference ability to recognize the RS image scenes from unseen categories, in common with the humans' evolutionary perception ability. Unfortunately, zero-shot classification is largely unexploited in the RS field. This article proposes a novel ZSRSSC method based on locality-preservation deep cross-modal embedding networks (LPDCMENs). The proposed LPDCMENs, which can fully assimilate the pairwise intramodal and intermodal supervision in an end-to-end manner, aim to alleviate the problem of class structure inconsistency between two hybrid spaces (i.e., the visual image space and the semantic space). To pursue a stable and generalization ability, which is highly desired for ZSRSSC, a set of explainable constraints is specially designed to optimize LPDCMENs. To fully verify the effectiveness of the proposed LPDCMENs, we collect a new large-scale RS scene data set, including the instance-level visual images and class-level semantic representations (RSSDIVCS), where the general and domain knowledge is exploited to construct the class-level semantic representations. Extensive experiments show that the proposed ZSRSSC method based on LPDCMENs can obviously outperform the state-of-the-art methods, and the domain knowledge further improves the performance of ZSRSSC compared with the general knowledge. The collected RSSDIVCS will be made publicly available along with this article.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations