ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection
Abstract
Multi-temporal high spatial resolution earth observation makes it possible to detect complex urban land surface changes, a significant and challenging task in the remote sensing community. Previous works mainly focus on binary change detection (BCD) using modern techniques such as deep fully convolutional networks (FCNs), whereas deep network architectures for semantic change detection (SCD) remain insufficiently explored in the current literature. In this paper, we propose a deep multi-task encoder-transformer-decoder architecture (ChangeMask) designed around two important inductive biases: the semantic-change causal relationship and temporal symmetry. ChangeMask decouples SCD into temporal-wise semantic segmentation and BCD, and then integrates these two tasks into a general encoder-transformer-decoder framework. In the encoder part, we design a semantic-aware encoder to model the semantic-change causal relationship. This encoder learns only the semantic representation; the change representation is then learned from the semantic representation by a later transformer module. In this way, the change representation constrains the semantic representation during training, which acts as a regularizer and reduces the risk of overfitting. To learn a robust change representation from the semantic representation, we propose a temporal-symmetric transformer (TST) that guarantees temporal symmetry for the change representation while keeping it discriminative. Based on these semantic and change representations, we adopt simple multi-task decoders to output the semantic change map. Because all building blocks are differentiable, ChangeMask can be trained end to end with a multi-task loss function, which significantly simplifies the pipeline for applying ChangeMask. Comprehensive experimental results on two large-scale SCD datasets confirm the effectiveness and superiority of ChangeMask in SCD.
In addition, to demonstrate its potential value in real-world applications such as automatic urban analysis and decision-making, we deploy ChangeMask to map a large geographic area covering 30 km² with 300 million pixels. Code will be made available.
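The two ideas at the core of the abstract, temporal symmetry and the decoupling of SCD into per-date semantic segmentation plus BCD, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the TST is built, so the sketch simply assumes that symmetry is obtained by fusing features in both temporal orders and summing the results. The names `symmetrize`, `order_sensitive_fusion`, and `compose_semantic_change`, the fusion weights, and the use of label 0 for "unchanged" are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Fusion = Callable[[Vector, Vector], List[float]]


def order_sensitive_fusion(a: Vector, b: Vector) -> List[float]:
    # Hypothetical stand-in for a learned, order-dependent fusion of the
    # features from date 1 and date 2 (the weights here are arbitrary).
    return [2.0 * x - 1.0 * y for x, y in zip(a, b)]


def symmetrize(g: Fusion) -> Fusion:
    # Assumed construction: apply g in both temporal orders and sum,
    # so that fused(a, b) == fused(b, a) for any g.
    def fused(a: Vector, b: Vector) -> List[float]:
        return [f + r for f, r in zip(g(a, b), g(b, a))]
    return fused


def compose_semantic_change(
    sem_t1: Sequence[int],
    sem_t2: Sequence[int],
    change_mask: Sequence[int],
    unchanged: int = 0,
) -> List[Tuple[int, int]]:
    # Combine the two per-date semantic maps with the binary change mask:
    # changed pixels keep their (class at t1, class at t2) pair, while
    # unchanged pixels are assigned the assumed `unchanged` label.
    return [
        (c1, c2) if m else (unchanged, unchanged)
        for c1, c2, m in zip(sem_t1, sem_t2, change_mask)
    ]


if __name__ == "__main__":
    fuse = symmetrize(order_sensitive_fusion)
    feat_t1, feat_t2 = [1.0, 2.0], [3.0, 5.0]
    # Swapping the two dates leaves the change representation unchanged.
    print(fuse(feat_t1, feat_t2) == fuse(feat_t2, feat_t1))
    print(compose_semantic_change([1, 2, 3], [1, 4, 3], [0, 1, 0]))
```

In this sketch the symmetric wrapper makes the change representation independent of which image is treated as the earlier one, and the composition step shows why a semantic change map can be recovered from the three simpler outputs rather than predicted directly.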