Cascaded Multiscale Structure With Self-Smoothing Atrous Convolution for Semantic Segmentation

2021 
Convolutional neural networks (CNNs) have attracted great attention in the semantic segmentation of very-high-resolution (VHR) images of urban areas. However, the large-scale variation of objects in urban areas often makes it difficult to achieve good segmentation accuracy. Atrous convolution, and atrous spatial pyramid pooling composed of atrous convolutions, can alleviate this problem by exploring multiscale contextual information. Unfortunately, atrous convolution causes gridding artifacts: the actual sampling positions form sparse, disjoint sets and fail to cover the full receptive field. To address this problem, in this article, we first propose a self-smoothing atrous convolution (SS-AConv) that, unlike existing methods, intrinsically improves atrous convolution. SS-AConv enhances sampling rates at low computational cost by adding several key parameters in the spatial dimension of its filter. Then, in the Xception backbone, all max-pooling operations are replaced with SS-AConvs to obtain a large and effective receptive field. Moreover, to extract diverse features at multiple scales, we propose the SS-AConv cascaded multiscale structure (SCMS), which integrates SS-AConvs with different rates and a residual correction scheme (RCS) into a cascaded spatial pyramid. Finally, to extract diverse features at dense multiple scales, the SS-AConv convolutional network (SS-ACNet) is constructed by integrating SCMS into both the encoder and decoder layers of the modified Xception network. Extensive experimental results show that SS-ACNet outperforms several state-of-the-art methods on four open challenge datasets: ISPRS Vaihingen and Potsdam, Cityscapes, and PASCAL VOC 2012.
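The gridding artifact mentioned above can be illustrated with a small sketch (not the paper's method, which is not specified here): a standard 3×3 atrous convolution with rate r samples the input only at offsets that are multiples of r, so stacking layers with the same rate leaves a checkerboard of positions that are never sampled. The helper names below (`atrous_offsets_1d`, `effective_field_1d`) are hypothetical, for illustration only.

```python
# Sketch of the gridding artifact in standard atrous (dilated) convolution.
# A 1-D 3-tap atrous conv with rate r samples relative offsets {-r, 0, +r};
# stacking such layers covers only a stride-r lattice of input positions.

def atrous_offsets_1d(rate, kernel=3):
    """Relative input offsets sampled by one 1-D atrous conv tap."""
    half = kernel // 2
    return [k * rate for k in range(-half, half + 1)]

def effective_field_1d(rates, kernel=3):
    """Input offsets reachable after stacking atrous convs with the given rates."""
    reach = {0}
    for r in rates:
        reach = {p + o for p in reach for o in atrous_offsets_1d(r, kernel)}
    return sorted(reach)

# Two stacked rate-2 convs: the receptive field spans [-4, 4], but only
# even offsets are ever sampled -- the odd positions are gridding gaps.
print(effective_field_1d([2, 2]))  # [-4, -2, 0, 2, 4]
```

Mixing rates (e.g. `effective_field_1d([1, 2])`) fills these gaps, which is the intuition behind cascading convolutions with different rates as in the SCMS described above.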