Parsing very high resolution urban scene images by learning deep ConvNets with edge-aware loss

2020 
Abstract: Parsing very high resolution (VHR) urban scene images into regions with semantic meaning, e.g. buildings and cars, is a fundamental task in urban scene understanding. However, due to the huge quantity of details contained in an image and the large variations of objects in scale and appearance, existing semantic segmentation methods often break one object into pieces, or confuse adjacent objects, and thus fail to depict these objects consistently. To address these issues in a unified way, we propose a standalone end-to-end edge-aware neural network (EaNet) for urban scene semantic segmentation. To preserve semantic consistency inside objects, the EaNet model incorporates a large kernel pyramid pooling (LKPP) module that captures rich multi-scale context with strong continuous feature relations. To effectively separate confusing objects with sharp contours, a Dice-based edge-aware loss function (EA loss) is devised to guide the EaNet to refine both pixel- and image-level edge information directly from the semantic segmentation prediction. In the proposed EaNet model, the LKPP module and the EA loss are coupled to enable comprehensive feature learning across an entire semantic object. Extensive experiments on three challenging datasets demonstrate that our method readily generalizes to multi-scale ground/aerial urban scene images, achieving 81.7% mIoU on the Cityscapes test set and a 90.8% mean F1-score on the ISPRS Vaihingen 2D test set. Code is available at: https://github.com/geovsion/EaNet.
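The abstract does not spell out the exact form of the Dice-based edge-aware loss, so the following is only a minimal sketch of the general idea: a soft edge map is assumed to be extracted from the predicted segmentation probabilities (here with a Laplacian-like filter) and compared to a binary ground-truth edge map via a Dice coefficient. The helper names (`soft_edges`, `dice_edge_loss`) and the edge-extraction kernel are hypothetical illustrations, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def soft_edges(probs: torch.Tensor) -> torch.Tensor:
    """Extract a soft edge map from class probabilities of shape (N, C, H, W).

    A 3x3 Laplacian kernel is applied per class channel; the per-class
    responses are aggregated by taking the maximum over classes.
    """
    kernel = torch.tensor([[-1., -1., -1.],
                           [-1.,  8., -1.],
                           [-1., -1., -1.]], device=probs.device)
    kernel = kernel.view(1, 1, 3, 3).repeat(probs.size(1), 1, 1, 1)
    responses = F.conv2d(probs, kernel, padding=1, groups=probs.size(1))
    return responses.abs().amax(dim=1, keepdim=True).clamp(0.0, 1.0)


def dice_edge_loss(logits: torch.Tensor, gt_edges: torch.Tensor,
                   eps: float = 1e-6) -> torch.Tensor:
    """Dice loss between predicted soft edges and ground-truth edges (N, 1, H, W)."""
    pred_edges = soft_edges(torch.softmax(logits, dim=1))
    inter = (pred_edges * gt_edges).sum(dim=(1, 2, 3))
    union = pred_edges.sum(dim=(1, 2, 3)) + gt_edges.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()
```

In training, such an edge term would typically be added to the standard per-pixel cross-entropy loss with a weighting factor, so the network is penalized both for misclassified pixels and for blurred or misplaced object boundaries.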