Structure-Guided Feature Transform Hybrid Residual Network for Remote Sensing Object Detection
Object detection in remote sensing imagery (RSI) is a fundamental task for Earth monitoring. Objects captured from the bird's-eye-view perspective in RSI appear at multiple scales and in arbitrary orientations, and most of them are small and dense. For instance, vehicles or ships occupy only a dozen pixels in an image, yet are surrounded by roads and seas that span thousands of pixels and account for the overwhelming majority of all pixels. Although many general-purpose object detection methods have been proposed, most cannot detect small and dense objects accurately because they pay insufficient attention to these unique characteristics of RSI. In this work, we propose a novel structure-guided feature transform hybrid residual (SGFTHR) network, which overcomes the poor detection performance on objects at different scales, especially small and dense objects, in an anchor-free manner. The structure-guided feature transform (SGFT) module is proposed to extract discriminative structural information and inject it into high-level contextual feature maps, preventing important low-level spatial and structural information from being lost as the network deepens. Furthermore, the hybrid residual (HR) module is embedded in the backbone to acquire multiscale features in a novel hybrid hierarchical residual-like manner. Extensive experiments on the HRRSD and NWPU VHR-10 datasets show that the SGFTHR network achieves state-of-the-art detection accuracy with high efficiency and robustness. In particular, a 4.12% improvement in mean average precision (mAP) over the baseline on the HRRSD dataset demonstrates the effectiveness and superiority of the SGFTHR network.
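To make the "hybrid hierarchical residual-like manner" concrete, the following is a minimal, dependency-free sketch of one plausible reading of such a block: the channels of a feature map are split into groups, each group's transform also receives the previous group's output (widening the range of receptive fields within a single block), and an outer residual connection is added. The function names, the number of splits, and the use of a ReLU as a stand-in for a learned convolution are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def _transform(t):
    # Stand-in for a learned 3x3 convolution: a plain ReLU keeps the
    # sketch dependency-free and shape-preserving (assumption, not the
    # paper's actual layer).
    return np.maximum(t, 0.0)

def hybrid_residual_block(x, num_splits=4):
    # x: (C, H, W) feature map with C divisible by num_splits.
    c = x.shape[0] // num_splits
    splits = [x[i * c:(i + 1) * c] for i in range(num_splits)]
    outs = [splits[0]]            # first group: identity path
    prev = None
    for s in splits[1:]:
        inp = s if prev is None else s + prev
        prev = _transform(inp)    # each group also sees the previous
        outs.append(prev)         # group's output (hierarchical residual)
    # Outer residual connection around the whole block.
    return x + np.concatenate(outs, axis=0)

feat = np.ones((8, 16, 16))       # toy feature map: 8 channels
out = hybrid_residual_block(feat)
print(out.shape)                  # shape is preserved: (8, 16, 16)
```

Because later groups accumulate the outputs of earlier ones before their transform, a single block mixes features at several effective receptive-field sizes, which is one common way to obtain multiscale representations for objects of very different pixel extents.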