LiDAR-based 3D Video Object Detection with Foreground Context Modeling and Spatiotemporal Graph Reasoning

2021 
The strong demand for autonomous driving in industry has promoted research on 3D object detection algorithms. However, the vast majority of algorithms use a single-frame detection paradigm, ignoring the spatiotemporal correlations across point cloud frames. In this work, a novel Foreground Context Modeling Block (FCMB) is proposed to model the foreground spatial context and channel-wise dependency of point cloud features while maintaining the original inference speed. In addition, to exploit information across multiple frames, we design a two-stage Spatial-Temporal Graph Neural Network (STGNN). In STGNN, the first stage consumes the coarse proposals of each point cloud frame and performs intra-frame proposal refinement through message update functions. The second stage applies multiple graph convolutions over a similarity graph to aggregate semantically similar objects across the input frames. Experimental results show that our 3D video object detector outperforms LiDAR-based state-of-the-art (SOTA) models on the nuScenes benchmark.
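The abstract describes the second stage only at a high level. As a minimal illustrative sketch of the general idea, not the authors' implementation, the snippet below aggregates proposal features pooled from several frames over a cosine-similarity graph with a few graph-convolution layers; the module name, feature dimensions, and layer choices are all assumptions for illustration.

```python
# Hypothetical sketch: aggregate proposal features across frames via a
# cosine-similarity graph, loosely mirroring the "similarity graph" idea
# described in the abstract. Names, shapes, and layers are assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityGraphAggregation(nn.Module):
    def __init__(self, feat_dim: int = 128, num_layers: int = 2):
        super().__init__()
        # One linear transform per graph-convolution layer.
        self.layers = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(num_layers)
        )

    def forward(self, proposal_feats: torch.Tensor) -> torch.Tensor:
        # proposal_feats: (N, C) features of proposals pooled from all frames.
        x = proposal_feats
        for layer in self.layers:
            # Build a dense similarity graph from pairwise cosine similarity.
            normed = F.normalize(x, dim=-1)
            adj = torch.relu(normed @ normed.t())                       # (N, N), non-negative edges
            adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)   # row-normalize
            # Graph convolution: mix semantically similar proposals, then transform.
            x = torch.relu(layer(adj @ x)) + x                          # residual keeps per-proposal identity
        return x

# Usage: e.g. 3 frames x 64 proposals with 128-dim RoI features.
feats = torch.randn(3 * 64, 128)
refined = SimilarityGraphAggregation(128)(feats)
print(refined.shape)  # torch.Size([192, 128])
```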