A feature temporal attention based interleaved network for fast video object detection

2021 
Object detection in videos is a fundamental technology for applications such as surveillance. Static detectors treat video frames as independent input images, ignoring the temporal information of objects and performing redundant computation during detection. In this paper, exploiting the spatiotemporal continuity of video objects, we propose an attention-guided dynamic video object detection method for fast detection. We classify frames as key frames or non-key frames, from which complete or shallow features are extracted, respectively. In contrast to the fixed key frame strategies of previous studies, we develop a new key frame decision method that adaptively determines the attribute of the current frame by measuring the feature similarity between frames. For the shallow features extracted from non-key frames, the designed temporal attention based feature propagation module (TAFPM) performs semantic enhancement and feature temporal attention (FTA) based feature propagation to generate high-level semantic features. Our method is evaluated on the ImageNet VID dataset. It runs at 21.53 fps, twice the speed of the base detector R-FCN, while mAP declines by only 0.2% relative to R-FCN. The proposed method thus achieves performance comparable to state-of-the-art methods that focus on speed.
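The adaptive key frame decision described in the abstract can be illustrated with a minimal sketch: a frame is promoted to key frame when the similarity between its shallow features and those of the last key frame drops below a threshold. The cosine similarity measure, the feature shapes, and the threshold value here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature maps."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def is_key_frame(curr_feat, last_key_feat, threshold=0.9):
    """Promote the current frame to a key frame when its shallow features
    have drifted too far from the last key frame's features.
    The threshold value is an illustrative assumption."""
    return cosine_similarity(curr_feat, last_key_feat) < threshold

# Toy walk-through: an unchanged frame stays a non-key frame,
# while a large scene change triggers a new key frame.
rng = np.random.default_rng(0)
key_feat = rng.standard_normal((64, 14, 14)).astype(np.float32)

same = key_feat.copy()                       # near-identical frame
moved = rng.standard_normal((64, 14, 14))    # drastic appearance change

print(is_key_frame(same, key_feat))    # False: similarity is ~1.0
print(is_key_frame(moved, key_feat))   # True: similarity near 0
```

Under this scheme, slowly changing scenes reuse expensive deep features from the last key frame, while abrupt changes force full feature extraction, which is the source of the reported speedup over running the full detector on every frame.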