Class-aware Feature Aggregation Network for Video Object Detection

2021 
Recent progress in video object detection (VOD) has shown that aggregating features from other frames to capture long-range contextual information is essential for dealing with challenges such as partial occlusion and motion blur. To achieve more effective feature aggregation, in this paper we propose several improvements over previous works: (1) a class-aware pixel-level feature aggregation module, which characterizes a pixel by exploiting the contextual information carried by instances from both the current frame and other frames. Unlike the previous non-local operation, the proposed class-aware pixel-level aggregation filters out most of the noisy information coming from the background and from objects of other classes, and enhances the representation of a foreground pixel only with instances of the same class, introducing little ambiguous information; (2) a class-aware instance-level feature aggregation module, which aggregates features for object proposals by learning two kinds of relations: the temporal dependencies among same-class object proposals from support frames sampled over a long time range or even the whole sequence, and the spatial topology relation among proposals of different objects in the target frame. The homogeneity constraint in instance-level aggregation filters out many defective proposals, making the aggregation more accurate; and (3) a correlation-based feature alignment module embedded in the instance-level aggregation, which aligns the feature maps of the support and target proposals. Without bells and whistles, the proposed method achieves state-of-the-art performance on the ImageNet VID dataset without any post-processing. The project is publicly available at https://github.com/LiangHann/Class-aware-Feature-Aggregation-Network-for-Video-Object-Detection.
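To make the class-aware pixel-level aggregation idea concrete, below is a minimal sketch (not the authors' code) of a non-local-style attention in which a pixel attends only to pixels predicted to share its foreground class. It assumes PyTorch and a simplified setting where per-pixel class predictions for the target and support frames are already available; the module names, the residual fusion, and the single-support-frame interface are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of class-aware pixel-level feature aggregation.
# Assumptions: PyTorch; per-pixel class ids (0 = background) are given;
# one support frame is aggregated into the target frame.
import torch
import torch.nn as nn


class ClassAwarePixelAggregation(nn.Module):
    """Non-local style attention where each target pixel attends only to
    support pixels predicted to have the same (non-background) class."""

    def __init__(self, channels: int, embed_dim: int = 256):
        super().__init__()
        self.query = nn.Conv2d(channels, embed_dim, kernel_size=1)
        self.key = nn.Conv2d(channels, embed_dim, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, target_feat, support_feat, target_cls, support_cls):
        # target_feat:  (B, C, H, W)  features of the target frame
        # support_feat: (B, C, H, W)  features of a support frame
        # target_cls:   (B, H, W)     per-pixel class ids (0 = background)
        # support_cls:  (B, H, W)
        B, C, H, W = target_feat.shape
        q = self.query(target_feat).flatten(2).transpose(1, 2)   # (B, HW, E)
        k = self.key(support_feat).flatten(2)                    # (B, E, HW)
        v = self.value(support_feat).flatten(2).transpose(1, 2)  # (B, HW, C)

        affinity = torch.bmm(q, k) / (q.shape[-1] ** 0.5)        # (B, HW, HW)

        # Class-aware mask: keep only same-class foreground pairs.
        t_cls = target_cls.flatten(1).unsqueeze(2)               # (B, HW, 1)
        s_cls = support_cls.flatten(1).unsqueeze(1)              # (B, 1, HW)
        same_class = (t_cls == s_cls) & (t_cls > 0)              # (B, HW, HW)
        affinity = affinity.masked_fill(~same_class, float("-inf"))

        weights = torch.softmax(affinity, dim=-1)
        weights = torch.nan_to_num(weights)  # rows with no same-class pixel
        aggregated = torch.bmm(weights, v)                       # (B, HW, C)
        aggregated = aggregated.transpose(1, 2).reshape(B, C, H, W)
        return target_feat + aggregated                          # residual fusion
```

The key difference from a plain non-local block is the `same_class` mask: background pixels and pixels of other classes are excluded from the softmax, which is how the aggregation restricts context to same-class instances.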