Unsupervised Feature Propagation for Fast Video Object Detection Using Generative Adversarial Networks

2020 
We propose an unsupervised Feature Propagation Generative Adversarial Network (FPGAN) for fast video object detection in this paper. Our video object detector runs a pre-trained state-of-the-art object detector, R-FCN, on sparse key frames, and propagates their CNN features to adjacent frames through a light-weight transformation network for fast detection. To learn the feature propagation network, we make full use of unlabeled video data and employ generative adversarial training. Specifically, in FPGAN the generator is the feature propagation network, while the discriminator exploits second-order temporal coherence and 3D ConvNets to distinguish propagated CNN features from the "ground truth" features extracted by the image detector. In addition, a Euclidean distance loss against the features of the pre-trained image object detector jointly supervises the learning. Our method requires no human labeling of videos. Experiments on the large-scale ImageNet VID dataset demonstrate the effectiveness of our method.
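Since the paper's implementation is not reproduced here, the following is a minimal PyTorch sketch of the training setup the abstract describes. All names (FeaturePropagationNet, ClipDiscriminator, train_step) and architectural details (channel counts, kernel sizes, the cheap target-frame encoder) are illustrative assumptions, not the authors' code; the frozen `backbone` argument stands in for the R-FCN feature extractor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePropagationNet(nn.Module):
    """Hypothetical light-weight generator: predicts a nearby frame's CNN
    feature map from the key-frame feature plus a cheap encoding of the
    target frame (the paper's exact inputs are not specified here)."""
    def __init__(self, channels=1024):
        super().__init__()
        # Aggressively strided conv: a very cheap per-frame encoder.
        self.frame_enc = nn.Conv2d(3, 32, kernel_size=7, stride=16, padding=3)
        self.transform = nn.Sequential(
            nn.Conv2d(channels + 32, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=3, padding=1),
        )

    def forward(self, key_feat, target_frame):
        f = F.relu(self.frame_enc(target_frame))
        f = F.interpolate(f, size=key_feat.shape[-2:])
        # Residual transform: propagated feature = key feature + learned delta.
        return key_feat + self.transform(torch.cat([key_feat, f], dim=1))

class ClipDiscriminator(nn.Module):
    """Hypothetical 3D-ConvNet discriminator: scores a clip of T >= 3 feature
    maps stacked along a temporal axis, so it can judge second-order
    temporal coherence rather than single frames in isolation."""
    def __init__(self, channels=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 64, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, feat_clip):  # feat_clip: (N, C, T, H, W)
        return self.net(feat_clip)

def train_step(gen, disc, backbone, key_frame, target_frames, g_opt, d_opt):
    """One adversarial training step with joint Euclidean feature supervision.
    No human labels are used: the only targets are the frozen backbone's
    own features on unlabeled video frames."""
    bce = nn.BCEWithLogitsLoss()
    with torch.no_grad():                        # R-FCN backbone stays frozen
        key_feat = backbone(key_frame)           # (N, C, H, W)
        real = [backbone(f) for f in target_frames]   # "ground truth" features
    fake = [gen(key_feat, f) for f in target_frames]

    real_clip = torch.stack(real, dim=2)                       # (N, C, T, H, W)
    fake_clip = torch.stack([f.detach() for f in fake], dim=2)

    # Discriminator: separate ground-truth clips from propagated clips.
    d_real, d_fake = disc(real_clip), disc(fake_clip)
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: fool the discriminator, and stay close in Euclidean (L2)
    # distance to the pre-trained detector's features, as the abstract states.
    d_gen = disc(torch.stack(fake, dim=2))
    adv = bce(d_gen, torch.ones_like(d_gen))
    l2 = sum(F.mse_loss(f, r) for f, r in zip(fake, real))
    g_loss = adv + l2
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

At inference time only the light-weight generator runs on non-key frames, while the full R-FCN runs only on the sparse key frames, which is where the speed-up over per-frame detection comes from.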