VideoWhisper: Towards unsupervised learning of discriminative features of videos with RNN

2017 
We present VideoWhisper, a novel approach to unsupervised video representation learning in which a video sequence is treated as a self-supervision entity, based on the observation that the sequence encodes the video's temporal dynamics (e.g., object movement and event evolution). Specifically, for each video sequence, we use a pre-learned visual dictionary to generate a sequence of high-level semantics, dubbed the "whisper", which encodes both visual content at the frame level and visual dynamics at the sequence level. VideoWhisper is driven by a novel "sequence-to-whisper" learning strategy: an end-to-end sequence-to-sequence model built on RNNs is trained to predict the whisper sequence. We propose two ways to derive video representations from the model. Extensive experiments demonstrate that the video representations learned by VideoWhisper effectively boost fundamental video-related applications such as video retrieval and classification.
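To make the pipeline concrete, here is a minimal sketch in PyTorch of a sequence-to-whisper model: an RNN encoder-decoder reads per-frame CNN features and predicts, for each frame, a visual word from the pre-learned dictionary. The layer sizes, the GRU cells, the `Seq2Whisper` name, and the final-hidden-state readout are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the "sequence-to-whisper" idea (assumptions: frame
# features come from a pretrained CNN, and the "whisper" targets are
# indices into a pre-learned visual dictionary of K words; all names
# and hyperparameters are illustrative, not the paper's).
import torch
import torch.nn as nn

class Seq2Whisper(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, dict_size=1000):
        super().__init__()
        # Encoder RNN summarizes the frame-feature sequence.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Decoder RNN predicts one visual word per frame.
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, dict_size)

    def forward(self, frames):
        # frames: (batch, seq_len, feat_dim)
        enc_out, h = self.encoder(frames)
        dec_out, _ = self.decoder(enc_out, h)
        return self.classifier(dec_out)  # (batch, seq_len, dict_size)

    def video_representation(self, frames):
        # One plausible readout of a video-level representation:
        # the encoder's final hidden state.
        _, h = self.encoder(frames)
        return h[-1]  # (batch, hidden_dim)

model = Seq2Whisper()
frames = torch.randn(4, 30, 2048)          # 4 videos, 30 frames each
whisper = torch.randint(0, 1000, (4, 30))  # target visual-word indices
logits = model(frames)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), whisper.reshape(-1))
loss.backward()  # self-supervised: targets come from the dictionary, not labels
```

Note that no manual labels appear anywhere: the training signal is the whisper sequence itself, generated by the visual dictionary, which is what makes the learning unsupervised.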