Towards Practical Compressed Video Action Recognition: A Temporal Enhanced Multi-Stream Network

2021 
Current compressed video action recognition methods mainly assume that the complete data is available. In a real transmission scenario, however, compressed video packets often arrive out of order or are lost because of network jitter or congestion. To recognize actions at an early stage from a limited number of packets, e.g. to quickly forecast potential risks, we propose a Temporal Enhanced Multi-Stream Network (TEMSN) for practical compressed video action recognition. First, we use three modalities in the compressed domain as complementary cues and build a multi-stream network to capture rich information from compressed video packets. Second, we design a temporal enhanced module based on an encoder-decoder structure, which is applied to each stream to infer missing packets and thereby generate more accurate action dynamics. Thanks to the multiple modalities and their temporal enhancement, our approach better models actions when only part of the compressed video packets is available. Experiments on the HMDB-51 and UCF-101 datasets validate its effectiveness and efficiency.
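The data flow described above (packets arriving out of order or not at all, per-stream recovery of missing steps, then fusion across streams) can be illustrated with a toy sketch. This is not the TEMSN implementation: the packet format, the function names, and the stream names are hypothetical, each packet carries a single scalar feature, and plain linear interpolation stands in for the learned encoder-decoder temporal enhanced module. The sketch assumes at least one stream and a sequence length greater than zero.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical packet format: (sequence index, scalar feature).
Packet = Tuple[int, float]


def reorder(packets: List[Packet], length: int) -> List[Optional[float]]:
    """Place packets by sequence index; lost positions stay None."""
    slots: List[Optional[float]] = [None] * length
    for index, feature in packets:
        if 0 <= index < length:
            slots[index] = feature
    return slots


def fill_missing(slots: List[Optional[float]]) -> List[float]:
    """Linear interpolation, a stand-in for the learned module (assumption)."""
    known = [i for i, v in enumerate(slots) if v is not None]
    if not known:
        # Nothing was received for this stream: fall back to zeros.
        return [0.0] * len(slots)
    filled: List[float] = []
    for i, value in enumerate(slots):
        if value is not None:
            filled.append(value)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(slots[right])   # gap before the first known step
        elif right is None:
            filled.append(slots[left])    # gap after the last known step
        else:
            w = (i - left) / (right - left)
            filled.append((1 - w) * slots[left] + w * slots[right])
    return filled


def fuse(streams: Dict[str, List[Packet]], length: int) -> float:
    """Average the per-stream temporal means (simple late fusion)."""
    means = []
    for packets in streams.values():
        series = fill_missing(reorder(packets, length))
        means.append(sum(series) / length)
    return sum(means) / len(means)


if __name__ == "__main__":
    received = {
        "stream_a": [(2, 3.0), (0, 1.0)],  # arrived out of order, step 1 lost
        "stream_b": [(1, 4.0)],            # steps 0 and 2 lost
    }
    print(fuse(received, length=3))
```

In a real system each slot would hold a feature vector produced by a per-modality backbone, and the interpolation step would be replaced by a trained model; the sketch only shows where reordering, recovery, and fusion sit relative to one another.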