Selective Dependency Aggregation for Action Classification

2021 
Video data differ from images in their extra temporal dimension, which introduces content dependencies from many perspectives and makes it harder to learn representations for diverse video actions. Existing methods mainly model dependencies from a single perspective, which limits the categorization of complex video actions. This paper proposes a novel selective dependency aggregation (SDA) module that adaptively exploits multiple types of video dependencies to refine features. Specifically, we empirically investigate various long-range and short-range dependencies obtained through multi-direction, multi-scale feature squeezing and dependency excitation. Query-structured attention then fuses them selectively, fully accounting for the diversity of videos' dependency preferences. Moreover, a channel reduction mechanism keeps SDA's additional computational cost lightweight. Finally, we show that the SDA module can be easily plugged into different backbones to form SDA-Nets, and we demonstrate its effectiveness, efficiency, and robustness through extensive experiments on several video benchmarks for action classification. The code and models will be available at https://github.com/ty-97/SDA.
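To make the abstract's pipeline concrete, here is a minimal numpy sketch of the idea: squeeze the feature map along multiple directions to obtain per-channel dependency descriptors, excite each through a shared bottleneck MLP with channel reduction, and fuse the resulting gates with a query-based softmax before re-weighting the features. All shapes, the two squeeze directions, the reduction ratio `r`, and the query vector `q` are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
C, T, H, W = 8, 4, 6, 6   # channels, time, height, width (toy sizes)
r = 2                     # channel reduction ratio (assumed)
x = rng.standard_normal((C, T, H, W))

# Two squeeze directions (chosen for illustration):
# (a) global spatio-temporal average -> long-range descriptor
# (b) temporal average then spatial max -> shorter-range descriptor
desc_a = x.mean(axis=(1, 2, 3))            # shape (C,)
desc_b = x.mean(axis=1).max(axis=(1, 2))   # shape (C,)

# Dependency excitation: shared bottleneck MLP, C -> C//r -> C
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
gate_a = sigmoid(W2 @ np.maximum(W1 @ desc_a, 0.0))
gate_b = sigmoid(W2 @ np.maximum(W1 @ desc_b, 0.0))

# Query-based selective fusion: a learned query scores each branch,
# and softmax weights mix the channel gates.
q = rng.standard_normal(C) * 0.1
weights = softmax(np.array([q @ gate_a, q @ gate_b]))
gate = weights[0] * gate_a + weights[1] * gate_b   # shape (C,)

# Refine the features by channel-wise re-weighting.
y = gate[:, None, None, None] * x
```

The extra cost is dominated by the two small matrix products, which is why the channel reduction ratio keeps the module lightweight relative to the backbone.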