SDAN: Stacked Diverse Attention Network for Video Action Recognition

2021 
Recently, deep learning methods have demonstrated exceptional performance in video action recognition. However, effectively extracting discriminative features from videos remains challenging. Most existing methods treat spatial and temporal information separately. In this paper, we propose a novel Stacked Diverse Attention Network (SDAN). It uses a Multi-dimensional Attention Module to emphasize informative feature maps along the channel and spatial-temporal dimensions. In addition, a Supervised Attention Module is designed to generate a weighted feature map in a class-supervised way. Compared with previous methods, the proposed approach has the following advantages: (1) the Multi-dimensional Attention Module exploits effective combinations and correlations of attention mechanisms along different dimensions; (2) the Supervised Attention Module improves the network's capability by supervising it directly with action labels, which reveal class-related object and limb information. Extensive experimental evaluation demonstrates the effectiveness of the proposed approach and establishes significant results on the Kinetics400, UCF101, and HMDB51 datasets. Code is available at https://github.com/jeff62802217/SDAN-Pytorch.
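To illustrate the kind of attention the abstract describes, the following is a minimal NumPy sketch of channel attention composed with spatial-temporal attention on a video feature map of shape (batch, channels, time, height, width). The layer sizes, the squeeze-excitation-style channel gate, and the exact composition order are assumptions for illustration, not the paper's actual SDAN implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # Squeeze: global average over the spatial-temporal dims (T, H, W).
    desc = feat.mean(axis=(2, 3, 4))                # (B, C)
    # Excitation: two-layer bottleneck MLP with ReLU, then a sigmoid gate
    # that rescales each channel (hypothetical weights w1, w2).
    gate = sigmoid(np.maximum(desc @ w1, 0) @ w2)   # (B, C)
    return feat * gate[:, :, None, None, None]

def spatial_temporal_attention(feat):
    # Collapse channels into one (B, 1, T, H, W) saliency map and gate
    # every spatial-temporal location with it.
    sal = sigmoid(feat.mean(axis=1, keepdims=True))
    return feat * sal

rng = np.random.default_rng(0)
B, C, T, H, W = 2, 8, 4, 7, 7
x = rng.standard_normal((B, C, T, H, W))
w1 = rng.standard_normal((C, C // 2)) * 0.1         # bottleneck reduction
w2 = rng.standard_normal((C // 2, C)) * 0.1
y = spatial_temporal_attention(channel_attention(x, w1, w2))
assert y.shape == x.shape                           # attention preserves shape
```

Stacking the two gates lets channel attention select informative feature maps and the spatial-temporal gate then emphasize where and when in the clip they fire; the paper's Supervised Attention Module additionally learns such weights under direct action-label supervision.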