I2Net: Mining Intra-video and Inter-video Attention for Temporal Action Localization

2021 
Abstract. This paper focuses on two challenges in the temporal action localization community, i.e., the lack of long-term relationships and action pattern uncertainty. The former prevents cooperation among multiple action instances within a video, while the latter may cause incomplete localizations or false positives. The lack of long-term relationships results from the limited receptive field. Instead of stacking multiple layers or using large convolution kernels, we propose an intra-video attention mechanism that brings a global receptive field to each temporal point. As for the action pattern uncertainty challenge, although it is hard to precisely depict the desired action pattern, paired videos that share the same action category can provide complementary information about that pattern. Consequently, we propose an inter-video attention mechanism to assist in learning accurate action patterns. Based on the intra-video and inter-video attention, we propose a unified framework, namely I2Net, to tackle the challenging temporal action localization task. Given two videos that share an action category, I2Net adopts the widely used one-stage action localization paradigm and processes them in parallel. Between two neighboring layers within the same video, the intra-video attention brings global information to each temporal point and helps to learn representative features. Between two parallel layers of the two videos, the inter-video attention introduces complementary information to each video and helps to learn accurate action patterns. With the cooperation of the intra-video and inter-video attention mechanisms, I2Net shows clear performance gains over the baseline and sets a new state of the art on two widely used benchmarks, i.e., THUMOS14 and ActivityNet v1.3.
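To make the two mechanisms concrete, below is a minimal PyTorch sketch of how intra-video self-attention and inter-video cross-attention could be combined over 1D temporal feature sequences. It is an illustration under assumptions, not the authors' implementation: the module name I2Attention, the variables feats_a/feats_b, and the choices d_model=256 and n_heads=8 are all hypothetical, and the paper's full architecture (one-stage localization heads, layer-wise placement) is omitted.

import torch
import torch.nn as nn


class I2Attention(nn.Module):
    # Hypothetical sketch: intra-video self-attention followed by
    # inter-video cross-attention, over features of shape (batch, T, d_model).
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.intra = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.inter = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
        # Intra-video attention: every temporal point attends to all other
        # points of the same video, giving it a global receptive field
        # without stacking layers or enlarging convolution kernels.
        intra_a, _ = self.intra(feats_a, feats_a, feats_a)
        intra_b, _ = self.intra(feats_b, feats_b, feats_b)

        # Inter-video attention: each video queries the paired video that
        # shares its action category, importing complementary evidence
        # about the action pattern.
        inter_a, _ = self.inter(intra_a, intra_b, intra_b)
        inter_b, _ = self.inter(intra_b, intra_a, intra_a)

        # Residual fusion keeps each video's own features alongside the
        # cross-video information.
        return intra_a + inter_a, intra_b + inter_b


# Usage: two paired videos with different numbers of temporal points.
attn = I2Attention(d_model=256)
va = torch.randn(2, 100, 256)  # video A: 100 temporal points
vb = torch.randn(2, 120, 256)  # video B: 120 temporal points
out_a, out_b = attn(va, vb)    # shapes: (2, 100, 256) and (2, 120, 256)

Note that cross-attention naturally handles paired videos of different lengths, since only the query sequence determines the output length; this matches the paper's setting of processing two videos in parallel within one framework.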