PFWNet: Pretraining neural network via feature jigsaw puzzle for weakly-supervised temporal action localization

2021 
Abstract Weakly supervised temporal action localization is a challenging yet interesting task. Existing methods usually apply a few temporal convolutional layers or linear layers to predict classification scores, so model capacity is limited. As related research suggests, increasing model capacity has the potential to improve localization performance. However, under the weakly supervised paradigm, video-level classification labels are insufficient for learning large-capacity models. The essential reason is that the inputs to action localization networks are mostly high-level features extracted by video recognition models. Lacking off-the-shelf initialization weights, action localization networks must be trained from scratch and can therefore only explore low-capacity models. In this work, we draw inspiration from the self-supervised learning paradigm and propose to learn high-quality representation models by solving a feature jigsaw puzzle task. The proposed self-supervised pretraining process can explore networks with larger kernel sizes and deeper layers, providing valuable initialization for action localization networks. In the implementation, we first discover potential action scopes by computing motion intensity. Then we cut the features into snippets and permute them out of order. We randomly discard frames at the boundaries between snippets to guide the network toward learning high-level representations and to prevent information leakage. Moreover, because the number of possible permutations grows factorially with the number of snippets, we select a fixed set of permutations via the maximum Hamming distance criterion, which eases learning. Comprehensive experiments on two benchmarks demonstrate the effectiveness of pretraining for the weakly supervised action localization task, and the proposed method achieves new state-of-the-art performance.
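To make the pipeline in the abstract concrete, below is a minimal NumPy sketch of the three data-preparation steps it describes: estimating motion intensity to find potential action scopes, selecting a fixed set of permutations by the maximum Hamming distance criterion, and building a jigsaw sample with frames randomly dropped at snippet boundaries. The function names, the greedy selection strategy, the feature-difference definition of motion intensity, and the `max_drop` parameter are illustrative assumptions, not the authors' released code.

```python
import itertools

import numpy as np


def action_scopes(features, thresh=0.5):
    """Estimate potential action scopes via motion intensity, here taken
    as the L2 norm of temporal feature differences (an assumption)."""
    motion = np.linalg.norm(np.diff(features, axis=0), axis=1)
    motion = motion / (motion.max() + 1e-8)
    return motion > thresh  # boolean mask of high-motion positions


def select_permutations(n_snippets, n_classes, rng):
    """Greedily pick `n_classes` permutations of `n_snippets` snippets
    that are maximally spread out under Hamming distance: start from a
    random permutation, then repeatedly add the candidate whose minimum
    distance to the already-chosen set is largest."""
    all_perms = np.array(list(itertools.permutations(range(n_snippets))))
    chosen = [all_perms[rng.integers(len(all_perms))]]
    while len(chosen) < n_classes:
        dists = np.stack([(all_perms != p).sum(axis=1) for p in chosen])
        chosen.append(all_perms[dists.min(axis=0).argmax()])
    return np.stack(chosen)


def make_jigsaw_sample(features, perms, rng, max_drop=2):
    """Cut a (T, D) feature sequence into snippets, randomly discard up
    to `max_drop` frames on each side of every inner boundary (so the
    network cannot exploit frame continuity), and concatenate the
    snippets in a sampled permutation order."""
    n = perms.shape[1]
    bounds = np.linspace(0, features.shape[0], n + 1, dtype=int)
    snippets = []
    for i in range(n):
        lo, hi = bounds[i], bounds[i + 1]
        if i > 0:            # inner boundary on the left
            lo += rng.integers(0, max_drop + 1)
        if i < n - 1:        # inner boundary on the right
            hi -= rng.integers(0, max_drop + 1)
        snippets.append(features[lo:hi])
    label = rng.integers(len(perms))
    shuffled = np.concatenate([snippets[j] for j in perms[label]], axis=0)
    return shuffled, label  # pretext task: classify which permutation
```

A pretraining loop under these assumptions would call, e.g., `perms = select_permutations(4, 24, np.random.default_rng(0))` once, then feed `make_jigsaw_sample` outputs to the network and train it to predict the permutation index, which is what makes the pretext task a standard classification problem.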