Compact Representation and Reliable Classification Learning for Point-Level Weakly-Supervised Action Localization

2022 
Point-level weakly-supervised temporal action localization (P-WSTAL) aims to localize temporal extents of action instances and identify the corresponding categories with only a single point label for each action instance for training. Due to the sparse frame-level annotations, most existing models are in the localization-by-classification pipeline. However, there exist two major issues in this pipeline: large intra-action variation due to task gap between classification and localization and noisy classification learning caused by unreliable pseudo training samples. In this paper, we propose a novel framework CRRC-Net, which introduces a co-supervised feature learning module and a probabilistic pseudo label mining module, to simultaneously address the above two issues. Specifically, the co-supervised feature learning module is applied to exploit the complementary information in different modalities for learning more compact feature representations. Furthermore, the probabilistic pseudo label mining module utilizes the feature distances from action prototypes to estimate the likelihood of pseudo samples and rectify their corresponding labels for more reliable classification learning. Comprehensive experiments are conducted on different benchmarks and the experimental results show that our method achieves favorable performance with the state-of-the-art.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    73
    References
    0
    Citations
    NaN
    KQI
    []