Multi-actor activity detection by modeling object relationships in extended videos based on deep learning

2022 
We present a Multi-actor Activity Detection Framework (MADF) that models the interactive relationships among multiple actors for activity detection in extended videos. MADF detects three groups of multi-actor activities involving different kinds of actors and operates in three stages: detection, classification, and post-processing. In the detection stage, both interaction proposals and actor proposals are generated for each video clip to eliminate irrelevant background in the scene. In the classification stage, three different classification networks are proposed, one for each group of activities. Furthermore, for person–object interaction, an attention mechanism helps the person–object classification network focus on small-scale objects; for person–person interaction, a suppression module improves the accuracy of person–person activity detection; for person–vehicle interaction, a spatial–temporal graph convolutional network (GCN) module is embedded in the person–vehicle classification network to model the fine-grained relationship between the person and the vehicle, and a proposed Mutually Exclusive Category Loss (MECLoss) helps this network distinguish mutually exclusive activities. Finally, off-the-shelf post-processing methods are used to re-score the proposals for more stable results. The proposed system improves substantially over our baseline and achieves state-of-the-art results in the TRECVID 2021 ActEV challenge.
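The abstract does not spell out how MECLoss is formulated. One plausible reading is a penalty that discourages the person–vehicle classification network from assigning high scores to two activity categories declared mutually exclusive at the same time. The sketch below is a hypothetical PyTorch implementation under that assumption; the class name, the `exclusive_pairs` argument, the penalty form, and the example labels are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn


class MutuallyExclusiveCategoryLoss(nn.Module):
    """Hypothetical sketch of a mutually-exclusive-category penalty.

    Assumes multi-label activity logits; for each pair of categories
    declared mutually exclusive, the product of their predicted
    probabilities is penalized so the network is discouraged from
    activating both categories for the same proposal.
    """

    def __init__(self, exclusive_pairs, weight=1.0):
        super().__init__()
        # exclusive_pairs: list of (i, j) class-index pairs that cannot co-occur
        self.exclusive_pairs = exclusive_pairs
        self.weight = weight
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, targets):
        # Standard multi-label classification term
        loss = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        # Penalize joint activation of mutually exclusive categories
        penalty = 0.0
        for i, j in self.exclusive_pairs:
            penalty = penalty + (probs[:, i] * probs[:, j]).mean()
        return loss + self.weight * penalty


# Illustrative usage: classes 0 and 1 stand in for a mutually exclusive
# pair of vehicle-related activities (labels are placeholders).
criterion = MutuallyExclusiveCategoryLoss(exclusive_pairs=[(0, 1)])
logits = torch.randn(8, 5, requires_grad=True)   # 8 proposals, 5 activity classes
targets = torch.randint(0, 2, (8, 5)).float()    # multi-hot activity labels
loss = criterion(logits, targets)
loss.backward()
```

Under this reading, the pairwise product term leaves the ordinary multi-label objective untouched while adding pressure only on the specific category pairs the framework treats as incompatible.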