Continuous Action Recognition and Segmentation in Untrimmed Videos

2018 
Recognizing continuous human actions is a fundamental task in many real-world computer vision applications, including video surveillance, video retrieval, and human-computer interaction. It requires recognizing each performed action as well as its segmentation boundaries within a continuous sequence. Previous work has reported great progress on single-action recognition using deep convolutional networks. To further improve performance on continuous action recognition, in this paper we introduce a discriminative approach consisting of three modules. The first, a feature extraction module, uses a two-stream Convolutional Neural Network to capture appearance and short-term motion information from the raw video input. Based on the obtained features, the second, a classification module, performs spatial and temporal recognition and then fuses the scores from the two feature streams. In the final segmentation module, a semi-Markov Conditional Random Field model, capable of handling long-term action interactions, is built to partition the action sequence. Experimental results show that our approach obtains state-of-the-art performance on public datasets including 50Salads, Breakfast, and MERL Shopping. We also visualize the continuous action segmentation results to support a more insightful discussion in the paper.
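The abstract does not include implementation details, but the pipeline it describes (late fusion of two-stream scores followed by semi-Markov segmentation) can be illustrated with a minimal, hypothetical sketch. The function names (fuse_two_stream_scores, semi_markov_decode), the fusion weight alpha, the maximum segment length, and the simple segment score (sum of fused per-frame class scores) are all assumptions for illustration; the paper's actual semi-Markov CRF would additionally use learned segment features, duration terms, and transition potentials.

```python
import numpy as np

def fuse_two_stream_scores(rgb_scores, flow_scores, alpha=0.5):
    """Weighted late fusion of per-frame class scores from the
    spatial (RGB) and temporal (optical-flow) streams.
    alpha is an illustrative fusion weight, not the paper's value."""
    return alpha * rgb_scores + (1.0 - alpha) * flow_scores

def semi_markov_decode(frame_scores, max_seg_len):
    """Segmental (semi-Markov) Viterbi decoding sketch.

    frame_scores: (T, C) array of fused per-frame class scores.
    Returns a list of (start, end, label) segments that maximizes the
    total score, where a segment's score is (by assumption here) the
    sum of its frames' scores for the chosen label.
    """
    T, C = frame_scores.shape
    # Cumulative sums allow O(1) lookup of any segment's per-class score.
    cum = np.vstack([np.zeros((1, C)), np.cumsum(frame_scores, axis=0)])
    best = np.full(T + 1, -np.inf)   # best[t] = best score covering frames [0, t)
    best[0] = 0.0
    back = [None] * (T + 1)          # back[t] = (segment start, label) of last segment
    for t in range(1, T + 1):
        for d in range(1, min(max_seg_len, t) + 1):
            s = t - d
            seg_scores = cum[t] - cum[s]        # per-class score of segment [s, t)
            c = int(np.argmax(seg_scores))
            cand = best[s] + seg_scores[c]
            if cand > best[t]:
                best[t] = cand
                back[t] = (s, c)
    # Backtrack from the end to recover the segmentation.
    segments, t = [], T
    while t > 0:
        s, c = back[t]
        segments.append((s, t, c))
        t = s
    return segments[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((20, 3))    # toy per-frame scores from the spatial stream
    flow = rng.random((20, 3))   # toy per-frame scores from the temporal stream
    fused = fuse_two_stream_scores(rgb, flow)
    print(semi_markov_decode(fused, max_seg_len=8))
```

The key property this sketch shares with a semi-Markov model is that decoding operates over variable-length segments rather than individual frames, which is what allows long-term action structure to influence the predicted boundaries.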