The Application of Centroid Tracking Algorithm in Video Action Recognition

There are many benchmark datasets for video action recognition, such as UCF-101, ActivityNet and Kinetics. All of them follow the labeling scheme of image classification: each video or video clip in the dataset is assigned a single label. However, no dataset covers complex scenarios, such as scenes in which several different actions take place at the same time. The videos in these benchmarks also have a low resolution, 224×224 pixels, and a person's action occupies about half of the frame. A model trained on such data is not suitable for high-resolution video. Standard camera footage is now 1920×1080 pixels, and the region containing human motion takes up only a small part of the frame, so a video classification model does not work well in this setting. This paper provides a general method for recognizing actions in high-resolution video and in video where multiple people perform different actions at the same time. The proposed network combines the YOLOv5 object detection network, a centroid tracking algorithm and the C3D video action recognition network. It can recognize the actions of multiple people in a video.
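The abstract does not say how the centroid tracking step is implemented. The following is a minimal sketch of one common form of centroid tracking, under several assumptions: person detections arrive once per frame as `(x1, y1, x2, y2)` boxes (for example from a detector such as YOLOv5), matching is greedy nearest-centroid, and the names `CentroidTracker`, `max_distance` and `max_missed` are hypothetical rather than taken from the paper. The stable IDs it returns are what would allow each person's boxes to be cropped frame by frame into a separate clip for an action recognition network.

```python
import math


def centroid(box):
    """Return the center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


class CentroidTracker:
    """Illustrative tracker: greedy nearest-centroid matching (an assumption)."""

    def __init__(self, max_distance=100.0, max_missed=5):
        self.max_distance = max_distance  # largest allowed jump, in pixels
        self.max_missed = max_missed      # frames a track may go unmatched
        self.next_id = 0
        self.tracks = {}                  # track id -> last known centroid
        self.missed = {}                  # track id -> consecutive misses

    def update(self, boxes):
        """Assign the boxes of one frame to tracks; return {track_id: box}."""
        centroids = [centroid(b) for b in boxes]

        # Every (track, detection) pair, closest pairs first.
        pairs = sorted(
            (math.dist(c, t), tid, i)
            for tid, t in self.tracks.items()
            for i, c in enumerate(centroids)
        )

        assigned = {}  # track id -> index into boxes
        used = set()   # detection indices already claimed
        for dist, tid, i in pairs:
            if dist > self.max_distance:
                break  # all remaining pairs are even farther apart
            if tid in assigned or i in used:
                continue
            assigned[tid] = i
            used.add(i)

        # Refresh matched tracks.
        for tid, i in assigned.items():
            self.tracks[tid] = centroids[i]
            self.missed[tid] = 0

        # Age unmatched tracks and drop those missing for too long.
        for tid in list(self.tracks):
            if tid not in assigned:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid]
                    del self.missed[tid]

        # Start a new track for every detection that matched nothing.
        for i, c in enumerate(centroids):
            if i not in used:
                tid = self.next_id
                self.next_id += 1
                self.tracks[tid] = c
                self.missed[tid] = 0
                assigned[tid] = i

        return {tid: boxes[i] for tid, i in assigned.items()}


if __name__ == "__main__":
    tracker = CentroidTracker()
    # Two people in a 1920x1080 frame; the detector order flips between frames.
    print(tracker.update([(100, 100, 200, 300), (1500, 400, 1600, 600)]))
    print(tracker.update([(1510, 400, 1610, 600), (105, 100, 205, 300)]))
```

Greedy matching is chosen here only for brevity; an optimal assignment (for example the Hungarian algorithm) could be substituted without changing the interface.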