Combining ConvNets with hand-crafted features for action recognition based on an HMM-SVM classifier

2018 
This paper proposes a new framework for RGB-D-based action recognition that takes advantage of hand-designed features from skeleton data and deeply learned features from depth maps, and effectively exploits both local and global temporal information. Specifically, depth and skeleton data are first augmented for deep learning and to make recognition insensitive to view variation. Second, depth sequences are segmented using handcrafted features based on a skeleton-joint motion histogram to exploit local temporal information. All training segments are clustered with an Infinite Gaussian Mixture Model (IGMM) through Bayesian estimation and labelled for training Convolutional Neural Networks (ConvNets) on the depth maps. A depth sequence can thus be reliably encoded into a sequence of segment labels. Finally, the label sequence is fed into a joint Hidden Markov Model and Support Vector Machine (HMM-SVM) classifier that exploits the global temporal information for the final recognition. The proposed framework was evaluated on the widely used MSRAction-Pairs, MSRDailyActivity3D and UTD-MHAD datasets and achieved promising results.
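The pipeline combines non-parametric clustering of handcrafted segment features with a discrete HMM whose per-class log-likelihoods feed an SVM. The sketch below is only an illustration of those two stages under stated assumptions, not the authors' implementation: scikit-learn's BayesianGaussianMixture with a Dirichlet-process prior stands in for the IGMM, the per-class HMMs are left untrained with random parameters, and all inputs (segment features, label sequences, class labels) are hypothetical placeholders; in the paper, ConvNets trained on depth maps produce the segment labels.

```python
# Minimal sketch of the segment-labelling and HMM-SVM stages (placeholder data).
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.svm import SVC

# --- Stage 1: cluster training segments with a (truncated) infinite GMM. ---
# A Dirichlet-process prior lets the model prune unused components, which
# approximates IGMM behaviour without fixing the number of clusters a priori.
rng = np.random.default_rng(0)
segment_feats = rng.random((500, 32))              # hypothetical joint-motion-histogram features
igmm = BayesianGaussianMixture(
    n_components=30,                               # truncation level, not the final cluster count
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
    random_state=0,
)
segment_labels = igmm.fit_predict(segment_feats)   # pseudo-labels used to train the ConvNets
n_symbols = int(segment_labels.max()) + 1

# --- Stage 2: score label sequences with per-class discrete HMMs, classify with an SVM. ---
def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete observation sequence."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik

n_states = 4
def random_hmm(rng):
    # Untrained HMM parameters for illustration only (valid stochastic matrices).
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return pi, A, B

hmms = [random_hmm(rng) for _ in range(2)]         # one HMM per action class (2 classes here)

# Hypothetical training videos encoded as segment-label sequences with class labels.
seqs = [rng.integers(0, n_symbols, size=20) for _ in range(10)]
y = np.array([0] * 5 + [1] * 5)

# Each sequence is represented by its log-likelihood under every class HMM;
# the SVM makes the final decision on these scores (the HMM-SVM hybrid).
X = np.array([[forward_loglik(s, *h) for h in hmms] for s in seqs])
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:2]))
```

Feeding HMM log-likelihoods into an SVM rather than taking a plain maximum-likelihood decision lets the classifier learn a discriminative boundary over the per-class scores, which is one common way such hybrid HMM-SVM classifiers are built.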