A Bayesian nonparametric multimodal data modeling framework for video emotion recognition

2017 
Video emotion recognition is an emerging research field that has attracted increasing attention in recent years. The task is challenging: human emotions are hard to differentiate precisely due to their complexity and diversity, and expressions of sentiment in a content-rich video are sparse. Previous studies have proposed a number of approaches to learn human emotions at the video level by exploiting various video features. However, most of these works used only simple low-level video features, such as hand-crafted image features, and did not consider the latent connections among the different multimodal data within a video. To tackle these problems, we develop a novel Bayesian non-parametric multimodal data modeling framework to learn emotions from video, where the adopted image data are deep features extracted from key frames of the video via convolutional neural networks (CNNs), and the adopted audio data are Mel-frequency cepstral coefficient (MFCC) features. In this framework, we then use a symmetric correspondence hierarchical Dirichlet processes (Sym-cHDP) model to mine the latent emotional events (topics) shared between image features and audio features. Finally, the effectiveness of our framework is demonstrated through comprehensive experiments.
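The abstract describes a two-stream feature extraction step (CNN deep features from key frames, MFCC features from audio) feeding a Sym-cHDP topic model. Below is a minimal sketch of that extraction step only, under assumptions not stated in the abstract: a pretrained ResNet-50 from torchvision stands in for the unspecified CNN backbone, librosa is used for MFCCs, and key frames are assumed to be supplied as PIL images; the Sym-cHDP inference itself is not shown.

```python
# Illustrative sketch of the multimodal feature extraction (not the authors' code).
# Assumptions: torchvision ResNet-50 as the CNN backbone, librosa for MFCCs.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
import librosa

# CNN backbone: drop the classifier and use the pooled 2048-d activations
# as the "deep feature" for each key frame.
cnn = models.resnet50(pretrained=True)
cnn.fc = torch.nn.Identity()
cnn.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_features(key_frames):
    """Deep CNN features for a list of key-frame PIL images -> (num_frames, 2048)."""
    with torch.no_grad():
        batch = torch.stack([preprocess(f) for f in key_frames])
        return cnn(batch).numpy()

def audio_features(wav_path, n_mfcc=13):
    """MFCC features from the video's audio track -> (num_audio_frames, n_mfcc)."""
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T
```

In the framework described above, the two resulting feature sets for each video would then be quantized or otherwise tokenized and passed as paired modalities to the Sym-cHDP model for joint topic inference.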