Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features

Nasir, Arindam Jati, Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis G. Georgiou

2016

Automatic classification of depression from audiovisual cues can support its objective diagnosis. In this paper, we present a multimodal depression classification system developed as part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We investigate a range of audio and video features for classification, combined with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform the standard baseline features, while the best audio accuracy is achieved with i-vector modelling based on MFCC features. In the video modality, polynomial parameterization of facial landmark features achieves the best performance among all systems and also outperforms the best baseline system.
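
The abstract names Teager energy cepstral coefficients without describing how they are computed. As a rough illustration only, the sketch below shows the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], which such features are typically built on. The function name, the test tone, and its parameters are assumptions made for this example and are not taken from the paper; the filterbank, log compression, and cepstral transform of a full TECC front end are not shown.

```python
import math


def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


# Hypothetical test tone (amplitude and frequency chosen for illustration).
amplitude, omega = 0.5, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(200)]

energy = teager_energy(tone)

# For a pure cosine the operator is constant: amplitude^2 * sin(omega)^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the operator output depends on both amplitude and frequency, which is one reason it is used as an alternative to plain squared energy in speech features.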

Keywords:

Artificial intelligence
Computer vision
Speech recognition
Fusion
Cepstrum
Mel-frequency cepstrum
Computer science
i-vector
Polynomial
baseline system
Landmark
