Image-Based Emotional State Prediction from Multiparty Audio Conversation

2020 
Recognizing human emotion is a complex task that has been researched for the past couple of decades. The problem continues to gain attention because of its importance in various domains, particularly human-computer and human-robot interaction. Researchers estimate that humans infer another person's state of mind by observing various cues, roughly 70% of which are non-verbal. Humans embed emotion in their speech, pose, gestures, context, facial expressions, and even the history of a conversation or situation. Each of these sub-problems can be addressed effectively with learning-based techniques. Predicting emotion in multiparty audio conversation adds further complexity, since a model must account for the intent of speech, culture, accent, gender, and many other sources of diversity. Researchers have made various attempts to classify human audio into the required classes using Support Vector Machine, Long Short-Term Memory (LSTM), and bi-LSTM models on audio input. We propose an image-based emotion classification approach for audio conversation: the spectrogram of an audio signal, plotted as an image, is used as input to a Convolutional Neural Network (CNN) that learns the patterns needed for classification. The proposed approach achieves an accuracy of around 86% on the test dataset, a considerable improvement over state-of-the-art models.
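
The sketch below illustrates the kind of pipeline the abstract describes: an audio utterance is converted to a spectrogram image and classified by a small CNN. It is a minimal illustration, not the authors' implementation; the use of log-mel spectrograms, the 16 kHz sampling rate, the fixed 128x128 input size, the seven-class label set, and the layer configuration are all assumptions made here for the example.

import numpy as np
import librosa
import tensorflow as tf

NUM_CLASSES = 7          # assumed emotion label set (e.g. anger, joy, neutral, ...)
SAMPLE_RATE = 16_000     # assumed sampling rate
N_MELS, N_FRAMES = 128, 128

def audio_to_spectrogram_image(path: str) -> np.ndarray:
    """Load an utterance and return a fixed-size log-mel spectrogram
    scaled to [0, 1], shaped (N_MELS, N_FRAMES, 1) like a grayscale image."""
    signal, _ = librosa.load(path, sr=SAMPLE_RATE)
    mel = librosa.feature.melspectrogram(y=signal, sr=SAMPLE_RATE, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or truncate along the time axis so every example has the same width.
    log_mel = librosa.util.fix_length(log_mel, size=N_FRAMES, axis=1)
    # Min-max normalize so the spectrogram behaves like image pixel intensities.
    img = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    return img[..., np.newaxis]

def build_cnn() -> tf.keras.Model:
    """A small convolutional classifier over the spectrogram image."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(N_MELS, N_FRAMES, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

In practice, each utterance in a multiparty conversation would be segmented per speaker turn, converted with audio_to_spectrogram_image, and the resulting images batched for model.fit against integer emotion labels.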