Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model

2017 
Speech Emotion Recognition (SER) plays an important role in human-computer interaction and assistive technologies. In this paper, a new method is proposed that uses distributed Convolutional Neural Networks (CNN) to automatically learn affect-salient features from raw spectral information, and then applies a Bidirectional Recurrent Neural Network (BRNN) to capture temporal information from the CNN output. Finally, an attention mechanism is applied to the BRNN's output sequence to focus on the emotion-pertinent parts of an utterance. This attention mechanism not only improves classification accuracy but also improves the model's interpretability. Experimental results show that this approach achieves 64.08% weighted accuracy and 56.41% unweighted accuracy for four-emotion classification on the IEMOCAP dataset, which outperforms previous results reported for this dataset.
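The attention step described above pools the BRNN's output sequence into a single utterance-level vector, weighting emotion-pertinent frames more heavily. A minimal sketch of such attention pooling is shown below; the dot-product scoring against a learnable vector `w` is an illustrative assumption, not necessarily the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(H, w):
    """Attention pooling over a sequence of BRNN hidden states.

    H : (T, D) array, one row per time frame (BRNN output).
    w : (D,) learnable attention vector (hypothetical parameter).
    Returns the (D,) context vector and the (T,) attention weights.
    """
    scores = H @ w            # (T,) unnormalized relevance score per frame
    alpha = softmax(scores)   # (T,) weights that sum to 1
    context = alpha @ H       # (D,) weighted sum of hidden states
    return context, alpha

# Toy example: an utterance of 5 frames with 4-dim hidden states.
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 4))
w = rng.standard_normal(4)
context, alpha = attention_pool(H, w)
```

The weights `alpha` are what give the model its interpretability: inspecting them shows which frames of the utterance the classifier attended to.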