Speech Emotion Recognition Using Convolutional Neural Networks

2021 
Speech is one of the most natural and convenient ways by which humans communicate, and understanding speech is one of the most intricate processes the human brain performs. Speech Emotion Recognition (SER) aims to recognize human emotion from speech, building on the fact that the voice often reflects underlying emotions through tone and pitch. The libraries used are Librosa for analyzing audio and music, SoundFile for reading and writing sampled audio file formats, and sklearn for building the model. In the current study, the efficacy of a Convolutional Neural Network (CNN) for recognizing speech emotions is investigated. Spectrograms of the speech signals are used as the input features of the network, and Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract features from the audio. Our own speech dataset is used to train and evaluate the models. Based on the evaluation, the emotion of the speech (happy, sad, angry, neutral, surprised, or disgust) is detected.
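A minimal sketch of the MFCC feature-extraction step described above, using the SoundFile and Librosa libraries mentioned in the abstract; the 40-coefficient setting, the mean pooling over time, and the function name are illustrative assumptions, not details reported by the authors.

```python
import librosa
import soundfile as sf


def extract_mfcc(path, n_mfcc=40):
    """Load one speech clip and return a fixed-length MFCC feature vector.

    The number of coefficients and the mean pooling are assumptions for
    illustration; the paper does not specify these values.
    """
    # SoundFile reads the sampled audio and its native sample rate.
    audio, sr = sf.read(path, dtype="float32")
    if audio.ndim > 1:
        # Down-mix stereo recordings to mono before feature extraction.
        audio = audio.mean(axis=1)
    # Librosa computes the Mel-Frequency Cepstral Coefficients
    # (shape: n_mfcc x frames).
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    # Average over time frames to get one vector per clip, suitable as
    # input to a classifier built with sklearn or a CNN.
    return mfcc.mean(axis=1)
```

Such per-clip vectors (or the full spectrogram/MFCC matrices, for the CNN case) would then be paired with the emotion labels from the dataset for training and evaluation.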