MFCC Based Audio Classification Using Machine Learning

2021 
Humans readily detect emotion by noticing changes in another person's facial expression or tone of voice, but for a machine, understanding and decoding emotion is far more complex. The problem is highly relevant today because emotion recognition can be used, for example, to gather customer feedback on a product or a hotel. The idea behind the proposed solution is to build a machine learning model that detects emotions from a person's speech. The main objective is to recognize emotions in speech and classify them into 8 categories: unbiased, cool, ecstatic, poignant, furious, fearful, shock, and astonished. The proposed approach relies on Mel Frequency Cepstral Coefficients (MFCC) and the energy of the speech signal as the core input features. For this purpose, we used the RAVDESS database of emotional speech. Once feature extraction is performed, the resulting feature vectors are used to train different machine learning classification algorithms: Decision Tree, Random Forest, and Support Vector Machine (SVM). In our study, the Random Forest algorithm achieved the highest accuracy of the three, 88.54%.
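The pipeline described in the abstract (MFCC and energy features fed to Decision Tree, Random Forest, and SVM classifiers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame length, hop size, filter count, the mean/standard-deviation summary per clip, and all classifier settings are illustrative choices, and short synthetic tones stand in for RAVDESS recordings, so the scores it produces say nothing about the reported 88.54%.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mfcc_energy_features(signal, sr=16000, frame_len=400, hop=160,
                         n_fft=512, n_filters=26, n_mfcc=13):
    """Return one fixed-length vector per clip: mean and std of MFCCs + log energy."""
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT-II gives the MFCCs.
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n + 1) / (2 * n_filters))
    mfcc = log_mel @ dct.T                                  # (n_frames, n_mfcc)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    per_frame = np.column_stack([mfcc, log_energy])
    # Assumption: summarise frames by mean and std to get one vector per clip.
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])


def synthetic_clip(freq, rng, sr=16000, seconds=0.5):
    """Hypothetical stand-in for a speech recording: a noisy sine tone."""
    t = np.arange(int(sr * seconds)) / sr
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(t.size)


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)  # two stand-in classes instead of 8 emotions
freqs = np.where(labels == 0, 300.0, 1200.0) + rng.uniform(-50, 50, labels.size)
X = np.array([mfcc_energy_features(synthetic_clip(f, rng)) for f in freqs])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(kernel="rbf"),
}
# Fit each classifier on the same features and record held-out accuracy.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
print(scores)
```

To try this on real recordings, `synthetic_clip` would be replaced by code that loads each audio file as a mono floating-point array, with labels taken from the dataset's annotations; an audio library's MFCC routine could also replace the hand-written extraction.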