Multi-dimensional Speaker Information Recognition with Multi-task Neural Network

2018 
This paper proposes a novel approach that simultaneously estimates a speaker's identity and two other speaker traits, namely emotion and gender. We term this simultaneous process multi-dimensional speaker information recognition. We use i-vectors to represent utterances of varying duration as fixed-length vectors. We first build an individual neural network (NN) for each recognition task. In each single-task network, the input is the i-vector of an utterance from one speaker and the output is the corresponding one-hot label vector. We then design a multi-task learning (MTL) network that differs from the single-task networks only in its output: for MTL, the output consists of three parallel one-hot vectors, denoting the identity, emotion and gender information, respectively. The proposed approach is evaluated on the KSU-Emotions corpus, which was recorded mainly for emotion recognition. Experimental results show that the best multi-task configuration achieves a recognition accuracy 4.6% higher than that of the single-task approach.
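The abstract describes the architecture only at a high level. The following minimal sketch (pure Python, standard library only) illustrates the multi-task idea: an i-vector passes through a shared hidden layer that feeds three parallel softmax outputs for identity, emotion and gender, each compared against its own one-hot target. The single shared tanh hidden layer, the layer sizes, the class counts and the equal-weight summed cross-entropy loss are all assumptions made for illustration; they are not the paper's reported configuration, and `MultiTaskNet` and the other names are hypothetical.

```python
import math
import random


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Affine layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Hypothetical MTL network: one shared hidden layer, three softmax heads."""

    def __init__(self, ivec_dim, hidden_dim, n_speakers, n_emotions, n_genders=2, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(ivec_dim, hidden_dim, rng)
        self.heads = {
            "speaker": init_layer(hidden_dim, n_speakers, rng),
            "emotion": init_layer(hidden_dim, n_emotions, rng),
            "gender": init_layer(hidden_dim, n_genders, rng),
        }

    def forward(self, ivector):
        # Shared representation used by all three tasks (assumed tanh activation).
        h = [math.tanh(z) for z in dense(ivector, *self.shared)]
        # Three parallel outputs, one probability vector per task.
        return {task: softmax(dense(h, *layer)) for task, layer in self.heads.items()}


def cross_entropy(probs, target_one_hot):
    return -sum(t * math.log(p) for p, t in zip(probs, target_one_hot) if t > 0)


def multitask_loss(outputs, targets):
    # Equal-weight sum of the per-task losses (an assumption, not from the paper).
    return sum(cross_entropy(outputs[task], targets[task]) for task in outputs)


# Usage with toy sizes (hypothetical: 8-dim i-vector, 5 speakers, 4 emotions).
net = MultiTaskNet(ivec_dim=8, hidden_dim=16, n_speakers=5, n_emotions=4)
rng = random.Random(1)
ivector = [rng.gauss(0.0, 1.0) for _ in range(8)]
targets = {"speaker": one_hot(2, 5), "emotion": one_hot(0, 4), "gender": one_hot(1, 2)}
outputs = net.forward(ivector)
loss = multitask_loss(outputs, targets)
predictions = {task: max(range(len(p)), key=p.__getitem__) for task, p in outputs.items()}
```

In this sketch, keeping only one entry in `heads` yields a single-task network, which mirrors the abstract's statement that the single-task and MTL networks differ only in their output layer. Training (for example, gradient descent on `multitask_loss`) is omitted for brevity.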