Generative Approach Using Soft-Labels to Learn Uncertainty in Predicting Emotional Attributes

2021 
This paper presents a novel speech emotion recognition (SER) method that captures the uncertainty in predicting emotional attributes by using the true distribution of scores provided by annotators as ground truth (i.e., soft-labels). Reliable, generalizable, and scalable SER systems are important in areas such as healthcare, customer service, security, and defense. A barrier to building these systems is the lack of quality labels caused by the expensive annotation process, which leads to poor generalization. To address this limitation, this study proposes a semi-supervised generative modeling approach using a variational autoencoder (VAE) with an emotional regressor at the bottleneck, trained with soft-labels of emotional attributes. We demonstrate that estimating uncertainties in predicting emotional attribute scores is possible with soft-labels. We analyze the benefits of uncertainty estimation with a reject option formulation, where the model can abstain from predicting emotion when it is less confident. At 60% test coverage, we achieve relative improvements in concordance correlation coefficient (CCC) of up to 16.85% for valence, 7.12% for arousal, and 8.01% for dominance. Furthermore, we propose an uncertainty transfer learning strategy in which uncertainties learned from one attribute are used as a sample re-ordering criterion for another attribute, achieving additional improvements in prediction performance for valence. We also demonstrate the generalization power of our method in comparison to other uncertainty-estimation methods using cross-corpus evaluations. Finally, we demonstrate that our method has lower computational complexity than alternative approaches.
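The reject option evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ccc` and `ccc_at_coverage`, the use of a single scalar uncertainty per sample, and the rule of keeping the lowest-uncertainty fraction of the test set are all assumptions made for illustration. The CCC formula itself is the standard definition.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (standard definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variances
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def ccc_at_coverage(y_true, y_pred, uncertainty, coverage=0.6):
    """CCC on the most confident fraction of samples (hypothetical helper).

    Assumption: samples are ranked by a scalar uncertainty score and the
    model abstains on the least confident (1 - coverage) fraction.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    n_keep = max(2, int(round(coverage * len(y_true))))
    keep = np.argsort(uncertainty, kind="stable")[:n_keep]  # lowest first
    return ccc(y_true[keep], y_pred[keep])


# Toy data, invented for illustration only.
labels = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = [1.0, 2.0, 3.0, 0.0, 9.0]
uncert = [0.1, 0.2, 0.3, 0.9, 0.8]

full = ccc_at_coverage(labels, preds, uncert, coverage=1.0)
partial = ccc_at_coverage(labels, preds, uncert, coverage=0.6)
```

Under the same assumptions, the uncertainty transfer idea would amount to passing the uncertainty scores estimated for one attribute (for example, arousal) as the `uncertainty` argument when scoring another attribute (for example, valence), so that the ranking used for rejection comes from a different attribute than the one being evaluated.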