Privacy and Utility Preserving Data Transformation for Speech Emotion Recognition

2021 
Speech carries rich information not only about an individual's intent but also about demographic traits and physical and psychological state, among other things. Notably, continuously worn wearable sensors enable researchers to collect egocentric speech data to study and assess real-life expressed emotions, offering unprecedented opportunities for applications in assistive agents, medical diagnosis, and personalized education. Many existing systems collect these speech data, either processed or unprocessed, and transmit them from users' devices to a central server for post-hoc analysis. However, egocentric audio sensing for speech emotion recognition raises privacy concerns and risks, since unintended or improper inferences of sensitive and demographic information may occur without user consent. Toward addressing these concerns, in this work we propose a privacy-preserving data transformation technique that mitigates potential threats associated with sensitive-information and demographic inferences. The proposed mechanism combines an autoencoder architecture, called the replacement autoencoder, with a gradient reversal layer to remove sensitive information inside the data, such as sensitive labels and demographics. We empirically validate our approach for predicting emotions on three commonly used speech emotion recognition datasets. We show that our method can effectively prevent inferences of sensitive emotions and demographic information, and that the improved privacy comes at the cost of only a minor utility loss for the target application.
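The gradient reversal layer mentioned above can be summarized briefly: it acts as the identity in the forward pass, but negates (and optionally scales) gradients in the backward pass, so the encoder learns features that an adversarial head (e.g., a demographic classifier) cannot exploit. The following is a minimal, hypothetical NumPy sketch of that idea; the class name, `lam` parameter, and API are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class GradientReversalLayer:
    """Identity in the forward pass; multiplies gradients by -lam in backward.

    Illustrative sketch only: in practice this would be implemented inside
    an autodiff framework (e.g., as a custom autograd function).
    """
    def __init__(self, lam=1.0):
        self.lam = lam  # scaling factor for the reversed gradient

    def forward(self, x):
        # Forward pass leaves the features unchanged.
        return x

    def backward(self, grad_output):
        # Backward pass flips the gradient sign, so the upstream encoder
        # is pushed to *maximize* the adversary's loss (e.g., a demographic
        # classifier's loss) while the adversary itself minimizes it.
        return -self.lam * grad_output

# Usage sketch: identity forward, sign-flipped and scaled gradient backward.
grl = GradientReversalLayer(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
print(grl.forward(x))    # unchanged features
print(grl.backward(x))   # -0.5 * incoming gradient
```

Inserting such a layer between the encoder and the adversarial head lets a single backpropagation step train both objectives at once, which is why it pairs naturally with the replacement autoencoder described above.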