Towards Privacy-Preserving Speech Data Publishing

Jianwei Qian, Illinois Institute of Technology, USA
Feng Han, University of Science and Technology of China, P.R. China
Jiahui Hou, Illinois Institute of Technology, USA
Chunhong Zhang, Beijing University of Posts & Telecommunication, P.R. China
Yu Wang, University of North Carolina at Charlotte, USA
Xiangyang Li, University of Science and Technology of China, P.R. China


Privacy-preserving data publishing has been a hot research topic over the last decade. Numerous ingenious attacks on users' privacy, along with corresponding defensive measures, have been proposed for the sharing of various kinds of data, ranging from relational data, social network data, and spatiotemporal data to images and videos. Speech data publishing, however, is still untouched in the literature. To fill this gap, we study the privacy risk in speech data publishing and explore the possibility of sanitizing the data to achieve privacy protection while preserving data utility. We formulate this optimization problem in a general fashion and present thorough quantifications of privacy and utility. We analyze the complex impacts of possible sanitization methods on privacy and utility, and also design a novel method, key term perturbation, for speech content sanitization. A heuristic algorithm is proposed to personalize the sanitization for speakers to restrict their privacy leak (p-leak limit) while minimizing the utility loss. Simulations of linkage attacks and sanitization on real datasets validate the necessity and feasibility of this work.
