| Author | Affiliation |
|---|---|
| Jianwei Qian | Illinois Institute of Technology, USA |
| Feng Han | University of Science and Technology of China, P.R. China |
| Jiahui Hou | Illinois Institute of Technology, USA |
| Chunhong Zhang | Beijing University of Posts and Telecommunications, P.R. China |
| Yu Wang | University of North Carolina at Charlotte, USA |
| Xiangyang Li | University of Science and Technology of China, P.R. China |
Privacy-preserving data publishing has been a hot research topic over the last decade. Numerous ingenious attacks on users' privacy and corresponding defensive measures have been proposed for the sharing of various types of data, ranging from relational data, social network data, and spatiotemporal data to images and videos. Speech data publishing, however, remains untouched in the literature. To fill this gap, we study the privacy risks in speech data publishing and explore the possibility of sanitizing speech data to achieve privacy protection while preserving data utility. We formulate this optimization problem in a general fashion and present thorough quantifications of privacy and utility. We analyze the nuanced impacts of possible sanitization methods on privacy and utility, and design a novel method, key term perturbation, for speech content sanitization. We propose a heuristic algorithm that personalizes sanitization for each speaker to limit their privacy leakage (the p-leak limit) while minimizing utility loss. Simulations of linkage attacks and sanitization on real datasets validate the necessity and feasibility of this work.