|Shaowei Wang||University of Science and Technology of China, P.R. China|
|Liusheng Huang||University of Science and Technology of China, P.R. China|
|Yiwen Nie||University of Science and Technology of China, P.R. China|
|Pengzhan Wang||University of Science and Technology of China, P.R. China|
|Hongli Xu||University of Science and Technology of China, P.R. China|
|Wei Yang||University of Science and Technology of China, P.R. China|
Set-valued data is useful for representing a rich family of information in numerous areas, such as market basket data of online shopping, apps on mobile phones and web browsing history. By analyzing set-valued data that are collected from users, service providers could learn the demographics of the users, the patterns of their usages, and finally, improve the quality of services for them. However, privacy has been an increasing concern in collecting and analyzing users' set-valued data, since these data may reveal sensitive information (e.g., identities, preferences and diseases) about individuals. In this work, we propose a privacy preserving aggregation mechanism for set-valued data: PrivSet. It provides rigorous data privacy protection locally (e.g., on mobile phones or wearable devices) and efficiently (its computational overhead is linear to the item domain size) for each user, and meanwhile allowing effective statistical analyses (e.g., distribution estimation of items, distribution estimation of set cardinality) on set-valued data for service providers. More specifically, in PrivSet, within the constraints of local ϵ-differential privacy, each user independently responses with a subset of the set-valued data domain with calibrated probabilities, hence the true positive/false positive rate of each item is balanced and the performance of distribution estimation is optimized. Besides presenting theoretical error bounds of PrivSet and proving its optimality over existing approaches, we experimentally validate the mechanism, the experimental results illustrate that the estimation error in PrivSet has been reduced by half when compared to state-of-the-art approaches.