Privacy Risk Assessment for Text Data Based on Semantic Correlation Learning

Ping Xiong, Lin Liang, Yunli Zhu, Tianqing Zhu

2021

Privacy risk assessment determines the extent to which generalization and obfuscation should be applied to sensitive data. In this paper, we propose PriTxt, a method for evaluating the privacy risk of text data by exploiting semantic correlation. Using definitions derived from the General Data Protection Regulation (GDPR), PriTxt first defines the private features that relate to individual privacy. A word-embedding model is then constructed with the word2vec algorithm to identify quasi-sensitive words. Finally, the privacy risk of a given text is evaluated by aggregating the weighted risks of the sensitive and quasi-sensitive words it contains.
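
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the aggregation formula, the weights, or how quasi-sensitive words are selected, so everything below is an assumption made for illustration. The toy vectors in EMBEDDINGS stand in for a trained word2vec model, and the SENSITIVE weights, the similarity threshold, and the function names cosine, word_risk, and text_risk are hypothetical. A word is treated as quasi-sensitive when its cosine similarity to some sensitive word reaches the threshold, and the text risk is taken to be a plain sum of per-word risks.

```python
import math

# Toy 3-d vectors standing in for a trained word2vec model (hypothetical values).
EMBEDDINGS = {
    "diagnosis": (1.0, 0.0, 0.0),
    "salary": (0.0, 1.0, 0.0),
    "clinic": (0.8, 0.6, 0.0),
    "weather": (0.0, 0.0, 1.0),
}

# Hypothetical sensitive words and risk weights; the paper derives its
# private features from GDPR definitions, which are not reproduced here.
SENSITIVE = {"diagnosis": 1.0, "salary": 0.8}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_risk(word, threshold=0.5):
    """Risk of one word: its own weight if sensitive; otherwise the largest
    similarity-scaled weight among sensitive words it is close to
    (assumed rule), or 0.0 if it is close to none."""
    if word in SENSITIVE:
        return SENSITIVE[word]
    vec = EMBEDDINGS.get(word)
    if vec is None:
        return 0.0  # out-of-vocabulary words contribute no risk here
    best = 0.0
    for sensitive_word, weight in SENSITIVE.items():
        sim = cosine(vec, EMBEDDINGS[sensitive_word])
        if sim >= threshold:  # the word counts as quasi-sensitive
            best = max(best, weight * sim)
    return best


def text_risk(text, threshold=0.5):
    """Aggregate risk of a text as the sum of its word risks (assumed).
    Tokenization is a plain whitespace split with no punctuation handling."""
    return sum(word_risk(tok, threshold) for tok in text.lower().split())


if __name__ == "__main__":
    print(text_risk("diagnosis from the clinic"))
```

In this sketch "clinic" is not in the sensitive list but lies close to "diagnosis" in the toy embedding space, so it contributes a reduced risk, while "weather" contributes none. A real system would load trained embeddings and would likely normalize the score by text length.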
