Privacy Risk Assessment for Text Data Based on Semantic Correlation Learning

2021 
Privacy risk assessment determines the extent to which generalization and obfuscation should be applied to sensitive data. In this paper, we propose PriTxt, a method for evaluating the privacy risk of text data by exploiting semantic correlation. Using definitions derived from the General Data Protection Regulation (GDPR), PriTxt first defines the private features that relate to individual privacy. A word-embedding model is then built with the word2vec algorithm to identify quasi-sensitive words. Finally, the privacy risk of a given text is evaluated by aggregating the weighted risks of the sensitive and quasi-sensitive words it contains.
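The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the PriTxt implementation: the toy vectors stand in for a trained word2vec model, and the seed words, their weights, the similarity threshold of 0.8, and the length-normalized sum used for aggregation are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical toy embeddings standing in for a trained word2vec model.
EMBEDDINGS = {
    "diagnosis": (0.9, 0.1, 0.0),
    "hospital": (0.8, 0.3, 0.1),
    "salary": (0.1, 0.9, 0.1),
    "bonus": (0.2, 0.8, 0.2),
    "weather": (0.0, 0.1, 0.9),
}

# Hypothetical sensitive seed words with assumed risk weights.
SENSITIVE = {"diagnosis": 1.0, "salary": 0.8}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_risk(word, threshold=0.8):
    """Risk of one word: its own weight if sensitive, otherwise the
    weight of the closest sensitive word scaled by similarity
    (quasi-sensitive), or 0.0 if nothing is similar enough."""
    if word in SENSITIVE:
        return SENSITIVE[word]
    vec = EMBEDDINGS.get(word)
    if vec is None:
        return 0.0  # out-of-vocabulary words carry no risk in this sketch
    best = 0.0
    for seed, weight in SENSITIVE.items():
        sim = cosine(vec, EMBEDDINGS[seed])
        if sim >= threshold:
            best = max(best, weight * sim)
    return best


def text_risk(text):
    """Aggregate word risks into a text score (assumed: mean over words)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word_risk(w) for w in words) / len(words)


if __name__ == "__main__":
    for sample in ("hospital diagnosis", "bonus weather", "weather"):
        print(sample, round(text_risk(sample), 3))
```

In this sketch, "hospital" and "bonus" are never listed as sensitive, yet they receive a nonzero risk because their vectors lie close to "diagnosis" and "salary"; this is the role the abstract assigns to quasi-sensitive words.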