Achieving Optimal K-Anonymity Parameters for Big Data

2018 
Datasets containing private and sensitive information are useful for data analytics, so data owners release them cautiously using privacy-preserving publishing techniques. The possibility of personal re-identification is much greater than ever before; for instance, social media has dramatically increased exposure to privacy violations. One well-known technique, k-anonymity, protects against such exposure by ensuring that each record is indistinguishable from at least k-1 other records with respect to a chosen set of attributes, known as quasi-identifiers. This approach may reduce the risk of personal re-identification, but it may also lessen the usefulness of the released information. The value of k should therefore be chosen carefully to balance security against information gain. Unfortunately, there is no standard procedure for defining the value of k, and the problem of optimal k-anonymization is NP-hard. In this paper, we propose a greedy heuristic approach that provides an optimal value for k. The approach evaluates the empirical risk with respect to our Sensitivity-Based Anonymization method. It is derived from fine-grained access and business role anonymization for big data, which forms our framework.
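To make the k-anonymity property concrete, the following minimal Python sketch checks which value of k a table satisfies for a given set of quasi-identifiers. It illustrates only the standard definition and is not the greedy heuristic or the Sensitivity-Based Anonymization method proposed in the paper. The function names, the toy records, and the choice of `age` and `zip` as quasi-identifiers are assumptions made for the example.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def achieved_k(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (0 for an empty table)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """A table is k-anonymous if every equivalence class holds at least k records."""
    return achieved_k(records, quasi_identifiers) >= k


# Hypothetical toy table whose quasi-identifiers are already generalized.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
]
quasi_identifiers = ["age", "zip"]  # assumed for illustration

print(achieved_k(records, quasi_identifiers))
print(is_k_anonymous(records, quasi_identifiers, 3))
```

In this toy table the smallest group of records sharing the same quasi-identifier values has two members, so the table is 2-anonymous but not 3-anonymous. Raising k would require coarser generalization, which is the trade-off between re-identification risk and information gain described above.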