An Enhanced Hybrid Clustering Approach for Privacy Preservation (ECPS) in Big Data using Apache Spark Framework

2019 
In big data mining, clustering is the organization of objects into groups according to a chosen similarity metric. Among clustering algorithms, k-means is considered the most popular so far. With the advancement of parallel processing and distributed computation, however, traditional techniques are no longer as efficient at computing centroids during clustering. This inefficiency in turn affects privacy preservation when individuals' sensitive information is processed. Hence, this article proposes an enhanced hybrid clustering approach for privacy preservation (ECPS) that addresses both the big data setting and the protection of individual privacy against background knowledge attacks. The first component combines the ant colony optimization and firefly techniques to choose better centroid positions for the data. The second component applies a differential privacy algorithm that uses the Laplace mechanism to add noise to individuals' data, making privacy preservation more robust. With evolving trends and technologies, the amount of data being generated is increasing at an overwhelming rate, so the proposed approaches are designed to adapt to the changing needs of big data. The proposed algorithms are efficient when compared with existing clustering algorithms and provide better performance while guaranteeing privacy. The proposed work is implemented on the Apache Spark big data framework.
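The abstract does not specify how the Laplace mechanism is applied, so the following is only a minimal textbook sketch of the mechanism itself, not the authors' ECPS implementation. It releases a noisy sum of bounded values, as might be needed when computing a cluster centroid privately. The function names `laplace_noise` and `private_sum`, the clipping bounds, and the choice of a sum query are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-differentially private sum (illustrative only).

    Values are clipped to [lower, upper] so that adding or removing one
    record changes the sum by at most max(|lower|, |upper|).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = max(abs(lower), abs(upper))
    if sensitivity == 0:
        # Every clipped value is zero, so there is nothing to protect.
        return float(sum(clipped))
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23.0, 35.0, 41.0, 29.0]
    print(private_sum(ages, lower=0.0, upper=100.0, epsilon=0.5))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy; the clipping bounds must be chosen without looking at the private data.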