Scalable Distributed Data Anonymization

Sabrina De Capitani di Vimercati, Dario Facchinetti, Sara Foresti, Gianluca Oldani, Stefano Paraboschi, Matthew Rossi, Pierangela Samarati

2021

We present an approach for enabling a distributed anonymization process over large collections of sensor data. Our approach anonymizes large datasets, which might not fit in main memory, using an arbitrary number of workers within the Spark framework. We describe how to parallelize the anonymization process through a proper partitioning of the dataset. Our experimental evaluation shows that the proposed approach is scalable and does not affect the quality of the anonymized dataset.
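The abstract does not state which partitioning criterion or anonymization model is used, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It assumes that records carry only numeric quasi-identifiers, that the target property is k-anonymity, and that each fragment is anonymized with a Mondrian-style median split. A thread pool stands in for Spark workers. All function names are hypothetical.

```python
# Hypothetical sketch, NOT the paper's algorithm: partition-then-anonymize,
# with a thread pool standing in for Spark workers.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[int, ...]              # numeric quasi-identifier values only
Ranges = Tuple[Tuple[int, int], ...]  # one (min, max) interval per attribute


def widest_dim(records: List[Record]) -> int:
    """Index of the attribute with the largest value range in `records`."""
    spans = [max(r[d] for r in records) - min(r[d] for r in records)
             for d in range(len(records[0]))]
    return spans.index(max(spans))


def median_split(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into two halves on the median of the widest attribute."""
    dim = widest_dim(records)
    ordered = sorted(records, key=lambda r: r[dim])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def partition_for_workers(records: List[Record], max_size: int) -> List[List[Record]]:
    """Recursively split the dataset until every fragment fits on one worker."""
    if len(records) <= max_size:
        return [records]
    left, right = median_split(records)
    return partition_for_workers(left, max_size) + partition_for_workers(right, max_size)


def mondrian(records: List[Record], k: int) -> List[Tuple[Record, Ranges]]:
    """Mondrian-style k-anonymization of one fragment, run locally by a worker."""
    if len(records) >= 2 * k:
        left, right = median_split(records)
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    # Generalize the group: each attribute becomes its [min, max] interval.
    ranges = tuple((min(r[d] for r in records), max(r[d] for r in records))
                   for d in range(len(records[0])))
    return [(r, ranges) for r in records]


def anonymize(records: List[Record], k: int, max_size: int,
              workers: int = 4) -> List[Tuple[Record, Ranges]]:
    """Partition the dataset, then anonymize each fragment independently."""
    # Median splits keep every fragment at >= k records when max_size >= 2k.
    assert len(records) >= k and max_size >= 2 * k
    fragments = partition_for_workers(records, max_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda f: mondrian(f, k), fragments))
    return [pair for part in results for pair in part]
```

In this sketch the pre-partitioning applies the same median-split rule as the per-fragment recursion. Each fragment is therefore a subtree of a single-machine run, and the output matches `mondrian(records, k)` computed on one machine. Fragments share no state, so they can be processed by any number of workers.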

Keywords:

Scalable distributed
Computer science
Ubiquitous computing
Apache Spark
Scalability
Data mining
Distributed database
Process (computing)
Visualization
Data anonymization
