Canopy-Based Private Blocking

2018 
Integrating data from different sources often involves using personal information to link records that correspond to the same real-world entities. This raises privacy concerns, which have led to the development of privacy-preserving record linkage (PPRL) techniques that aim to conduct linkage without revealing private or confidential information about the corresponding entities. To make PPRL methods scalable to large datasets, in this paper we propose a novel blocking approach that adapts canopy clustering to a private setting. Our approach uses public reference data as the basis for forming blocks and introduces redundancy into block assignments. We provide an analysis of the approach's privacy and experimentally evaluate its performance in terms of efficiency and effectiveness. The results show that our approach scales with the size of the datasets and achieves better quality than state-of-the-art sorted-neighborhood-based approaches.
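The abstract does not spell out the protocol, so the following is only a rough sketch of the general idea: canopy centers are chosen from public reference values, and each party then assigns its own records to every center within a loose threshold, which yields overlapping (redundant) blocks. The function names, the bigram Jaccard distance, and the `tight` and `loose` thresholds are all illustrative assumptions, not the authors' method.

```python
def bigrams(s):
    """Set of character bigrams; falls back to the string itself if too short."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of the two bigram sets."""
    x, y = bigrams(a), bigrams(b)
    return 1.0 - len(x & y) / len(x | y)


def build_canopies(reference, tight):
    """Pick canopy centers from PUBLIC reference values (hypothetical helper).

    Standard canopy step: take the next remaining value as a center, then
    drop every value within the tight threshold from the candidate pool.
    """
    remaining = list(reference)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [v for v in remaining if jaccard_distance(center, v) > tight]
    return centers


def assign_blocks(values, centers, loose):
    """Assign each private value to EVERY center within the loose threshold.

    A value may land in several blocks; this is the redundancy in block
    assignments. Each party would run this locally on its own data.
    """
    blocks = {c: [] for c in centers}
    for v in values:
        for c in centers:
            if jaccard_distance(c, v) <= loose:
                blocks[c].append(v)
    return blocks


if __name__ == "__main__":
    public_reference = ["smith", "smyth", "jones", "johnson"]
    centers = build_canopies(public_reference, tight=0.5)
    print(assign_blocks(["smithe", "jonas"], centers, loose=0.75))
```

Because the block keys come only from public data, the keys themselves carry no information about either party's records in this sketch; what a real protocol additionally reveals (for example, block sizes) is the kind of question the paper's privacy analysis addresses.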