A Privacy-Preserving Similarity Search Scheme over Encrypted Word Embeddings

2019 
Recent advances in cloud computing platforms have attracted more data than ever before. Today, even the most sensitive data are being outsourced, so protection is essential to ensure that privacy is not traded for the convenience that cloud platforms provide. Traditional symmetric encryption schemes provide good protection; however, they negate the merits of cloud computing, since the server can no longer search or process the data it stores. Attempts have been made to design schemes that achieve both functionality and protection. However, the features offered by existing searchable encryption schemes tend to lag behind the latest findings in information retrieval (IR). In this study, we propose a privacy-preserving similar-document search system based on Simhash. Our scheme can accommodate the latest machine-learning-based IR schemes, and its performance is tuned using a VP-tree-based index that is optimized for security. Analysis and various tests on real-world datasets demonstrate the scheme's security and efficiency.
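As a rough illustration of the Simhash idea the abstract refers to, the sketch below fingerprints a real-valued embedding with random hyperplanes and compares fingerprints by Hamming distance. It is not the paper's construction: there is no encryption or VP-tree here, and the function names (`make_hyperplanes`, `simhash`, `hamming`), the seed, and the 16-bit fingerprint length are assumptions chosen for the example.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    # Hypothetical helper: one random Gaussian hyperplane per fingerprint bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def simhash(vector, hyperplanes):
    # Each bit records which side of a hyperplane the vector falls on.
    fingerprint = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        fingerprint = (fingerprint << 1) | (1 if dot >= 0 else 0)
    return fingerprint

def hamming(a, b):
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")

planes = make_hyperplanes(dim=4, bits=16, seed=1)
doc = [0.5, -1.0, 0.25, 2.0]           # stand-in for a document embedding
same_direction = [2 * x for x in doc]  # same direction, different length
opposite = [-x for x in doc]           # opposite direction

print(hamming(simhash(doc, planes), simhash(same_direction, planes)))
print(hamming(simhash(doc, planes), simhash(opposite, planes)))
```

Vectors pointing in similar directions tend to agree on most bits, so a small Hamming distance between fingerprints suggests similar documents. Hamming distance is a metric, which is the property a VP-tree index relies on.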