LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data

2020 
Uncertain data is widely used in many practical applications, such as data cleaning, location-based services, privacy protection and so on. With the development of technology, the data has a tendency to high-dimensionality. The most common indexes for nearest neighbor search on uncertain data are the R-Tree and the KD-Tree. These indexes will inevitably bring about “curse of dimension”. Focus on this problem, this paper proposes a new hash algorithm, called the LSR-forest, which based on the locality-sensitive hashing and the R-Tree, to solve the high-dimensional uncertain data approximate neighbor search problem. The LSR-forest can hash similar high dimensional uncertain data into a same bucket with a high probability, and then constructs multiple R-Tree-based indexes for hashed buckets. When querying, it is possible to judge neighbors by checking the data in the hypercube which the query point is in. One can also adjust the query range automatically by different parameter k. Many experiments on different data sets are presented in this paper. The results show that LSR-forest has better effectiveness and efficiency than R-Tree on high-dimensional datasets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    0
    Citations
    NaN
    KQI
    []