Accelerated distance computation with encoding tree/forest for high dimensional data

2016 
Vector quantization based methods are important tools for large scale search. They perform lossy compression on the dataset, then the distance between an uncompressed vector and compressed dataset can be efficiently computed via asymmetric distance computation (ADC). We show that for large datasets, ADC conducts excessive computations on vectors sharing the same prefix. To this end, we propose encoding tree (E-Tree) to avoid the excessive computations by storing the encodings on a radix-tree like data-structure. The memory consumption is also lowered. We then propose encoding forest (E-Forest) to further lower the computation cost. E-Tree and E-Forest are compatible with any quantization-based methods. Furthermore, unlike ordinary tree structures, our E-Tree and E-Forest essentially store the encoded vectors linearly in the memory, thus they are easy to implement and compute efficiently. We show by experiments that our methods bring significant speed-up compared to the vanilla ADC method. Therefore, E-Tree and E-Forest offer a practical and useful solution for real-world problems.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    12
    References
    0
    Citations
    NaN
    KQI
    []