A High-Performance Index for Real-Time Matrix Retrieval

2020 
With the embedding techniques, many real-world objects can be represented using matrices. For example, a document can be represented by a matrix, where each row of the matrix represents a word. On the other hand, we have witnessed that many applications continuously generate new data represented by matrices and require real-time query answering on the data. These continuously generated matrices need to be well managed for efficient retrieval. In this paper, we propose an index for real-time matrix retrieval. Besides fast query response, the index also supports real-time insertion by exploiting the LSM-tree. Since the index is built for matrices, it consumes much more memory and requires much more time to search than the traditional index for information retrieval. To tackle the challenges, we power our proposed index with precise and fuzzy inverted lists, and propose a series of novel techniques to improve the memory consumption and the search efficiency of the index. The proposed techniques include vector signature, vector residual sorting, hashing based lookup, and dictionary initialization to guarantee the index quality. Comprehensive experimental results show that our proposed index can support real-time search on matrices and is more efficient than the state-of-the-art method.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []