Learned Cardinality Estimation for Similarity Queries

2021 
In this paper, we study the problem of using deep neural networks (DNNs) for estimating the cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points, and learn to predict the number of data points that are similar to one data point (a similarity search) or a set of data points (a similarity join). However, DNNs are data hungry; directly training a DNN often results in poor performance. We propose two strategies to improve the accuracy and reduce the size of training data: query segmentation and data segmentation. Query segmentation divides a query into query segments, trains a neural network for each query segment, and combines their outputs with subsequent DNNs to get the query embedding. Data segmentation groups similar data into data segments, train a local-model for each data segment, and learn a global-model to decide which local-models should be used for a given query. The estimates from selected local-models will be summed up as the final estimate.We also extend our model to support similarity joins, which trains a DNN to directly estimate the cumulative sum of objects that are similar to a set of queries. Experiments show that our methods can efficiently (i.e., with small training data) learn to estimate the cardinality of similarity searches/joins, and yield effective estimates (i.e., close to real cardinalities).
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    43
    References
    2
    Citations
    NaN
    KQI
    []