Characterizing the optimal pivots for efficient similarity searches in vector space databases with Minkowski distances

Carolina Gómez-Tostón, Manuel Barrena, Álvaro Cortés

2018

Abstract: Pivot-based retrieval algorithms are commonly used to solve similarity queries in a number of application domains, such as multimedia retrieval, biomedical databases, time series and computer vision. The query performance of pivot-based indexing algorithms can be significantly improved by properly choosing a set of pivots that narrows down the database elements to only those relevant to a query. While many other approaches in the literature rely on empirical studies or on intuitive observations and assumptions to achieve effective pivot strategies, this paper addresses the problem with a formal mathematical approach. Our study concludes that the optimal set of pivots in vector databases with L_p metrics is a set of uniformly distributed points on the surface of an n-sphere defined by these metrics. To make the study mathematically tractable, a uniform distribution of data in the database is assumed, which allows us to treat the problem from a purely geometrical point of view. We then present experimental results demonstrating the usefulness of our characterization when applied to real databases in the (R^n, L_p) metric space. Our technique is shown to outperform comparable techniques in the literature. However, we do not propose a new pivot-selection technique; the experiments are designed exclusively to show the usefulness of the characterization.
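As an illustration of the pivot-based filtering that the abstract refers to, the following is a minimal sketch of a range query that uses precomputed pivot distances and the triangle inequality to discard elements without computing their distance to the query. It is not the paper's method: the function names (`minkowski`, `build_pivot_table`, `range_query`), the data, and the pivots are all hypothetical, and the pivots are chosen arbitrarily rather than on the surface of an n-sphere as the characterization suggests.

```python
import random


def minkowski(a, b, p=2.0):
    """L_p (Minkowski) distance between two equal-length vectors, for p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def build_pivot_table(data, pivots, p=2.0):
    """Precompute the distance from every database element to every pivot."""
    return [[minkowski(x, pv, p) for pv in pivots] for x in data]


def range_query(query, radius, data, pivots, table, p=2.0):
    """Return (matches, number of exact distance computations against the data).

    For any pivot pv, the triangle inequality gives
    |d(q, pv) - d(x, pv)| <= d(q, x), so an element whose lower bound
    exceeds the radius cannot be a match and is skipped.
    """
    q_to_pivots = [minkowski(query, pv, p) for pv in pivots]
    matches, computed = [], 0
    for x, row in zip(data, table):
        lower_bound = max(abs(dq - dx) for dq, dx in zip(q_to_pivots, row))
        # Small tolerance (an assumption of this sketch) guards against rounding.
        if lower_bound > radius + 1e-12:
            continue
        computed += 1
        if minkowski(query, x, p) <= radius:
            matches.append(x)
    return matches, computed


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    data = [[random.random() for _ in range(dim)] for _ in range(1000)]
    pivots = [[0.0] * dim, [1.0] * dim]  # arbitrary illustrative pivots
    table = build_pivot_table(data, pivots, p=2.0)
    matches, computed = range_query([0.5] * dim, 0.3, data, pivots, table, p=2.0)
    print(len(matches), "matches using", computed, "exact distance computations")
```

The fewer exact distance computations a pivot set leaves after filtering, the more effective it is, which is the quantity that a good choice of pivots aims to reduce.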
