language-icon Old Web
English
Sign In

Locality-sensitive hashing

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same 'buckets' with high probability. (The number of buckets are much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items. In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same 'buckets' with high probability. (The number of buckets are much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items. Hashing-based approximate nearest neighbor search algorithms generally use one of two main categories of hashing methods: either data-independent methods, such as locality-sensitive hashing (LSH); or data-dependent methods, such as Locality-preserving hashing (LPH). An LSH family F {displaystyle {mathcal {F}}} is defined for a metric space M = ( M , d ) {displaystyle {mathcal {M}}=(M,d)} , a threshold R > 0 {displaystyle R>0} and an approximation factor c > 1 {displaystyle c>1} . This family F {displaystyle {mathcal {F}}} is a family of functions h : M → S {displaystyle h:{mathcal {M}} o S} which map elements from the metric space to a bucket s ∈ S {displaystyle sin S} . The LSH family satisfies the following conditions for any two points p , q ∈ M {displaystyle p,qin {mathcal {M}}} , using a function h ∈ F {displaystyle hin {mathcal {F}}} which is chosen uniformly at random: A family is interesting when P 1 > P 2 {displaystyle P_{1}>P_{2}} . Such a family F {displaystyle {mathcal {F}}} is called ( R , c R , P 1 , P 2 ) {displaystyle (R,cR,P_{1},P_{2})} -sensitive. Alternatively it is defined with respect to a universe of items U that have a similarity function ϕ : U × U → [ 0 , 1 ] {displaystyle phi :U imes U o } . An LSH scheme is a family of hash functions H coupled with a probability distribution D over the functions such that a function h ∈ H {displaystyle hin H} chosen according to D satisfies the property that P r h ∈ H [ h ( a ) = h ( b ) ] = ϕ ( a , b ) {displaystyle Pr_{hin H}=phi (a,b)} for any a , b ∈ U {displaystyle a,bin U} . A locality-preserving hashing is a hash function f that maps a point or points in a multidimensional coordinate space to a scalar value, such that if we have three points A, B and C such that

[ "Hash table", "Hash function", "Extendible hashing", "SUHA", "learning to hash", "Hopscotch hashing", "Pearson hashing" ]
Parent Topic
Child Topic
    No Parent Topic