
Cosine similarity

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any angle in the interval (0, π] radians. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and two vectors diametrically opposed have a similarity of −1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1]. The name derives from the term "direction cosine": in this case, unit vectors are maximally "similar" if they are parallel and maximally "dissimilar" if they are orthogonal (perpendicular). This is analogous to the cosine, which is unity (its maximum value) when the segments subtend a zero angle and zero (uncorrelated) when the segments are perpendicular.

These bounds apply for any number of dimensions, and cosine similarity is most commonly used in high-dimensional positive spaces. For example, in information retrieval and text mining, each term is notionally assigned a different dimension and a document is characterised by a vector in which the value in each dimension corresponds to the number of times the term appears in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter. The technique is also used to measure cohesion within clusters in the field of data mining.

The term cosine distance is often used for the complement in positive space, that is, D_C(A, B) = 1 − S_C(A, B), where D_C is the cosine distance and S_C is the cosine similarity. It is important to note, however, that this is not a proper distance metric: it does not satisfy the triangle inequality (more formally, the Schwarz inequality) and it violates the coincidence axiom. To repair the triangle inequality property while maintaining the same ordering, it is necessary to convert to angular distance (see below).

One advantage of cosine similarity is its low complexity, especially for sparse vectors: only the non-zero dimensions need to be considered.
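As a concrete illustration of the document-vector use case above, the following Python sketch computes cosine similarity, the cosine-distance complement, and one common angular-distance conversion (arccos of the similarity, normalised by π) for sparse term-count vectors. The example documents, function names, and the particular normalisation are illustrative assumptions, not taken from the source.

import math

def cosine_similarity(a, b):
    # Cosine similarity between two sparse vectors given as {term: count} dicts.
    # Only the non-zero dimensions need to be considered, so iterating over the
    # smaller dict is enough for the dot product.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(count * large.get(term, 0) for term, count in small.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    # Complement of the similarity in positive space; not a proper distance metric.
    return 1.0 - cosine_similarity(a, b)

def angular_distance(a, b):
    # Angle between the vectors, normalised by pi so the result lies in [0, 1];
    # unlike cosine distance, this satisfies the triangle inequality.
    s = max(-1.0, min(1.0, cosine_similarity(a, b)))  # clamp against float error
    return math.acos(s) / math.pi

doc1 = {"cosine": 3, "similarity": 2, "vector": 1}
doc2 = {"cosine": 1, "distance": 2, "vector": 2}

print(cosine_similarity(doc1, doc2))  # orientation only: doubling every count in doc1 leaves this unchanged
print(cosine_distance(doc1, doc2))
print(angular_distance(doc1, doc2))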
Other names for cosine similarity include Orchini similarity and the Tucker coefficient of congruence; Ochiai similarity (see below) is cosine similarity applied to binary data.

The cosine of the angle between two non-zero vectors can be derived from the Euclidean dot product formula, A · B = ‖A‖ ‖B‖ cos(θ). Given two vectors of attributes, A and B, the cosine similarity cos(θ) is represented using a dot product and magnitude as

cos(θ) = (A · B) / (‖A‖ ‖B‖) = Σ A_i B_i / (sqrt(Σ A_i²) · sqrt(Σ B_i²)),

where the sums run over the components i = 1, …, n, and A_i and B_i are the components of vectors A and B respectively.
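The Python sketch below is a direct transcription of the component formula above, together with a check that, on binary (0/1) data, it coincides with the Ochiai coefficient |A ∩ B| / sqrt(|A| · |B|). The example attribute vectors and function names are illustrative assumptions.

import math

def cosine_similarity(A, B):
    # cos(theta) = sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)))
    dot = sum(a * b for a, b in zip(A, B))
    return dot / (math.sqrt(sum(a * a for a in A)) * math.sqrt(sum(b * b for b in B)))

def ochiai(present_a, present_b):
    # Ochiai coefficient on the sets of attributes that are present (value 1).
    return len(present_a & present_b) / math.sqrt(len(present_a) * len(present_b))

A = [1, 0, 1, 1, 0]  # hypothetical binary attribute vectors
B = [1, 1, 1, 0, 0]

present_A = {i for i, v in enumerate(A) if v}
present_B = {i for i, v in enumerate(B) if v}

print(cosine_similarity(A, B))       # 0.666..., same value as the Ochiai coefficient below
print(ochiai(present_A, present_B))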

[ "Machine learning", "Data mining", "Artificial intelligence", "Pattern recognition", "Information retrieval" ]