k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.1. k initial 'means' (in this case k=3) are randomly generated within the data domain (shown in color).2. k clusters are created by associating every observation with the nearest mean. The partitions here represent the Voronoi diagram generated by the means.3. The centroid of each of the k clusters becomes the new mean.4. Steps 2 and 3 are repeated until convergence has been reached. k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both k-means and Gaussian mixture modeling. They both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes. The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means due to the name. Applying the 1-nearest neighbor classifier to the cluster centers obtained by k-means classifies new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm. Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS) (i.e. variance). Formally, the objective is to find: where μi is the mean of points in Si. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster: The equivalence can be deduced from identity ∑ x ∈ S i ‖ x − μ i ‖ 2 = ∑ x ≠ y ∈ S i ( x − μ i ) ( μ i − y ) {displaystyle sum _{mathbf {x} in S_{i}}left|mathbf {x} -{oldsymbol {mu }}_{i} ight|^{2}=sum _{mathbf {x} eq mathbf {y} in S_{i}}(mathbf {x} -{oldsymbol {mu }}_{i})({oldsymbol {mu }}_{i}-mathbf {y} )} . Because the total variance is constant, this is equivalent to maximizing the sum of squared deviations between points in different clusters (between-cluster sum of squares, BCSS), which follows from the law of total variance. The term 'k-means' was first used by James MacQueen in 1967, though the idea goes back to Hugo Steinhaus in 1956. The standard algorithm was first proposed by Stuart Lloyd of Bell Labs in 1957 as a technique for pulse-code modulation, though it wasn't published as a journal article until 1982. In 1965, Edward W. Forgy published essentially the same method, which is why it is sometimes referred to as Lloyd-Forgy. The most common algorithm uses an iterative refinement technique. Due to its ubiquity, it is often called the k-means algorithm; it is also referred to as Lloyd's algorithm, particularly in the computer science community. Given an initial set of k means m1(1),…,mk(1) (see below), the algorithm proceeds by alternating between two steps: