A Local Search Algorithm for k-Means with Outliers

2021 
Abstract k-means is a well-studied clustering problem that finds applications in many fields related to unsupervised learning. It is known that k-means clustering is highly sensitive to the isolated points (called outliers). Such outliers can significantly influence the final cluster configuration and should be removed to obtain quality solutions. In this paper, we study the k-means with outliers problem. Given a set of data points, the problem is to remove a set of outliers such that the k-means clustering cost of the remaining points is minimized. Designing efficient algorithms for this problem remains an active area of research due to its important role in dealing with noisy data. We consider a relaxed objective function and propose a local search algorithm for k-means with outliers. It is shown that the algorithm has better performance guarantees than previously implemented methods. In particular, it yields a constant-factor bi-criteria approximation solution to the problem. Moreover, we show experimentally that the algorithm performs much better than its provable guarantee and dominates other state-of-the-art methods for the problem.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    45
    References
    2
    Citations
    NaN
    KQI
    []