No-Substitution $k$-means Clustering with Optimal Center Complexity and Low Memory.

2021 
We consider $k$-means clustering in the online no-substitution setting, where one must decide whether to take each data point $x_t$ as a center immediately upon streaming it and cannot remove centers once taken. Our work focuses on the \emph{arbitrary-order} assumption, where there are no restrictions on how the points $X$ are ordered or generated. Algorithms in this setting are evaluated by their approximation ratio relative to the optimal clustering cost, the number of centers they select, and their memory usage. Recently, Bhattacharjee and Moshkovitz (2020) defined a parameter, $Lower_{\alpha, k}(X)$, that governs the minimum number of centers any $\alpha$-approximation clustering algorithm must take on input $X$, even when allowed unlimited memory. To complement their result, we give the first algorithm that takes $\tilde{O}(Lower_{\alpha,k}(X))$ centers (hiding factors of $k$ and $\log n$) while simultaneously achieving a constant approximation and using $\tilde{O}(k)$ memory in addition to the memory required to store the centers. Our algorithm shows that, in the no-substitution setting, it is possible to take an order-optimal number of centers while using little additional memory.
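The setting can be made concrete with a small sketch. The code below only illustrates the no-substitution *interface*: each streamed point gets one irrevocable take-or-skip decision, and the result is then scored by its $k$-means cost and its number of centers. The decision rule (take $x_t$ if its squared distance to every center taken so far exceeds a fixed `threshold`) and the names `threshold_no_substitution`, `sq_dist`, and `kmeans_cost` are hypothetical. This is not the authors' algorithm and is not claimed to achieve their approximation or center-count guarantees.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means cost: sum over points of squared distance to the nearest center.

    Assumes `centers` is non-empty whenever `points` is non-empty.
    """
    return math.fsum(min(sq_dist(p, c) for c in centers) for p in points)


def threshold_no_substitution(stream: Sequence[Point], threshold: float) -> List[Point]:
    """Hypothetical toy rule, not the paper's method.

    Each point is seen once, in arbitrary order, and the take/skip decision is
    made immediately. Centers are only appended, never removed. Apart from the
    stored centers, the loop keeps O(1) extra state.
    """
    centers: List[Point] = []
    for x in stream:
        # Take x as a center if no existing center is within `threshold`
        # (squared distance); otherwise skip it permanently.
        if not centers or min(sq_dist(x, c) for c in centers) > threshold:
            centers.append(x)  # irrevocable decision
    return centers


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (5.0,), (5.2,), (10.0,)]
    cs = threshold_no_substitution(pts, threshold=1.0)
    print(len(cs), cs, kmeans_cost(pts, cs))
```

On this one-dimensional stream the rule takes three centers, `(0.0,)`, `(5.0,)`, and `(10.0,)`, with cost close to 0.05. A fixed threshold like this can be forced to take many centers, or to pay a high cost, by an adversarial ordering. That tension between center count, approximation ratio, and memory is exactly what the abstract's $Lower_{\alpha,k}(X)$ bound addresses.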