Spectral Clustering Of Large-scale Data By Directly Solving Normalized Cut

Authors:
Xiaojun Chen Shenzhen University
Weijun Hong Shenzhen University
Feiping Nie Northwestern Polytechnical University
Dan He Shenzhen University
Min Yang Chinese Academy of Sciences
Joshua Z. Huang Shenzhen University

Introduction:

This paper studies spectral clustering algorithms. The authors propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model.

Abstract:

During the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexities hinder their applications on large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has a quadratic time complexity, which is a significant reduction comparing with the cubic time complexity of the traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexities is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high quality anchors that best represent the whole data set, we propose a Balanced k-means (BKM) to partition a data set into balanced clusters and use the cluster centers as anchors. Then DNC is used to obtain the final clustering result from the representative similarity matrix. A series of experiments were conducted on both synthetic data and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.

You may want to know: