Local gap density for clustering high-dimensional data with varying densities

2019 
Abstract: Density-based clustering algorithms are designed to cluster data of arbitrary shape. However, most of these algorithms have difficulty handling high-dimensional data with varying densities; in particular, they often fail to discover clusters in sparse regions. In this paper, we define a new type of density, local gap density, on the k-NN graph, which works well for high-dimensional data. The local gap density of a point takes into account not only the number of points in its nearest neighborhood but also the average distance from the point to all points in that neighborhood. As a result, the core points of sparse regions, in the sense of existing density-based clustering, have high density under our definition, so they can be easily detected. Using the core points, the potential cross-cluster edges in the k-NN graph can be reliably identified. After deleting these edges, we group the points of each connected component with large cardinality into a subcluster and then, as in density peaks clustering, assign each remaining point to its corresponding existing subcluster. Extensive experiments on eight publicly available datasets demonstrate the effectiveness of our clustering algorithm.
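The pipeline outlined in the abstract (k-NN graph, density scoring, core points, edge deletion, large components as subclusters, assignment of leftover points) can be illustrated with a minimal sketch. The abstract does not state the formula for local gap density, so the score below is a hypothetical stand-in, not the paper's definition: it combines the number of mutual k-NN neighbors with their average distance, scaled by the distance to the k-th neighbor so that sparse and dense regions are scored on the same footing. The function names, the parameters `core_threshold` and `min_size`, the edge-deletion rule, and the nearest-labelled-point assignment are all assumptions made for illustration.

```python
import math

def knn(points, k):
    """Brute-force k-NN: for each point, a sorted list of (distance, index)."""
    out = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(d[:k])
    return out

def stand_in_density(nbrs, k):
    """Hypothetical score (NOT the paper's local gap density).

    Uses the two ingredients named in the abstract: a neighbor count (here,
    mutual k-NN neighbors) and the average distance to those neighbors,
    normalized by the distance to the k-th neighbor.
    """
    ids = [{j for _, j in row} for row in nbrs]
    mutual, score = [], []
    for i, row in enumerate(nbrs):
        m = [(d, j) for d, j in row if i in ids[j]]
        mutual.append({j for _, j in m})
        mean_d = sum(d for d, _ in m) / len(m) if m else 0.0
        r_k = row[-1][0]  # distance to the k-th nearest neighbor
        score.append(len(m) / k * r_k / mean_d if mean_d > 0 else 0.0)
    return mutual, score

def cluster(points, k=2, core_threshold=1.0, min_size=3):
    n = len(points)
    mutual, score = stand_in_density(knn(points, k), k)
    core = [s >= core_threshold for s in score]
    # Assumed deletion rule: keep only mutual edges joining two core points.
    adj = [[j for j in mutual[i] if core[i] and core[j]] for i in range(n)]
    labels, seen, next_label = [-1] * n, set(), 0
    for start in range(n):
        if start in seen or not core[start]:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first search over the pruned graph
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) >= min_size:  # large components become subclusters
            for u in comp:
                labels[u] = next_label
            next_label += 1
    # Leftover points, highest score first, take the label of the nearest
    # already-labelled point (a simplification of density-peaks assignment).
    rest = sorted((i for i in range(n) if labels[i] < 0), key=lambda i: -score[i])
    for i in rest:
        labelled = [j for j in range(n) if labels[j] >= 0]
        if not labelled:
            break
        j = min(labelled, key=lambda j: math.dist(points[i], points[j]))
        labels[i] = labels[j]
    return labels

# Toy data: the same five-point shape at two scales, far apart, so one
# group is ten times sparser than the other.
shape = [(0, 0), (10, 0), (0, 12), (11, 13), (4, 5)]
dense = shape
sparse = [(1000 + 10 * x, 1000 + 10 * y) for x, y in shape]
labels = cluster(dense + sparse)
print(labels)
```

Because the stand-in score is scale-invariant, the sparse group yields core points just as the dense group does, which is the behavior the abstract attributes to local gap density. A density based on raw distances alone would rank every point of the sparse group below every point of the dense one.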