A cluster validity evaluation method for dynamically determining the near-optimal number of clusters

2019 
Cluster validity evaluation is an active topic in clustering algorithm research. To address the problem of determining the optimal number of clusters, this paper proposes a new cluster validity index, Ratio of Deviation of Sum-of-squares and Euclid distance (RDSED), and designs an RDSED-based cluster validity evaluation method suited to dynamically determining the near-optimal number of clusters. First, based on an analysis of intra-class and inter-class relationships, the concepts of within-cluster sum of squares, between-cluster sum of squares, total sum of squares, sum of intra-cluster distances and average distance between clusters are defined, and the cluster validity index RDSED is constructed from these concepts. Second, an RDSED-based cluster validity evaluation method for dynamically determining the near-optimal number of clusters is designed. In this method, the RDSED value is computed for candidate cluster numbers in descending order over the search range, and the index value is used to dynamically terminate the validity verification process, yielding the near-optimal number of clusters and the corresponding clustering partition. Experimental results on artificial and real datasets show that, compared with several classical cluster validity evaluation methods, the proposed method in most cases obtains the near-optimal number of clusters closest to the real cluster number and can effectively evaluate clustering partition results.
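The sum-of-squares quantities named in the abstract can be illustrated with a small sketch. It uses the standard textbook definitions of within-cluster, between-cluster and total sum of squares, which may differ in detail from the paper's own; it does not compute RDSED itself, whose formula is not given here. The function names and the toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares(points, labels):
    """Return (ssw, ssb, sst) for a hard partition given by labels.

    Assumption: standard definitions, not necessarily the paper's.
    """
    # Group points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    overall = centroid(points)
    ssw = 0.0  # within-cluster: points to their own cluster centroid
    ssb = 0.0  # between-cluster: size-weighted centroids to overall centroid
    for members in clusters.values():
        c = centroid(members)
        ssw += sum(sq_dist(p, c) for p in members)
        ssb += len(members) * sq_dist(c, overall)

    # Total: every point to the overall centroid.
    sst = sum(sq_dist(p, overall) for p in points)
    return ssw, ssb, sst


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
ssw, ssb, sst = sum_of_squares(points, labels)
print(ssw, ssb, sst)  # 4.0 16.0 20.0
```

Under these definitions the total sum of squares decomposes as the within-cluster plus the between-cluster term (4 + 16 = 20 in the toy data), which is why indices built on them can weigh compactness against separation as the number of clusters varies.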