Optimization on Purity K-Means Using Variant Distance Measure

2020 
The initial cluster center points (centroids) affect the results of the K-Means algorithm. This study discusses the results of K-Means clustering in which the initial centroids are determined using purity and different distance measures are applied, tested on several datasets with varying numbers of attributes and records. The paper thereby identifies the best-performing distance measure for K-Means clustering on each dataset. On the Iris dataset (150 records, 4 attributes, 4 clusters), Euclidean Purity K-Means obtained 0.2935, which is better than Canberra Purity K-Means at 0.5949 and City Block Purity K-Means at 0.4771. On the Birth and Death Rates dataset (70 records, 2 attributes, 4 clusters), Canberra Purity K-Means obtained 0.4925, which is better than Euclidean Purity K-Means at 0.5411 and City Block Purity K-Means at 0.4938. On the Wholesale Customers dataset (440 records, 6 attributes, 4 clusters), Canberra Purity K-Means obtained 0.9357, which is better than Euclidean Purity K-Means at 1.0357 and City Block Purity K-Means at 0.9619.
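To make the three distance measures concrete, the following is a minimal sketch of their standard textbook definitions together with a nearest-centroid assignment step. It is not the authors' implementation: the function names (`euclidean`, `city_block`, `canberra`, `assign_to_nearest`) and the sample points are assumptions made for illustration, and the purity-based centroid initialization is not shown because the abstract does not describe it.

```python
import math


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """City Block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Canberra distance: sum of |x - y| / (|x| + |y|) per attribute.

    Attributes where both values are zero contribute nothing
    (a common convention, assumed here).
    """
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom != 0:
            total += abs(x - y) / denom
    return total


def assign_to_nearest(point, centroids, distance):
    """Return the index of the centroid closest to `point` (hypothetical helper)."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


# Illustrative data only: the same point can land in different clusters
# depending on the distance measure.
point = (1, 100)
centroids = [(2, 100), (1, 120)]
for name, fn in [("Euclidean", euclidean),
                 ("City Block", city_block),
                 ("Canberra", canberra)]:
    print(name, "->", assign_to_nearest(point, centroids, fn))
```

Because Canberra distance scales each attribute's difference by the magnitude of the values involved, small-valued attributes carry relatively more weight than under Euclidean or City Block distance, which is one reason the choice of measure can change cluster assignments.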