A gradient ascent algorithm based on possibilistic fuzzy C-Means for clustering noisy data

2021 
Abstract Real-world data are often corrupted by noise and outliers, which originate from procedures such as data collection, storage, and processing. Noise and outliers degrade clustering quality and lead to inaccurate, misplaced cluster centers. In this paper, we propose a new algorithm called Improved Possibilistic Fuzzy C-Means (IPFCM) to cluster noisy data. First, initial cluster centers are calculated by Possibilistic Fuzzy C-Means (PFCM), but these centers do not match the dense regions of the data. Then, the domain is divided into subdomains and each data point is assigned to a subdomain. The cluster centers are iteratively moved towards high-density regions by maximizing a novel cluster validity index. In the proposed method, a Gaussian membership function is defined on each cluster to weight the data, and the sum of the weights in each cluster is calculated. The product of these sums is taken as the validity index. Since the division of the domain changes as the cluster centers move, this procedure is repeated until the convergence criterion is satisfied. Cluster analysis performed on six synthetic and nine real benchmark datasets shows the superiority of IPFCM over previous clustering algorithms such as Fuzzy C-Means (FCM), PFCM, Kernel Fuzzy C-Means (KFCM), Noise Clustering (NC), and Generalized Entropy based Possibilistic Fuzzy C-Means (GEPFCM). The clustering results for near-fault ground motion data indicate that the cluster centers identified by IPFCM are well separated from each other, while those identified by PFCM are close to each other in some datasets. Moreover, the results show that the impact of noisy data on the proposed index, and consequently on the cluster analysis, decreases as the noisy data lie farther from the cluster centers, which is one of the advantages of the IPFCM algorithm.
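The center-refinement step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: subdomains are taken to be nearest-center regions, the Gaussian membership uses a single hypothetical width `sigma`, the ascent is performed on the logarithm of the product index with a fixed step `lr`, and the starting centers are passed in directly rather than computed by PFCM. The function names `validity_index` and `ascend_centers` are illustrative and do not come from the paper.

```python
import numpy as np

def validity_index(X, centers, sigma=1.0):
    """Assumed index: product over clusters of summed Gaussian weights."""
    # Squared distances between every point and every center, shape (n, c).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Assumption: each point belongs to the subdomain of its nearest center.
    labels = d2.argmin(axis=1)
    # Gaussian membership weight of each point with respect to each center.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    sums = np.array([w[labels == i, i].sum() for i in range(len(centers))])
    return sums.prod(), labels, w

def ascend_centers(X, centers, sigma=1.0, lr=0.5, tol=1e-6, max_iter=200):
    """Gradient ascent on log(index); the partition is recomputed each pass."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        _, labels, w = validity_index(X, centers, sigma)
        new_centers = centers.copy()
        for i in range(len(centers)):
            mask = labels == i
            s = w[mask, i].sum()
            if s <= 0.0:
                continue  # empty or numerically weightless cluster: leave as is
            # Gradient of log(sum_k w_ik) with respect to center i.
            diff = X[mask] - centers[i]
            grad = (w[mask, i][:, None] * diff).sum(axis=0) / (sigma ** 2 * s)
            new_centers[i] = centers[i] + lr * grad
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers

# Toy data: two dense blobs plus two far-away outliers.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
blob_b = rng.normal([5.0, 5.0], 0.3, size=(50, 2))
outliers = np.array([[2.5, 9.0], [9.0, 1.0]])
X = np.vstack([blob_a, blob_b, outliers])

start = np.array([[1.0, 0.5], [4.0, 5.5]])  # stand-in for PFCM output
refined = ascend_centers(X, start)
print(refined)
```

Under these assumptions, a point far from every center receives a weight close to zero, so it contributes almost nothing to the gradient; this mirrors the abstract's remark that the influence of noisy data shrinks with distance from the cluster centers.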