An Adaptive Pre-clustering Support Vector Machine for Binary Imbalanced Classification

2018 
Imbalanced classification is a common and critical problem in machine learning and artificial intelligence. Derived from structural risk minimization, the Support Vector Machine (SVM) is well regarded for classification. However, the original SVM is not suited to imbalanced classification, and existing modifications of SVM for this kind of problem fail to take the distribution of the dataset fully into account, which leads to a notable loss in classification performance. Recently, Adaptive Clustering by Fast Search and Find of Density Peaks (ADPclust) was proposed; it finds cluster centroids in a sample space automatically and more reliably by using adaptive density peak detection and silhouette theory. Motivated by this, this work proposes an adaptive pre-clustering SVM (AP-SVM) that uses the distribution of the original dataset to yield balanced sub-datasets for accurate and efficient classification. Specifically, AP-SVM clusters the majority class of a given dataset into several groups and then applies undersampling to every cluster to re-balance the dataset used in the SVM classification step. Experiments on 10 public binary datasets, evaluated using Area Under Curve (AUC), F-Measure, and G-Mean, show the superiority of the proposed method over SVM, the Synthetic Minority Over-sampling Technique (SMOTE), Undersampling-SVM (U-SVM), K-Means, Fuzzy C-Means, and EasyEnsemble.
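The re-balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the majority-class cluster labels have already been computed by some clustering method (the paper uses ADPclust), and it assumes each cluster contributes samples in proportion to its size, an allocation rule the abstract does not specify. The function name `undersample_by_cluster` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def undersample_by_cluster(majority, cluster_ids, n_minority, seed=0):
    """Keep roughly n_minority majority samples, drawn from every cluster.

    Assumption (not stated in the abstract): each cluster's quota is
    proportional to its share of the majority class, with at least one
    sample kept per cluster.
    """
    rng = random.Random(seed)

    # Group majority samples by their precomputed cluster label.
    clusters = defaultdict(list)
    for sample, cid in zip(majority, cluster_ids):
        clusters[cid].append(sample)

    total = len(majority)
    kept = []
    for cid in sorted(clusters):
        members = clusters[cid]
        quota = max(1, round(n_minority * len(members) / total))
        quota = min(quota, len(members))
        # Random undersampling without replacement inside the cluster.
        kept.extend(rng.sample(members, quota))
    return kept


# Toy majority class: two clusters of 60 and 30 points, with 9 minority samples.
majority = [(float(i), 0.0) for i in range(60)] + [(float(i), 10.0) for i in range(30)]
cluster_ids = [0] * 60 + [1] * 30
balanced_majority = undersample_by_cluster(majority, cluster_ids, n_minority=9)
```

The reduced majority set would then be combined with the minority samples and passed to an SVM. Sampling within clusters, rather than from the majority class as a whole, is what lets every region of the majority distribution stay represented after undersampling.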