Improving Classification of Imbalanced Datasets Based on KM++ SMOTE Algorithm

2019 
Imbalanced datasets are common in areas such as cancer diagnosis, blood sample centres and industrial equipment. Models trained on such datasets with traditional machine learning algorithms tend to perform poorly on minority class samples, and the resulting misclassifications may cause huge losses. Improving the classification of imbalanced datasets is therefore an active research topic. Motivated by this problem, this paper proposes an improved SMOTE algorithm named KM++ SMOTE. It targets two problems caused by the SMOTE algorithm: the marginalization of the newly synthesized samples' distribution and the uneven distribution of the overall dataset. Experiments on three different datasets from the Bearing Data Center Seeded Fault Test Data compare KM++ SMOTE combined with a random forest classifier against other improved SMOTE algorithms combined with a random forest classifier, and the KM++ SMOTE combination performs better.
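For readers unfamiliar with the baseline, the following is a minimal sketch of the standard SMOTE interpolation step that KM++ SMOTE sets out to improve. It is not the paper's method: the abstract does not give the details of KM++ SMOTE, so only plain SMOTE is shown. The function name `smote_oversample` and its parameters (`n_new`, `k`, `seed`) are hypothetical choices made for this illustration.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Baseline SMOTE sketch: synthesize n_new points from minority samples X_min.

    Illustrative only; this is plain SMOTE, not the KM++ SMOTE variant.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest minority neighbours.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place each synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Usage: four minority samples on the unit square, ten synthetic points.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(X_min, n_new=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, base samples near the class boundary produce synthetic points near the boundary as well, which is one way to picture the marginalization problem the abstract mentions.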