An Improved Oversampling Method for Imbalanced Data–SMOTE Based on Canopy and K-means

2019 
Synthetic Minority Oversampling Technique (SMOTE) is a preferred method for handling imbalanced data classification. However, its effectiveness in classifying minority samples still needs to be improved. To retain its strengths while offsetting its shortcomings, we designed an improved algorithm called “C-K-SMOTE”, which combines SMOTE with a hybrid clustering scheme built from Canopy and K-means. To obtain an approximately balanced data set, we first use Canopy to produce a coarse clustering, then use K-means to refine it into an accurate clustering, and finally apply SMOTE to increase the number of minority samples. The benchmark imbalanced data sets used in the article are selected from KEEL (Knowledge Extraction based on Evolutionary Learning). Experiments with a random forest classification model verify SMOTE's effectiveness in balancing the imbalanced data sets.
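The three stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the stages are wired together, so several choices here are assumptions rather than the authors' method: Canopy is run on the minority samples only, its centers seed K-means, SMOTE interpolation is restricted to neighbors within the same cluster, and clusters are picked uniformly at random. The function names, the threshold `t2`, and the toy data are hypothetical. The loose Canopy threshold T1 is omitted because only the canopy centers are needed.

```python
import math
import random


def canopy_centers(points, t2, rng):
    """Coarse Canopy pass: pick a center, drop every point within t2 of it, repeat."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centers, iters=20):
    """Plain K-means seeded with the given centers; returns the non-empty clusters."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old center if the cluster is empty.
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def smote_within_clusters(clusters, n_new, k, rng):
    """SMOTE-style interpolation between a sample and one of its k nearest same-cluster neighbors."""
    usable = [c for c in clusters if len(c) >= 2]
    if not usable:
        raise ValueError("need at least one cluster with two or more samples")
    synthetic = []
    while len(synthetic) < n_new:
        cluster = rng.choice(usable)  # assumption: clusters are chosen uniformly
        i = rng.randrange(len(cluster))
        base = cluster[i]
        others = cluster[:i] + cluster[i + 1:]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class samples forming two well-separated groups.
rng = random.Random(0)
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
centers = canopy_centers(minority, t2=1.0, rng=rng)
clusters = kmeans(minority, centers)
new_points = smote_within_clusters(clusters, n_new=4, k=2, rng=rng)
```

Restricting interpolation to one cluster keeps each synthetic sample inside a region already occupied by minority samples, which is one plausible motivation for clustering before SMOTE. In practice the synthetic samples would be appended to the training set before fitting the classifier.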