An Improved Oversampling Method for imbalanced Data–SMOTE Based on Canopy and K-means

Chaoyou Guo, Yankun Ma, Zhe Xu, Mengmeng Cao, Qian Yao

2019

The Synthetic Minority Oversampling Technique (SMOTE) is a preferred method for addressing imbalanced data classification. However, its effectiveness in classifying minority samples still needs to be improved. To retain its strengths while offsetting its shortcomings, we designed an improved algorithm called "C-K-SMOTE", which combines SMOTE with a hybrid clustering scheme built from Canopy and K-means. To obtain an approximately balanced data set, we first use Canopy to produce a coarse clustering, then use K-means to refine it into an accurate clustering, and finally apply SMOTE to increase the number of minority samples. The imbalanced benchmark data sets used in the article are selected from KEEL (Knowledge Extraction based on Evolutionary Learning). Experiments with a random forest classification model verify SMOTE's effectiveness in balancing the imbalanced data sets.
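
The three stages described in the abstract (Canopy for a coarse clustering, K-means for refinement, SMOTE for generating minority samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which samples are clustered or how SMOTE uses the clusters, so the sketch assumes that only the minority samples are clustered and that interpolation stays within a cluster. The function names, the single tight threshold `t2` (a full Canopy pass also uses a looser threshold), the neighbour count `k`, and the sample data are all hypothetical choices made for illustration.

```python
import math
import random


def canopy_centers(points, t2, rng):
    """Coarse pass (simplified Canopy): pick centers at least t2 apart."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        # Points closer than t2 to this center may not seed another canopy.
        remaining = [p for p in remaining if math.dist(p, center) >= t2]
    return centers


def kmeans(points, centers, iters=20):
    """Refine the canopy centers with standard Lloyd iterations."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the cluster; an empty cluster keeps its center.
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def smote_within_clusters(clusters, n_new, k, rng):
    """SMOTE-style interpolation between a sample and a near cluster-mate."""
    usable = [cl for cl in clusters if len(cl) >= 2]
    synthetic = []
    while usable and len(synthetic) < n_new:
        cluster = rng.choice(usable)
        base = rng.choice(cluster)
        neighbours = sorted((p for p in cluster if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def ck_smote_sketch(minority, n_majority, t2=1.0, k=5, seed=0):
    """Assumed pipeline: Canopy -> K-means -> SMOTE on minority samples only."""
    rng = random.Random(seed)
    centers = canopy_centers(minority, t2, rng)
    clusters = kmeans(minority, centers)
    n_new = max(0, n_majority - len(minority))
    return smote_within_clusters(clusters, n_new, k, rng)


# Hypothetical minority class made of two small groups of 2-D samples.
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
new_points = ck_smote_sketch(minority, n_majority=20, t2=1.0)
print(len(new_points), "synthetic minority samples generated")
```

In this sketch Canopy's only role is to supply the number of clusters and their starting centers, which K-means would otherwise need as input. Restricting interpolation to cluster-mates keeps synthetic points from landing in the empty region between the two groups.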
