A Comparison of Two Oversampling Techniques (SMOTE vs MTDF) for Handling Class Imbalance Problem: A Case Study of Customer Churn Prediction

2015 
Predicting customer behavior is of great importance to a project manager. Data-driven industries such as telecommunications take advantage of various data mining techniques to extract meaningful information about customers' future behavior. However, the prediction accuracy of these techniques is significantly affected when real-world data is highly imbalanced. In this study, we investigate and compare the predictive performance of two well-known oversampling techniques, Synthetic Minority Oversampling Technique (SMOTE) and Megatrend Diffusion Function (MTDF), and four different rule generation algorithms (Exhaustive, Genetic, Covering, and LEM2) based on rough set classification, using publicly available data sets. Useful feature extraction can play a vital role not only in improving classification performance but also in reducing computational cost and complexity by eliminating unnecessary features from the dataset. The Minimum Redundancy Maximum Relevance (mRMR) technique is therefore used in the proposed study for feature extraction; it not only selects the best feature subset but also reduces the feature space. The results demonstrate the predictive performance of both oversampling techniques and of the rule generation algorithms, which will help decision makers and researchers select the most suitable one.
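To make the oversampling idea concrete, the following is a minimal sketch of SMOTE-style interpolation, assuming purely numeric features and Euclidean distance. It is not the implementation used in the study; the function name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (illustrative only).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic sample is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. Calling `smote_sketch(X_min, n_new=100)` on the churner rows of a training set, and appending the result with the minority label, would be one way to rebalance it before rule induction.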