A Comparison of Data Sampling Techniques for Credit Card Fraud Detection

2020 
Credit card fraud is a persistent problem for the financial sector, and its detrimental effects are felt across the entire financial market. Criminals continuously look for new methods of committing fraud and pose a real threat to security. Financial institutions therefore need to detect fraudulent activity early to preserve customer trust and safeguard their business. A major challenge in designing fraud detection systems is class imbalance: genuine transactions far outnumber fraudulent ones, which typically account for less than 1% of all transactions. This is an important area of study because the positive (fraudulent) class is hard to distinguish, and it becomes harder still as data flows in and the share of such cases decreases further. This study trained four predictive models, including Artificial Neural Network (ANN), Gradient Boosting Machine (GBM) and Random Forest (RF), on data prepared with different sampling methods. Random Under Sampling (RUS), Synthetic Minority Over-sampling Technique (SMOTE), Density-Based Synthetic Minority Over-Sampling Technique (DBSMOTE) and SMOTE combined with Edited Nearest Neighbour (SMOTEENN) were used for all models. The findings indicate promising results with SMOTE-based sampling techniques. The best recall score, 0.81, was obtained by the DRF classifier with the SMOTE sampling strategy; the precision score for this classifier was 0.86. A Stacked Ensemble was trained on all the sampled datasets and had the best average performance, at 0.78. The Stacked Ensemble model shows promise in detecting fraudulent transactions across most of the sampling strategies.