Multi-view Ensemble Learning Based on Distance-to-model and Adaptive Clustering for Imbalanced Credit Risk Assessment in P2P Lending

2020 
Abstract Credit risk assessment is a crucial task in the peer-to-peer (P2P) lending industry. In recent years, ensemble learning methods have been shown to outperform individual classifiers and statistical techniques in default prediction. Real-world loan datasets are imbalanced, yet most studies focus on improving overall prediction accuracy rather than the ability to identify loans that actually default. Moreover, some features that are significantly correlated with default rates have not been given sufficient weight in the model construction of previous studies. To fill these gaps, we propose a distance-to-model and adaptive clustering-based multi-view ensemble (DM-ACME) learning method for predicting default risk in P2P lending. In this method, multi-view learning and adaptive clustering are used to produce an ensemble of diverse ensembles, each composed of gradient boosting decision trees. A novel combination strategy called distance-to-model, together with a soft probability scheme, is embedded for model integration. To verify the effectiveness of the proposed approach, we conduct a comprehensive analysis of DM-ACME, comparative experiments against several state-of-the-art methods, and a feature importance evaluation, all on data provided by Lending Club. The experimental results demonstrate the superiority of the proposed method and indicate the importance of certain features in loan default prediction.