A graph-based semi-supervised reject inference framework considering imbalanced data distribution for consumer credit scoring

2021 
Abstract: Credit scoring has been attracting increasing attention in the Chinese consumer financial industry. Traditional approaches are easily influenced by sample selection bias because they use only accepted applicant samples, while the applicant population also includes rejected applicants. Reject inference is a technique for inferring good/bad labels for rejected applicants, which can reduce this bias in credit scoring. However, previously proposed reject inference methods usually ignore the imbalanced distribution in accepted data: in most practical consumer loan applications, good applicants far outnumber bad ones. Both the neglect of rejected data and the imbalanced distribution in accepted data weaken the performance of current credit scoring models. In this paper, we propose a novel reject inference framework that takes into account the imbalanced data distribution for consumer credit scoring. First, we use label spreading, an advanced graph-based semi-supervised learning algorithm, to solve the reject inference problem. Second, faced with an imbalanced distribution of good and bad samples among accepted applicants, we conduct imbalanced learning using a modified Synthetic Minority Over-sampling Technique (SMOTE) before reject inference. Then, six binary classifiers are studied within the proposed framework for credit scoring modelling. Finally, we present the results of four exact experiments as well as online A/B tests for performance evaluation, using data provided by a leading Chinese fintech company. Empirical results indicate that the proposed framework performs better than traditional scoring models across different evaluation metrics, offering a method that advances credit scoring research and improves fintech practice.
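The order of steps described in the abstract (oversample the minority class among accepted applicants, then spread labels to rejected applicants) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy data, the function names `smote_like_oversample` and `label_spreading`, and the parameters `k`, `alpha` and `gamma`. The oversampler is plain SMOTE-style interpolation, not the modified variant the paper uses, and the label spreading is the standard closed-form solution on an RBF graph, which may differ from the authors' configuration. The final classifier step is omitted.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Plain SMOTE-style interpolation (assumption: not the paper's modified variant).
    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

def label_spreading(X, y, alpha=0.8, gamma=1.0):
    """Closed-form label spreading on an RBF graph.
    y holds class labels for labeled rows and -1 for unlabeled rows."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-gamma * sq)                   # RBF affinity matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0                    # one-hot rows for labeled samples
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)       # (I - alpha S) F = Y
    return classes[np.argmax(F, axis=1)]

# Hypothetical toy data: 0 = good, 1 = bad, -1 = rejected (label unknown).
rng = np.random.default_rng(0)
X_good = rng.normal(0.0, 0.5, size=(40, 2))
X_bad = rng.normal(3.0, 0.5, size=(5, 2))
X_rej = np.vstack([rng.normal(0.0, 0.5, size=(10, 2)),
                   rng.normal(3.0, 0.5, size=(10, 2))])

# Step 1: balance the accepted sample before reject inference.
X_syn = smote_like_oversample(X_bad, n_new=len(X_good) - len(X_bad), rng=0)
X_acc = np.vstack([X_good, X_bad, X_syn])
y_acc = np.r_[np.zeros(len(X_good), int), np.ones(len(X_bad) + len(X_syn), int)]

# Step 2: infer labels for rejected applicants over the joint graph.
X_all = np.vstack([X_acc, X_rej])
y_all = np.r_[y_acc, -np.ones(len(X_rej), int)]
y_inferred = label_spreading(X_all, y_all)
```

In this sketch the last 20 entries of `y_inferred` are the inferred labels for the rejected applicants; a downstream binary classifier would then be trained on `X_all` with `y_inferred` as targets.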