Cost-sensitive semi-supervised selective ensemble model for customer credit scoring

2019 
Abstract Only a few customers can be labeled in realistic credit-scoring problems, while many other customers cannot. Further, satisfactory performance is difficult, as traditional supervised learning methods can only use labeled samples to build credit-scoring models. Semi-supervised learning (SSL) can use both labeled and unlabeled samples to solve this problem, but existing credit-scoring research has primarily constructed single semi-supervised models. This study introduces SSL, cost-sensitive learning, a group method of data handling (GMDH), and an ensemble learning technique to propose a GMDH-based cost-sensitive semi-supervised selective ensemble (GCSSE) model. This involves two stages: (1)First, train an ensemble model composed of N base classifiers on the initial training set L with class labels, use it to selectively label the samples from the dataset U without class labels, add them with their predicted labels to the training set, and update the N base classifiers on the new training set; (2)Second, classify L and the test set using the respective trained base classifiers, and construct a cost-sensitive GMDH neural network to obtain the selective ensemble classification results for the test set. Experimental comparisons of five public customer credit score datasets and an empirical analysis of a real customer credit score dataset suggest that this model exhibits the best overall credit-scoring performance compared with one supervised ensemble model and three semi-supervised ensemble models.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    67
    References
    14
    Citations
    NaN
    KQI
    []