Multi-label clinical document classification: Impact of label-density

2019 
Abstract

Objective: The goal of this work is the classification of Electronic Health Records (EHRs) using Natural Language Processing techniques. EHRs convey valuable clinical information, such as diagnoses and patient conditions. We explore several Deep Learning classification models for assigning multiple ICD codes to clinical documents. Within the framework of data mining, the aim of multi-label classification is to associate each instance with a set of labels.

Methods: Multi-label classification is typically carried out with multiple independent classifiers, in the so-called binary relevance learning approach. Nevertheless, diseases tend to be correlated, and independent classifiers are unable to model these relationships and do not guarantee the consistency of the predicted label set. To tackle this, we investigate three Neural Network architectures, studying models that are capable of capturing and modeling label dependencies on the output layer. Moreover, learning from data with low label-density is an inherent challenge in multi-label classification. Thorough experiments were conducted to assess each architecture under different scenarios, varying the language, the amount of data and the label-density.

Results: The Bi-GRU model outperforms the DNN, and both outperform the baseline (BLR). We observed better results with the MIMIC corpus than with the Osakidetza corpus. Experimental results showed that the prediction task becomes harder as the label-density decreases. Label-density appears to be closely related to the learning ability of the neural networks; another important factor that affects inference is the amount of training data.

Conclusions: The contributions of this work are: a) a comparison among three classification approaches based on Neural Networks, on data sets in English and Spanish, to cope with the multi-label classification problem; and b) a study of the impact of label-density on prediction capabilities in the multi-label context.
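The abstract does not define label-density. As an illustration only, the sketch below assumes the definition commonly used in the multi-label literature: label cardinality is the average number of labels per document, and label-density is that cardinality divided by the total number of distinct labels. The function names and the toy ICD-style codes are hypothetical and are not taken from the paper or its corpora.

```python
def label_cardinality(label_sets):
    """Average number of labels assigned per document."""
    return sum(len(labels) for labels in label_sets) / len(label_sets)


def label_density(label_sets, num_labels):
    """Label cardinality normalised by the size of the label space."""
    return label_cardinality(label_sets) / num_labels


# Toy corpus (hypothetical): each document carries a set of ICD-style codes.
docs = [{"I10", "E11"}, {"I10"}, {"J45", "E11", "I10"}]

# Label space observed in the toy corpus: I10, E11, J45.
all_labels = set().union(*docs)

cardinality = label_cardinality(docs)
density = label_density(docs, len(all_labels))

print(f"labels: {len(all_labels)}, cardinality: {cardinality:.2f}, density: {density:.2f}")
```

Under this assumed definition, enlarging the label space while the number of codes per document stays roughly constant lowers the density, which is the regime the abstract reports as harder to learn.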