Automatic Text Classification for Label Imputation of Medical Diagnosis Notes Based on Random Forest

Bokai Yang, Guangzhe Dai, Yujie Yang, Darong Tang, Qi Li, Denan Lin, Jing Zheng, Yunpeng Cai

2018

Electronic medical records (EMRs) contain a wealth of patient information that is of great value for data mining in various clinical applications. However, missing information, including missing labels, is pervasive in real-world EMRs and creates many obstacles to processing medical text content. The aim of this study is to adopt automatic text classification technologies to recover missing medical text labels for EMRs and support downstream analyses. A combination of word-embedding technology and random forest classifiers is applied to identify multiple medical note labels, including disease types and examination types, from the short texts of medical imaging diagnosis notes. The results show an average binary classification accuracy of 91%. These results indicate that applying advanced NLP techniques to EMRs can achieve high classification accuracy.
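The general shape of such a pipeline (embed a short note, then classify it with an ensemble of trees) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the two-dimensional word vectors, the example notes, and the "lung finding" label are all hypothetical, and the ensemble is a bagged set of one-split decision stumps, a heavily simplified stand-in for a full random forest.

```python
import random
from collections import Counter

# Hypothetical 2-d word vectors; a real system would load trained embeddings.
TOY_EMBEDDINGS = {
    "nodule": (-1.0, -0.9), "opacity": (-0.9, -1.1),
    "lung": (-1.1, -1.0), "pleural": (-1.0, -1.0),
    "fracture": (1.0, 0.9), "femur": (0.9, 1.1),
    "tibia": (1.1, 1.0), "displaced": (1.0, 1.0),
}
DIM = 2


def embed(note):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [TOY_EMBEDDINGS[t] for t in note.lower().split() if t in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_stump(X, y, feature):
    """Pick the threshold on one feature that minimises training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best = (len(y) + 1, 0.0, majority, majority)  # errors, threshold, left, right
    values = sorted({x[feature] for x in X})
    for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
        left = [lab for x, lab in zip(X, y) if x[feature] <= t]
        right = [lab for x, lab in zip(X, y) if x[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return feature, best[1], best[2], best[3]


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, x):
    """Majority vote over all stumps."""
    votes = [l if x[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical labelled notes: 1 = lung finding, 0 = anything else.
NOTES = [
    ("lung nodule", 1), ("pleural opacity lung", 1),
    ("nodule opacity", 1), ("lung pleural nodule", 1),
    ("femur fracture", 0), ("displaced tibia", 0),
    ("tibia fracture displaced", 0), ("femur displaced", 0),
]
X = [embed(text) for text, _ in NOTES]
y = [label for _, label in NOTES]
forest = fit_forest(X, y)

print(predict(forest, embed("pleural opacity")))
print(predict(forest, embed("displaced tibia fracture")))
```

Training one such binary classifier per label (each disease type, each examination type) would yield the multi-label setting the abstract describes; in practice a library implementation of random forests and pretrained embeddings would replace the toy parts.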
