A systematic approach for pre-processing electronic health records for mining: case study of heart disease

2020 
Electronic Health Records (EHRs) form a major part of Medical Big Data (MBD) and are an enormous resource of knowledge. Mining EHRs can lead to new generations of medicine (e.g. precision medicine). In practice, however, this is not straightforward, because raw EHRs are unsuitable for mining. Any raw data is naturally dirty, but some special challenges make EHRs even more prone to being dirty. To extract more precise and reliable knowledge, EHRs must be pre-processed. Applying appropriate pre-processing techniques that are based on the specific properties of EHRs yields higher-quality and more usable data. Here we introduce PEPMED, a systematic pre-processing approach that consists of three main stages. Each stage includes hybrid methods to deal with the challenges of dirty data. To evaluate the approach, four well-known subgrouping methods were applied to both the raw and the pre-processed data. We used precision and overall accuracy as the evaluation measures. Results show that PEPMED dramatically improved accuracy.
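
For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how precision and overall accuracy are commonly computed from true and predicted labels. It assumes the standard textbook definitions (precision = TP / (TP + FP) for a chosen positive class, accuracy = correct predictions / all predictions); the abstract does not state the exact formulas used, and the function name `precision_and_accuracy` and the sample labels are hypothetical.

```python
def precision_and_accuracy(y_true, y_pred, positive):
    """Return (precision for the `positive` class, overall accuracy).

    Assumes the standard definitions; not taken from the paper itself.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually something else.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # Correct predictions across all classes.
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, accuracy


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, accuracy = precision_and_accuracy(y_true, y_pred, positive=1)
print(f"precision={precision:.3f}, accuracy={accuracy:.3f}")
```

Computing both measures once on the raw data and once on the pre-processed data, with the same method and the same labels, is one way to make the kind of before/after comparison the abstract describes.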