Negation scope and spelling variation for text-mining of Danish electronic patient records

2014 
Electronic patient records are a potentially rich data source for knowledge extraction in biomedical research. Here we present a method based on the ICD10 system for text-mining of Danish health records. We evaluated how adding functionalities to a baseline text-mining tool affected its overall performance. The purpose of the tool was to create enriched phenotypic profiles for each patient in a corpus of records from 5,543 patients at a Danish psychiatric hospital, by assigning each patient additional ICD10 codes based on the free-text parts of these records. The tool was benchmarked against a manually curated test set consisting of all records from 50 patients. The tool was designed to handle spelling and ending variations, shuffling of tokens within a term, and the introduction of gaps in terms. In particular, we investigated the importance of negation identification and negation scope. The most important functionality of the tool was its handling of spelling variation, which greatly increased the number of phenotypes that could be identified in the records without noticeably decreasing precision. Further, our results show that different negations have different optimal scopes: some span only a few words, while others span up to whole sentences.
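The two ideas highlighted in the abstract, tolerance to spelling variation and negation-specific scope, can be illustrated with a minimal Python sketch. This is not the tool evaluated in the paper: the term dictionary, the similarity threshold, the use of `difflib` as the fuzzy matcher, and the scope lengths assigned to the Danish negations "ingen" and "ikke" are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical dictionary mapping single-token terms to ICD10 codes.
TERMS = {"depression": "F32", "hovedpine": "R51"}

# Hypothetical negation triggers, each with its own scope measured in
# tokens to the right of the trigger. The values are illustrative only.
NEGATION_SCOPE = {"ingen": 2, "ikke": 5}


def similar(a, b, threshold=0.85):
    """Approximate string match, standing in for spelling-variation handling."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def extract_codes(sentence):
    """Return ICD10 codes for non-negated dictionary terms in one sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    negated_until = -1  # index of the last token covered by a negation
    codes = []
    for i, token in enumerate(tokens):
        if token in NEGATION_SCOPE:
            negated_until = max(negated_until, i + NEGATION_SCOPE[token])
            continue
        for term, code in TERMS.items():
            if similar(token, term):
                # Keep the code only if the token lies outside every scope.
                if i > negated_until:
                    codes.append(code)
                break
    return codes
```

With these assumed settings, `extract_codes("Patienten har depresion")` still finds `F32` despite the misspelling, while in "Ingen kvalme eller opkastning, men hovedpine" the short scope of "ingen" ends before "hovedpine", so `R51` is kept. Giving each trigger its own scope is the design point here; a single global window would either miss long-range negations or suppress terms that are not actually negated.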