Learning to Recognize Protected Health Information in Electronic Health Records with Recurrent Neural Network

2016 
De-identification of electronic health records is a prerequisite for distributing medical records for further clinical data processing or mining. In this paper, we introduce a framework based on a recurrent neural network (RNN) to solve the de-identification problem and compare it with state-of-the-art methods. The framework is integrated: it comprises record skeleton generation, chunk representation, and protected information labeling. We evaluate it on three datasets: two English datasets from the i2b2 de-identification challenge and a Chinese dataset that we created. To the best of our knowledge, we are the first to apply an RNN model to the Chinese de-identification problem. The experimental results indicate that our framework not only achieves high performance but also has strong generalization ability.