Transformer-based models for ICD-10 coding of death certificates with Portuguese text

2022 
Natural Language Processing (NLP) can offer important tools for unlocking relevant information from clinical narratives. Although Transformer-based models can achieve remarkable results in several different NLP tasks, these models have been less used in clinical NLP, and particularly in low-resource languages, of which Portuguese is one example. It is still not entirely clear whether pre-trained Transformer models are useful for clinical tasks without further architecture engineering or particular training strategies. In this work, we propose a BERT model to assign ICD-10 codes for causes of death by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We used a novel pre-training procedure that incorporates in-domain knowledge, together with a fine-tuning method that addresses the class imbalance issue. Experimental results show that, in this particular clinical task, which requires the processing of relatively short documents, Transformer-based models can achieve very strong results, significantly outperforming tailored approaches based on recurrent neural networks.
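As a rough illustration of the approach described above, the sketch below fine-tunes a pre-trained Portuguese BERT encoder as an ICD-10 code classifier with a class-weighted loss to mitigate label imbalance. The model name, the number of codes, and the inverse-frequency weighting scheme are assumptions for illustration, not the exact configuration used in the paper.

```python
# Hypothetical sketch: fine-tuning a Portuguese BERT encoder for ICD-10
# code assignment with a class-weighted loss to counter label imbalance.
# Model name, label count, and weighting scheme are illustrative assumptions.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "neuralmind/bert-base-portuguese-cased"  # assumed Portuguese BERT
NUM_CODES = 1000  # placeholder for the size of the ICD-10 label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CODES
)

# Inverse-frequency class weights (illustrative) to give rare causes of
# death more influence on the loss than frequent ones.
label_counts = torch.ones(NUM_CODES)  # replace with real per-code frequencies
class_weights = label_counts.sum() / (NUM_CODES * label_counts)
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

def training_step(texts, labels):
    """One gradient step over a batch of death-certificate texts."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    logits = model(**batch).logits
    loss = loss_fn(logits, labels)
    loss.backward()
    return loss.item()
```

In practice the weighted loss could be swapped for other imbalance-handling strategies (e.g., focal loss or resampling); the key point is that the classification head is trained on top of a domain-adapted encoder.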