Named Entity Recognition and Linking: a Portuguese and Spanish Oncological Parallel Corpus

2021 
Biomedical literature is the main mean of communication for researchers to share their findings. Since biomedical literature is composed of a large collection of text expressed in natural language, the usage of text mining tools to extract information from those texts automatically is of utmost importance. The problem is that the majority of the state-of-the-art tools were not developed to deal with other languages besides English, which in biomedical literature is even more critical since a significant part of health-related texts is written in the author9s native language. To address this issue, this work presents a deep learning NERL (Named Entity Recognition and Linking) system and a parallel corpus for the Spanish and Portuguese languages focused on the oncological domain. Both the system and the corpus are available at https://github.com/lasigeBioTM/ICERL_system-ICR_Corpus.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    0
    Citations
    NaN
    KQI
    []