NERO: A Biomedical Named-entity (Recognition) Ontology with a Large, Annotated Corpus Reveals Meaningful Associations Through Text Embedding

Kanix Wang, Robert Stevens, Alachram H, Yu Li, Larisa N. Soldatova, Ross D. King, Sophia Ananiadou, Maolin Li, Fenia Christopoulou, José Luis Ambite, Sahil Garg, Ulf Hermjakob, Daniel Marcu, Emily Sheng, Tim Beißbarth, Edgar Wingender, Aram Galstyan, Xin Gao, Brendan Chambers, Bohdan B. Khomtchouk, James A. Evans, Andrey Rzhetsky

2020

Machine reading is essential for unlocking valuable knowledge contained in the millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in machine reading have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in machine-reading methodology and automated knowledge-extraction systems, in the same way that ImageNet [4] was fundamental for developing machine-vision techniques. This study contributes six components to an advanced named-entity analysis tool for biomedicine: (a) a new Named-Entity Recognition Ontology (NERO) developed specifically for describing entities in biomedical texts, which accounts for diverse levels of ambiguity and bridges the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named-entity classes; (c) pictographs for all named entities, to ease the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf tools for automated named-entity recognition and extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.
