Evaluation of Disease-Associated Text-Mining Databases

Myeong-Sang Yu,Dokyun Na

2015

There are about 20 million scientific articles in PubMed, which makes it a rich source of knowledge. Extracting information from these articles is one of the challenges in biology, and many text-mining approaches have therefore been developed. However, the accuracy of text-mined results is still in question. Here we evaluated three text-mining databases using genes associated with Alzheimer's disease. Their per-gene accuracy is high (57-100%), but their per-abstract accuracy is relatively low (33-64%). This indicates that gene-disease associations are well identified when abundant articles are available, whereas genes with fewer articles could be wrongly identified as disease-associated. Consequently, human curation remains a necessary complement to current text-mining approaches, and future text-mining methods should improve their accuracy for genes with few articles or little information.
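The difference between the two accuracy measures can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the ones used here are assumptions: per-abstract accuracy is taken as the fraction of text-mined (gene, abstract) pairs that a curator confirms, and per-gene accuracy as the fraction of genes with at least one confirmed abstract. The gene names, abstract IDs, and curation flags are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def per_abstract_accuracy(records):
    """Fraction of text-mined (gene, abstract) pairs confirmed by a curator.

    Assumed definition; the source does not spell it out.
    """
    if not records:
        raise ValueError("no records to evaluate")
    confirmed = sum(1 for _gene, _abstract, ok in records if ok)
    return confirmed / len(records)


def per_gene_accuracy(records):
    """Fraction of genes with at least one curator-confirmed abstract.

    Assumed definition; the source does not spell it out.
    """
    if not records:
        raise ValueError("no records to evaluate")
    flags_by_gene = defaultdict(list)
    for gene, _abstract, ok in records:
        flags_by_gene[gene].append(ok)
    supported = sum(1 for flags in flags_by_gene.values() if any(flags))
    return supported / len(flags_by_gene)


# Hypothetical curation results: (gene, abstract ID, confirmed by curator?)
records = [
    ("GENE_A", "abs1", True),
    ("GENE_A", "abs2", False),
    ("GENE_A", "abs3", False),
    ("GENE_B", "abs4", True),
    ("GENE_C", "abs5", False),
]

abstract_acc = per_abstract_accuracy(records)
gene_acc = per_gene_accuracy(records)
print(f"per-abstract accuracy: {abstract_acc:.2f}")
print(f"per-gene accuracy:     {gene_acc:.2f}")
```

Under these assumed definitions, a gene with many abstracts (GENE_A) counts as correctly identified even though most of its individual abstracts are false hits, while a gene with a single abstract (GENE_C) has no such buffer. This is one way per-gene accuracy can exceed per-abstract accuracy.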
