Evaluation of Disease-Associated Text-Mining Databases

2015 
PubMed contains about 20 million scientific articles, making it a rich source of knowledge. Extracting information from these articles is one of the challenges in biology, and many text-mining approaches have been developed for this purpose. However, the accuracy of text-mined results is still in question. Here we evaluated three text-mining databases using genes associated with Alzheimer's disease. Their per-gene accuracy is high (57-100%), but their per-abstract accuracy is relatively low (33-64%). This indicates that gene-disease associations are well identified when abundant articles are available, whereas genes with fewer articles could be wrongly identified as disease-associated. Consequently, human curation is still a needed complement to current text-mining approaches, and future text-mining methods should improve their accuracy for genes with few articles or little information.
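
The difference between the two accuracy measures can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the code below assumes that per-abstract accuracy is the fraction of text-mined gene-abstract pairs a curator confirms, and that per-gene accuracy is the fraction of genes with at least one confirmed abstract. The gene names, abstract IDs, and labels are hypothetical toy values, not data from the study.

```python
from collections import defaultdict

# Hypothetical curated labels: (gene, abstract_id, curator_confirms_association).
# Toy values for illustration only; they are not taken from the study.
LABELS = [
    ("GENE_A", "abs1", True),
    ("GENE_A", "abs2", False),
    ("GENE_A", "abs3", True),
    ("GENE_B", "abs4", False),
    ("GENE_C", "abs5", True),
    ("GENE_C", "abs6", False),
]


def per_abstract_accuracy(labels):
    """Fraction of text-mined gene-abstract pairs confirmed by a curator."""
    return sum(ok for _, _, ok in labels) / len(labels)


def per_gene_accuracy(labels):
    """Fraction of genes with at least one confirmed abstract (assumed definition)."""
    by_gene = defaultdict(list)
    for gene, _, ok in labels:
        by_gene[gene].append(ok)
    return sum(any(flags) for flags in by_gene.values()) / len(by_gene)


if __name__ == "__main__":
    print(f"per-abstract accuracy: {per_abstract_accuracy(LABELS):.2f}")
    print(f"per-gene accuracy:     {per_gene_accuracy(LABELS):.2f}")
```

Under these assumed definitions, a gene with many abstracts can count as correctly identified even when some of its individual abstracts are false positives, while a gene supported by a single misread abstract counts as wrong. This is one way the per-gene figure can exceed the per-abstract figure.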