Identifying Abbreviations in Biomedical Literature Based on Maximum Entropy with Web Features

Jing Peng, Yan Wang, Hong Min Sun

2014

The biomedical literature is growing rapidly, making biomedical literature mining essential. This paper proposes a machine learning classifier based on maximum entropy (ME) for identifying abbreviations, together with two novel Web-based features that extract additional semantic information. The study shows that the Web, used as a knowledge source, can be incorporated effectively into a machine learning framework and significantly improves its performance. The ME classifier achieves 95% precision and 89% recall on the gold-standard corpus “Medstract”, and 91% precision and 84% recall on a larger test set of 128 full-text articles.
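As a rough illustration of the kind of pipeline the abstract describes, the sketch below trains a two-class maximum entropy model (which is equivalent to logistic regression) on candidate abbreviation/definition pairs and computes precision and recall. Everything specific here is an assumption for illustration: the abstract does not describe the paper's two Web-based features, so `web_hits` is a hypothetical stand-in for some Web-derived count, the other features are generic, and the toy pairs and hit counts are invented.

```python
import math

def features(short_form, long_form, web_hits=0):
    """Hypothetical feature vector for one candidate pair (not the paper's feature set)."""
    initials = "".join(word[0] for word in long_form.split()).lower()
    return [
        1.0,                                             # bias term
        1.0 if initials == short_form.lower() else 0.0,  # initials spell the short form
        1.0 if short_form.isupper() else 0.0,            # short form is all capitals
        math.log1p(web_hits),                            # assumed Web-derived count, log-scaled
    ]

def predict(weights, x):
    """Probability that the pair is a true abbreviation/definition pair."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=500, lr=0.1):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, x)
            for i, xi in enumerate(x):
                weights[i] += lr * error * xi
    return weights

def precision_recall(predicted, gold):
    """Precision and recall for binary predictions against gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented toy data: (short form, long form, made-up web hit count, label).
toy = [
    ("ME", "maximum entropy", 1200, 1),
    ("DNA", "deoxyribonucleic acid", 50000, 1),
    ("PCR", "polymerase chain reaction", 30000, 1),
    ("ME", "growing rapidly", 0, 0),
    ("WEB", "knowledge source", 3, 0),
]
samples = [features(s, l, h) for s, l, h, _ in toy]
labels = [y for _, _, _, y in toy]
weights = train(samples, labels)
predicted = [predict(weights, x) >= 0.5 for x in samples]
precision, recall = precision_recall(predicted, labels)
```

The figures this produces on the toy data say nothing about the reported results; the sketch only shows where a Web-derived feature would enter such a model and how precision and recall are computed from its predictions.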

Keywords:

Principle of maximum entropy
Information retrieval
Classifier (linguistics)
Test data
Data mining
Recall
Computer science
Text mining
Literature based
Semantic information
