Content Models for Survey Generation: A Factoid-Based Evaluation

2015 
We present a new factoid-annotated dataset for evaluating content models for scientific survey article generation, containing 3,425 sentences from 7 topics in natural language processing. We also introduce HITSUM, a novel HITS-based content model for automated survey article generation that exploits the lexical network structure between sentences from citing and cited papers. Using the factoid-annotated data, we conduct a pyramid evaluation and compare HITSUM with two previous state-of-the-art content models: C-Lexrank, a network-based content model, and TOPICSUM, a Bayesian content model. Our experiments show that the new content model captures useful survey-worthy information and outperforms C-Lexrank by 4% and TOPICSUM by 7% in pyramid evaluation.
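The HITS connection can be illustrated with a minimal sketch: treat citing-paper sentences as hubs and cited-paper sentences as authorities, weight edges by lexical similarity, and run Kleinberg's hub/authority iteration, then rank sentences by authority score as candidate survey content. The word-overlap similarity, graph construction, and selection step below are illustrative assumptions, not the paper's exact HITSUM formulation.

```python
import numpy as np

def hits_scores(adj, iters=50, tol=1e-8):
    """Kleinberg-style HITS on a bipartite adjacency matrix.

    Rows index citing sentences (hubs), columns index cited sentences
    (authorities). Returns the converged hub and authority score vectors.
    """
    hubs = np.ones(adj.shape[0])
    auths = np.ones(adj.shape[1])
    for _ in range(iters):
        new_auths = adj.T @ hubs                    # authority <- weighted sum of its hubs
        new_auths /= np.linalg.norm(new_auths) or 1.0
        new_hubs = adj @ new_auths                  # hub <- weighted sum of its authorities
        new_hubs /= np.linalg.norm(new_hubs) or 1.0
        delta = np.abs(new_hubs - hubs).sum() + np.abs(new_auths - auths).sum()
        hubs, auths = new_hubs, new_auths
        if delta < tol:
            break
    return hubs, auths

def lexical_similarity(a, b):
    """Word-overlap (Jaccard) similarity between two sentences; a stand-in
    for whatever lexical measure a real system would use."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# Toy citing/cited sentences (hypothetical examples).
citing = ["word alignment models improve statistical machine translation",
          "phrase based translation extends word alignment"]
cited = ["statistical machine translation uses word alignment",
         "alignment models are trained on parallel corpora"]

# Edge weight = lexical similarity between each citing and cited sentence.
adj = np.array([[lexical_similarity(c, d) for d in cited] for c in citing])
hubs, auths = hits_scores(adj)

# Rank cited sentences by authority score as candidate survey-worthy content.
for score, sent in sorted(zip(auths, cited), reverse=True):
    print(f"{score:.3f}  {sent}")
```

In this sketch, high-authority sentences are those heavily pointed to by strong hub sentences, which is the intuition behind using HITS to surface survey-worthy content from a citing/cited sentence network.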