Summarizing Events from Massive News Reports on the Web

2015 
In the big data era, massive numbers of news reports about the latest events are published on the Web. To understand an event thoroughly, a reader has to read many reports and keep track of the clues across them, which is very difficult and usually results in a one-sided interpretation. In this paper, we propose a multi-document summarization approach that automatically and comprehensively summarizes reports of a particular social or political event. To speed up summarization, a pre-summarization step condenses each report into a sub-summary, which reduces the scale of subsequent processing. Because an event should be told in chronological order, a timeline is used to organize and aggregate event-relevant sub-summaries. For each day's sub-summaries, a key phrase extraction algorithm clusters them into topics and generates a meaningful label for each topic. Finally, a selection criterion picks relevant and novel sentences for each topic. We perform experiments on a large-scale news dataset of about 10 million reports collected from news sites. An empirical study shows that our system is feasible in a large-scale environment, and an effectiveness evaluation shows that it is favoured by users.
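The abstract names the timeline and the relevance-and-novelty selection step but does not give their definitions. The following is a minimal sketch of how such a step could look, not the paper's method: it assumes Jaccard word overlap as the similarity measure and a greedy, MMR-style trade-off between relevance to a topic label and redundancy with sentences already chosen. The function names, the `trade_off` parameter, and the tokenization are all hypothetical.

```python
from collections import defaultdict


def tokens(text):
    """Lowercased word set; a crude stand-in for real text processing."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_timeline(sub_summaries):
    """Group (ISO date, sentence) pairs by day, in chronological order."""
    by_day = defaultdict(list)
    for day, sentence in sub_summaries:
        by_day[day].append(sentence)
    return dict(sorted(by_day.items()))


def select_sentences(sentences, topic_label, k=2, trade_off=0.7):
    """Greedy selection (assumed criterion, not the paper's):
    reward overlap with the topic label, penalise overlap with
    sentences that were already selected."""
    label = tokens(topic_label)
    chosen = []
    candidates = list(sentences)
    while candidates and len(chosen) < k:
        def score(s):
            relevance = jaccard(tokens(s), label)
            redundancy = max(
                (jaccard(tokens(s), tokens(c)) for c in chosen),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

In this sketch, `build_timeline` would be applied to the dated sub-summaries first, and `select_sentences` would then be called once per topic within each day. A verbatim duplicate of an already selected sentence receives the maximum redundancy penalty, so a different sentence on the same topic is preferred.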