Analyzing the Competition: Abstractive Summarization at Thomson Reuters Labs

2020 
The manual task of writing a summary for a 30 to 100 page court document can be measured against a machine-written summary to compare the publishable acceptance rate of each, where acceptance depends on both accuracy and grammar. The initial acceptance rates were 74% for the computer approach and 88% for the manual human approach. The first computer approach started with 100M annotated documents, and the watershed moment came when the team adopted OpenNMT. Initially the workflow had a human manually highlight the sentences (text) to summarize; this step alone reduced the time per document from 30 minutes to 3 minutes. Next, they combined TF-IDF weights with word embeddings to build a weighted embedding for each sentence of the court documents, and chose the highest-scoring (most distinguishing) sentences using this weighted bag-of-words model. They then introduced a language-quality score by leveraging BERTScore. Lastly, they focused the human reviewers on the abstracts most likely to need review, using a straightforward binary classifier trained on which summaries had or had not needed editing in the past. Hedged sketches of the sentence-scoring, BERTScore, and reviewer-routing steps follow below.
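The sentence-scoring step might look like the sketch below. This is a minimal illustration, not Thomson Reuters' actual code: the pretrained word-vector lookup (word_vectors) is assumed, and scoring sentences by their distance from the document centroid is just one plausible reading of "the distinguishing ones".

    # Minimal sketch: TF-IDF-weighted sentence embeddings for extractive
    # selection. word_vectors (a dict of word -> np.ndarray) and the
    # centroid-distance scoring rule are assumptions for illustration.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def rank_sentences(sentences, word_vectors, dim=300, top_k=5):
        """Score each sentence with a TF-IDF-weighted average of its word
        embeddings and return the indices of the top-scoring ones."""
        tfidf = TfidfVectorizer()
        weights = tfidf.fit_transform(sentences)   # (n_sentences, vocab)
        vocab = tfidf.get_feature_names_out()

        sent_embs = np.zeros((len(sentences), dim))
        for i in range(len(sentences)):
            row = weights.getrow(i)
            for j in row.nonzero()[1]:
                word = vocab[j]
                if word in word_vectors:
                    sent_embs[i] += row[0, j] * word_vectors[word]

        # Treat sentences far from the document centroid as the most
        # "distinguishing" ones; higher distance = higher score.
        centroid = sent_embs.mean(axis=0)
        scores = np.linalg.norm(sent_embs - centroid, axis=1)
        return np.argsort(scores)[::-1][:top_k]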
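The BERTScore step maps directly onto the open-source bert-score package. A hedged sketch, assuming human reference summaries are available to score against and an illustrative acceptance threshold of 0.85:

    # Sketch of the language-quality gate using the bert-score package
    # (pip install bert-score). The 0.85 threshold is an assumption.
    from bert_score import score

    def passes_language_check(candidate_summaries, reference_summaries,
                              threshold=0.85):
        """Flag machine summaries whose BERTScore F1 against a human
        reference falls below the quality threshold."""
        # score() returns precision, recall, and F1 tensors, one per pair.
        P, R, F1 = score(candidate_summaries, reference_summaries, lang="en")
        return [f1.item() >= threshold for f1 in F1]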
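The reviewer-routing step needs only a binary classifier over past edited/not-edited labels. A minimal sketch, assuming TF-IDF features and logistic regression (the source says only that a straightforward binary classifier was used, not which model):

    # Sketch of the reviewer-routing classifier: predict whether a new
    # machine summary will need editing, based on past labels. The
    # TF-IDF + logistic-regression pipeline is an illustrative choice.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_review_router(past_summaries, was_edited):
        """Fit a classifier over summaries labeled 1 if a human editor
        changed them in the past, else 0."""
        clf = make_pipeline(TfidfVectorizer(),
                            LogisticRegression(max_iter=1000))
        clf.fit(past_summaries, was_edited)
        return clf

    # Usage: surface the summaries most likely to need review first.
    # probs = train_review_router(hist, labels).predict_proba(new)[:, 1]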