Analyzing the Competition: Abstractive Summarization at Thomson Reuters Labs

2020 
The manual task of writing a summary for a 30 to 100 page court document can be measured against a machine-written summary to compare the publishable acceptance rate of each, where acceptance depends on both accuracy and grammar. The initial acceptance rates were 74% for the computer approach and 88% for the manual human approach. The first computer approach started with 100M annotated documents, and the watershed moment came when the team adopted OpenNMT. Initially the workflow had a human manually highlight the sentences (text) to summarize; this step alone reduced the time per document from 30 minutes to 3 minutes. Next, they combined TF-IDF weights with word embeddings to build a weighted embedding for each sentence of the court documents, and chose the highest-scoring (most distinguishing) sentences using this weighted bag-of-words model. They then introduced a language-quality score by leveraging BERTScore. Lastly, they focused the human reviewers on the abstracts most likely to need review, using a straightforward binary classifier trained on which summaries had or had not needed editing in the past. Hedged sketches of the sentence-scoring, BERTScore, and reviewer-routing steps follow below.
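The sentence-scoring step might look like the sketch below. This is a minimal illustration, not Thomson Reuters' actual code: the pretrained word-vector lookup (word_vectors) is assumed, and scoring sentences by their distance from the document centroid is just one plausible reading of "the distinguishing ones".

    # Minimal sketch: TF-IDF-weighted sentence embeddings for extractive
    # selection. word_vectors (a dict of word -> np.ndarray) and the
    # centroid-distance scoring rule are assumptions for illustration.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def rank_sentences(sentences, word_vectors, dim=300, top_k=5):
        """Score each sentence with a TF-IDF-weighted average of its word
        embeddings and return the indices of the top-scoring ones."""
        tfidf = TfidfVectorizer()
        weights = tfidf.fit_transform(sentences)   # (n_sentences, vocab)
        vocab = tfidf.get_feature_names_out()

        sent_embs = np.zeros((len(sentences), dim))
        for i in range(len(sentences)):
            row = weights.getrow(i)
            for j in row.nonzero()[1]:
                word = vocab[j]
                if word in word_vectors:
                    sent_embs[i] += row[0, j] * word_vectors[word]

        # Treat sentences far from the document centroid as the most
        # "distinguishing" ones; higher distance = higher score.
        centroid = sent_embs.mean(axis=0)
        scores = np.linalg.norm(sent_embs - centroid, axis=1)
        return np.argsort(scores)[::-1][:top_k]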
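The BERTScore step maps directly onto the open-source bert-score package. A hedged sketch, assuming human reference summaries are available to score against and an illustrative acceptance threshold of 0.85:

    # Sketch of the language-quality gate using the bert-score package
    # (pip install bert-score). The 0.85 threshold is an assumption.
    from bert_score import score

    def passes_language_check(candidate_summaries, reference_summaries,
                              threshold=0.85):
        """Flag machine summaries whose BERTScore F1 against a human
        reference falls below the quality threshold."""
        # score() returns precision, recall, and F1 tensors, one per pair.
        P, R, F1 = score(candidate_summaries, reference_summaries, lang="en")
        return [f1.item() >= threshold for f1 in F1]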
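The reviewer-routing step needs only a binary classifier over past edited/not-edited labels. A minimal sketch, assuming TF-IDF features and logistic regression (the source says only that a straightforward binary classifier was used, not which model):

    # Sketch of the reviewer-routing classifier: predict whether a new
    # machine summary will need editing, based on past labels. The
    # TF-IDF + logistic-regression pipeline is an illustrative choice.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_review_router(past_summaries, was_edited):
        """Fit a classifier over summaries labeled 1 if a human editor
        changed them in the past, else 0."""
        clf = make_pipeline(TfidfVectorizer(),
                            LogisticRegression(max_iter=1000))
        clf.fit(past_summaries, was_edited)
        return clf

    # Usage: surface the summaries most likely to need review first.
    # probs = train_review_router(hist, labels).predict_proba(new)[:, 1]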