Collocating News Articles with Structured Web Tables

2021 
In today’s news deluge, it can be overwhelming to understand the significance of a news article or to verify the facts within it. One way to address this challenge is to identify relevant data so that crucial statistics or facts can be highlighted for the user to digest easily, improving the user’s comprehension of the news story in a larger context. In this paper, we look toward structured tables on the Web, especially the high-quality data tables from Wikipedia, to assist in news understanding. Specifically, we aim to automatically find tables related to a news article. To that end, we leverage the content and entities extracted from news articles and their matching tables to fine-tune a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model is therefore an encoder tailored for article-to-table matching. To find the matching tables for a given news article, the fine-tuned BERT model encodes the news article and each table in the corpus into their respective embedding vectors. The tables with the highest cosine similarity to the news article in this new representation space are considered possible matches. Comprehensive experimental analyses show that the new approach significantly outperforms the baselines on a large, weakly labeled dataset obtained from Web click logs as well as on a small, crowdsourced evaluation set. Specifically, our approach achieves nearly 90% accuracy@5, whereas the baselines range between 30% and 64%.
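The retrieval step described above (embed the article and every table, then rank tables by cosine similarity) can be illustrated with a minimal sketch. It does not reproduce the paper's model: the vectors below are hand-made stand-ins for the embeddings a fine-tuned BERT encoder would produce, the function names `cosine_similarity`, `top_k_tables`, and `accuracy_at_k` are hypothetical, and accuracy@k is assumed here to mean the fraction of articles whose labeled table appears among the top k ranked tables.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_tables(
    article_vec: Sequence[float],
    table_vecs: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[str]:
    """Return the ids of the k tables most similar to the article."""
    ranked = sorted(
        table_vecs.items(),
        key=lambda item: cosine_similarity(article_vec, item[1]),
        reverse=True,
    )
    return [table_id for table_id, _ in ranked[:k]]


def accuracy_at_k(
    ranked_lists: Sequence[Sequence[str]],
    gold_tables: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of articles whose gold table is within the top k results
    (an assumed definition; the abstract does not spell it out)."""
    hits = sum(
        1 for ranked, gold in zip(ranked_lists, gold_tables) if gold in ranked[:k]
    )
    return hits / len(gold_tables)


# Toy embeddings: placeholders for encoder outputs, not real model vectors.
tables = {
    "gdp": [1.0, 0.0, 0.0],
    "olympics": [0.0, 1.0, 0.0],
    "elections": [0.0, 0.0, 1.0],
}
article = [0.9, 0.1, 0.0]

print(top_k_tables(article, tables, k=2))
```

In practice the table embeddings would be computed once and indexed, so that only the incoming article needs to be encoded at query time; that is a design assumption rather than something the abstract states.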