Content-based table retrieval for web queries

2019 
Abstract
Understanding the connections between unstructured text and semi-structured tables is an important yet neglected problem in natural language processing. In this work, we focus on content-based table retrieval: given a query, the task is to find the most relevant table from a collection of tables. Further progress in this area requires powerful models of semantic matching and richer training and evaluation resources. To address both needs, we present a ranking-based approach and implement both carefully designed features and neural network architectures to measure the relevance between a query and the content of a table. Furthermore, we release an open-domain dataset that includes 21,113 web queries for 273,816 tables. We conduct comprehensive experiments on both real-world and synthetic datasets. Various evaluation criteria demonstrate that the proposed approach performs comparably to or better than a carefully designed feature-based system. We show what depth of table and language understanding is required to do well on this task, and hope to encourage further interest from the community in exploring deeper connections between tables and text.