Query by documents on top of a search interface

2021 
Abstract: Document repositories often provide keyword-based query interfaces that let users search for documents. These interfaces typically impose rate limits or a monetary cost per access operation. Examples of such constrained search interfaces include legal or medical data sources, social networks, and the Web. We study the problem in which a user has a set of input documents and wants to discover other similar documents through a constrained search interface. Specifically, given a set of input documents and an access budget, we present principled techniques to generate a list of queries to submit. The key intuition behind our techniques is to compute the best set of queries that return the input documents; as we show experimentally, these queries also return other relevant documents. We show that our techniques outperform the state of the art according to several intuitive document relevance metrics on several real benchmark datasets. We report results for two problem variants: finding queries that return the input documents in the highest positions (the Docs2Queries-Self problem) and finding queries that return other relevant documents in the highest positions (the Docs2Queries-Sim problem).
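To make the problem setting concrete, the following is a minimal sketch of turning input documents and an access budget into a list of keyword queries. It is not the technique presented in the paper: it substitutes a plain greedy set-cover heuristic, and it assumes conjunctive keyword matching (a query matches a document only if the document contains every query term). The names `tokenize`, `candidate_queries`, and `select_queries` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tokenize(text):
    # Assumption: lowercase alphabetic words longer than 3 characters are useful terms.
    return [w for w in text.lower().split() if w.isalpha() and len(w) > 3]


def candidate_queries(docs, max_terms=2):
    """Map each candidate query (a tuple of terms) to the input documents
    that contain all of its terms (conjunctive matching is an assumption)."""
    term_sets = [set(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*term_sets))
    cands = {}
    for n in range(1, max_terms + 1):
        for q in combinations(vocab, n):
            covered = frozenset(
                i for i, terms in enumerate(term_sets) if all(t in terms for t in q)
            )
            if covered:
                cands[q] = covered
    return cands


def select_queries(docs, budget, max_terms=2):
    """Greedily pick at most `budget` queries that together match as many
    input documents as possible (illustrative heuristic only)."""
    cands = candidate_queries(docs, max_terms)
    if not cands:
        return []
    uncovered = set(range(len(docs)))
    chosen = []
    while len(chosen) < budget and uncovered:
        # Prefer queries matching more uncovered documents; break ties in
        # favor of longer, more specific queries.
        best = max(cands, key=lambda q: (len(cands[q] & uncovered), len(q)))
        if not cands[best] & uncovered:
            break  # no remaining candidate matches an uncovered document
        chosen.append(" ".join(best))
        uncovered -= cands[best]
    return chosen


if __name__ == "__main__":
    docs = [
        "legal contract dispute",
        "contract breach damages",
        "medical imaging scan",
    ]
    print(select_queries(docs, budget=2))
```

Each selected query would cost one access to the constrained interface, so `budget` caps the number of queries returned. A real system would also need to account for how the interface ranks results, which this sketch ignores.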