Scalable and Efficient Web Search Result Diversification

Kaweh Djafari Naini,Ismail Sengor Altingovde,Wolf Siberski

Scalable and Efficient Web Search Result Diversification

2016

It has been shown that top- k retrieval quality can be considerably improved by taking not only relevance but also diversity into account. However, currently proposed diversification approaches have not put much attention on practical usability in large-scale settings, such as modern web search systems. In this work, we make two contributions toward this goal. First, we propose a combination of optimizations and heuristics for an implicit diversification algorithm based on the desirable facility placement principle, and present two algorithms that achieve linear complexity without compromising the retrieval effectiveness. Instead of an exhaustive comparison of documents, these algorithms first perform a clustering phase and then exploit its outcome to compose the diverse result set. Second, we describe and analyze two variants for distributed diversification in a computing cluster, for large-scale IR where the document collection is too large to keep in one node. Our contribution in this direction is pioneering, as there exists no earlier work in the literature that investigates the effectiveness and efficiency of diversification on a distributed setup. Extensive evaluations on a standard TREC framework demonstrate a competitive retrieval quality of the proposed optimizations to the baseline algorithm while reducing the processing time by more than 80p and up to 97p, and shed light on the efficiency and effectiveness tradeoffs of diversification when applied on top of a distributed architecture.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations