Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants

Chris Kamphuis, Arjen de Vries, Leonid Boytsov, Jimmy Lin

2020

When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity “matter”? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene’s often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work.
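
To make the ambiguity concrete, the following is a minimal sketch (not the study's own code) of how two commonly cited formulations can differ in their IDF component while sharing the same term-frequency saturation. The function names, the default values of k1 and b, and the omission of the constant (k1 + 1) numerator factor, which rescales all scores without changing the ranking, are assumptions made for illustration.

```python
import math


def idf_robertson(n_docs: int, df: int) -> float:
    """Robertson-style IDF; can be negative when df > n_docs / 2."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))


def idf_lucene(n_docs: int, df: int) -> float:
    """Lucene-style IDF; adding 1 inside the log keeps it positive."""
    return math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))


def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    idf: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Score contribution of one query term in one document.

    k1 and b are illustrative defaults (assumed, not taken from the study).
    """
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf / (tf + norm)


if __name__ == "__main__":
    n_docs, df = 10, 8  # a term that occurs in most documents
    for name, idf in (("robertson", idf_robertson(n_docs, df)),
                      ("lucene", idf_lucene(n_docs, df))):
        score = bm25_term_score(tf=3, doc_len=100, avg_doc_len=120, idf=idf)
        print(name, round(idf, 4), round(score, 4))
```

For a term that appears in more than half of the documents, the Robertson-style IDF turns negative while the Lucene-style IDF stays positive, which is one of the ways two systems that both report "BM25" can assign different scores to the same document.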

Keywords: Reproducibility, Computer science, Data mining, Ambiguity, Information retrieval, Relational database
