RapidScorer: Fast Tree Ensemble Evaluation By Maximizing Compactness In Data Level Parallelization

Authors:
Ting Ye Microsoft
Hucheng Zhou Alibaba
Will Zou Microsoft
Bin Gao Microsoft
Ruofei Zhang Microsoft

Introduction:

This paper studies Relevance ranking models based on additive ensembles of regression trees. The authors present RapidScorer , a novel framework for speeding up the scoring process of industry-scale tree ensemble models.

Abstract:

Relevance ranking models based on additive ensembles of regression trees have shown quite good effectiveness in web search engines. In the era of big data, tree ensemble models grow large in both tree depth and ensemble size to provide even better search relevance and user experience. However, the computational cost for their scoring process is high, such that it becomes a challenging issue to apply the big tree ensemble models in a search engine which needs to answer thousands of queries per second. Although several works have been proposed to improve the scoring process, the challenge is still great especially when the model size grows large. In this paper, we present RapidScorer , a novel framework for speeding up the scoring process of industry-scale tree ensemble models, without hurting the quality of scoring results. RapidScorer introduces a modified run length encoding called epitome to the bitvector representation of the tree nodes. Epitome can greatly reduce the computation cost to traverse the tree ensemble, and work with several other proposed strategies to maximize the compactness of data units in memory. The achieved compactness makes it possible to fully utilize data parallelization to improve model scalability. Experiments on two web search benchmarks show that, RapidScorer achieves significant speed-up over the state-of-the-art methods: V-QuickScorer , ranging from 1.3x to 3.5x; QuickScorer , ranging from 2.1x to 25.0x; VPred , ranging from 2.3x to 18.3x; and XGBoost , ranging from 2.6x to 42.5x.

You may want to know: