QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees

Part of the Lecture Notes in Computer Science book series (LNAI, volume 10536)

Abstract

Machine-learnt models based on additive ensembles of binary regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. Evaluating these models is computationally demanding, as it requires traversing thousands of trees with hundreds of nodes each. The cost of traversing such large forests significantly limits their application to big and streaming input data, where the time budget available for each prediction is bounded in order to guarantee a given processing throughput. Document ranking in Web search is a typical example of this challenging scenario: tree-based models that score query-document pairs, and then rank the list of documents for each incoming query, are the state-of-the-art method for ranking (a.k.a. Learning-to-Rank). This paper presents QuickScorer, a novel algorithm for the traversal of huge ensembles of decision trees that, thanks to a cache- and CPU-aware design, provides a \(\sim 9\times\) speedup over the best competitors.
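
The abstract does not spell out the traversal itself. As a rough illustration only, the sketch below contrasts the classic root-to-leaf evaluation of an additive ensemble with a simplified bitvector-style traversal in the spirit of QuickScorer: every node whose test fails clears the leaves of its left subtree, and the leftmost surviving leaf is the exit leaf. All names (`Node`, `Leaf`, `build_model`, `score_naive`, `score_bitvector`) and the `x[feature] <= threshold` test convention are assumptions made for this example; the cache- and CPU-aware memory layout that the paper credits for the speedup is not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    feature: int
    threshold: float
    left: Union["Node", Leaf]   # taken when x[feature] <= threshold
    right: Union["Node", Leaf]  # taken otherwise


def score_naive(forest: List[Union[Node, Leaf]], x: List[float]) -> float:
    """Baseline: walk every tree from root to leaf and sum the leaf values."""
    total = 0.0
    for root in forest:
        node = root
        while isinstance(node, Node):
            node = node.left if x[node.feature] <= node.threshold else node.right
        total += node.value
    return total


Model = Tuple[List[List[float]], Dict[int, List[Tuple[float, int, int]]]]


def build_model(forest: List[Union[Node, Leaf]]) -> Model:
    """Precompute, per feature, (threshold, tree id, mask) sorted by threshold.

    Bit i of a mask stands for the i-th leaf of the tree, left to right.
    A node's mask has zeros exactly on the leaves of its left subtree.
    """
    leaf_values: List[List[float]] = []
    by_feature: Dict[int, List[Tuple[float, int, int]]] = {}
    for tree_id, root in enumerate(forest):
        values: List[float] = []
        nodes: List[Tuple[int, float, int, int]] = []

        def visit(n: Union[Node, Leaf]) -> None:
            if isinstance(n, Leaf):
                values.append(n.value)
                return
            start = len(values)          # first leaf of the left subtree
            visit(n.left)
            end = len(values)            # one past its last leaf
            nodes.append((n.feature, n.threshold, start, end))
            visit(n.right)

        visit(root)
        all_ones = (1 << len(values)) - 1
        for feature, threshold, start, end in nodes:
            left_bits = ((1 << (end - start)) - 1) << start
            by_feature.setdefault(feature, []).append(
                (threshold, tree_id, all_ones & ~left_bits))
        leaf_values.append(values)
    for entries in by_feature.values():
        entries.sort(key=lambda e: e[0])
    return leaf_values, by_feature


def score_bitvector(model: Model, x: List[float]) -> float:
    """Feature-wise traversal: apply the masks of all failing tests."""
    leaf_values, by_feature = model
    alive = [(1 << len(vals)) - 1 for vals in leaf_values]
    for feature, entries in by_feature.items():
        for threshold, tree_id, mask in entries:
            if x[feature] <= threshold:
                break  # thresholds are ascending, so the remaining tests pass too
            alive[tree_id] &= mask
    total = 0.0
    for tree_id, vals in enumerate(leaf_values):
        lowest = alive[tree_id] & -alive[tree_id]   # isolate the lowest set bit
        total += vals[lowest.bit_length() - 1]      # leftmost surviving leaf
    return total
```

Both functions should return the same score for any input; the second one never follows a pointer from parent to child, which is the property a cache-friendly layout can then exploit.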

Keywords

  • Learning to rank
  • Ensemble of decision trees
  • Efficiency

Author information

Corresponding author

Correspondence to Claudio Lucchese.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N., Venturini, R. (2017). QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees. In: , et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_36

  • DOI: https://doi.org/10.1007/978-3-319-71273-4_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71272-7

  • Online ISBN: 978-3-319-71273-4

  • eBook Packages: Computer Science, Computer Science (R0)