## Abstract

Random forest (RF)-based pointwise learning-to-rank (LtR) algorithms use surrogate loss functions to minimize the ranking error. Despite performing competitively with other state-of-the-art LtR algorithms, these algorithms, unlike frameworks such as boosting and neural networks, have not been thoroughly investigated in the literature so far. In the first part of this study, we aim to better understand and improve RF-based pointwise LtR algorithms. When working with such an algorithm, one currently needs to choose among a number of available settings, such as (1) a classification versus a regression setting, (2) using absolute relevance judgements versus mapped labels, (3) the number of features examined when choosing a split-point for the data, and (4) using a weighted versus an un-weighted average of the predictions of multiple base learners (i.e., trees). We conduct a thorough study of these four aspects as well as of a pairwise objective function for RF-based rank-learners. Experimental results on several benchmark LtR datasets demonstrate that performance can be significantly improved by exploring these aspects. In the second part of the paper, guided by our investigation of RF-based rank-learners, we conduct an extensive comparison between these and state-of-the-art rank-learning algorithms. This comparison reveals some interesting and insightful findings about LtR algorithms, including that RF-based LtR algorithms are among the most robust techniques across datasets with diverse properties.
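As a concrete illustration of the pointwise setting discussed above, the sketch below trains a random forest regressor on graded relevance labels and ranks a query's documents by predicted score. It assumes scikit-learn and uses synthetic data; it is an illustrative sketch only, not the experimental setup of the paper.

```python
# Minimal pointwise LtR sketch: a random forest regressor is trained to
# predict graded relevance labels, and test documents are ranked per query
# by the predicted score. Features and labels are synthetic, for
# illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))       # document feature vectors
y_train = rng.integers(0, 3, size=200)     # graded relevance in {0, 1, 2}

# In the regression setting, relevance grades are treated as real-valued
# targets; max_features controls how many features are tried per split
# (aspect 3 in the abstract).
rf = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                           random_state=0)
rf.fit(X_train, y_train)

X_query = rng.normal(size=(5, 10))         # documents retrieved for one query
scores = rf.predict(X_query)
ranking = np.argsort(-scores)              # document indices, best-scored first
print(ranking)
```

By default the forest averages the trees' predictions with equal weight; aspect 4 above asks whether a weighted average of the per-tree predictions does better.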


## Notes

- 1.
- 2.
Manning et al. [1] nicely explain these models.

- 3.
- 4.
- 5.
- 6.
This algorithm is also used by Ibrahim and Carman [15].

- 7.
While several functions can serve as the splitting criterion, Robnik-Šikonja [35] shows that this choice causes insignificant performance variation, if any.

- 8.
While most tree implementations in the literature use a depth-first (i.e., recursive) exploration of the nodes, the implementation shown here uses a breadth-first exploration, mainly because we consider it a more systematic way of exploring the nodes. For an entropy-based objective function, the node exploration strategy does not affect the tree structure, i.e., the data partitions [15].

- 9.
For all of the experiments in this section, for the two larger datasets (MSLR-WEB10K and Yahoo), bold italic and bold figures denote that the best performance is significant with a *p* value less than 0.01 and 0.05, respectively. For the smaller datasets, an average over five independent runs is reported (each run being the result of fivefold cross-validation), and the winning value is given in italic font.

- 10.
Recall that the usual practice has been to use the average relevance of the instances as the score.

- 11.
Since the properties of the HP2004 and NP2004 datasets are similar to those of TD2004, we do not conduct further experiments on them.

- 12.
Although the features are not disclosed for the Yahoo dataset, a particular feature index corresponding to BM25 is mentioned.

- 13.
This is intuitive: since navigational queries have very few relevant documents, setting a higher value for *k* helps the base ranker(s) put those few relevant documents into the training set of the rank-learning phase.

- 14.
Ibrahim and Carman [15] report results of only RF-rand, RF-point with classification and RF-list.

- 15.
We perform a pairwise significance test on the comparatively larger (in terms of number of queries) MQ2007 and MQ2008 datasets. Since for the rest of the datasets the number of queries is small, the significance test results may not be reliable.

- 16.
- 17.
We did, however, run a pilot experiment comparing RF-point and RF-hybrid with \(K \in \{23, 50, 80, 130\}\) (the value 23 corresponds to \(\sqrt{M}\)) and did not observe any improvement of RF-hybrid over RF-point for any of these settings. We thus conclude that *K* is not likely to play a significant role in the relative performance of these two systems on the Yahoo dataset.

- 18.
The inventors of the technique [63] acknowledge a concern of accuracy degradation, albeit a slight one.

- 19.
- 20.
- 21.
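The breadth-first node exploration mentioned in note 8, together with the leaf-scoring convention of note 10 (leaf score = average relevance), can be sketched as follows. This is a simplified illustration using variance reduction as the splitting criterion, not the paper's implementation; all names are illustrative.

```python
# Breadth-first decision-tree construction: nodes are expanded from a FIFO
# queue instead of by recursion. Leaf score is the mean label of the
# instances reaching the leaf.
from collections import deque

def variance(labels):
    m = sum(labels) / len(labels)
    return sum((y - m) ** 2 for y in labels) / len(labels)

def best_split(rows, labels):
    """Return (feature, threshold) minimizing weighted child variance,
    or None if no split improves on the parent node."""
    best, best_score = None, variance(labels)
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * variance(left)
                     + len(right) * variance(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build_tree_bfs(rows, labels, min_node_size=2):
    root = {"rows": rows, "labels": labels}
    queue = deque([root])
    while queue:                        # breadth-first: FIFO order
        node = queue.popleft()
        split = None
        if len(node["rows"]) >= min_node_size:
            split = best_split(node["rows"], node["labels"])
        if split is None:               # leaf: score is the mean label
            node["score"] = sum(node["labels"]) / len(node["labels"])
            continue
        f, t = split
        node["split"] = (f, t)
        for cond, key in ((lambda r: r[f] <= t, "left"),
                          (lambda r: r[f] > t, "right")):
            child = {"rows": [r for r in node["rows"] if cond(r)],
                     "labels": [y for r, y in zip(node["rows"],
                                                  node["labels"]) if cond(r)]}
            node[key] = child
            queue.append(child)
    return root
```

For example, `build_tree_bfs([[0], [1], [2], [3]], [0, 0, 1, 1])` splits the root at threshold 1 and produces two pure leaves with scores 0.0 and 1.0. As note 8 states, with such a per-node objective the resulting partitions are the same as under depth-first exploration; only the order of node expansion differs.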

## References

- 1.
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge

- 2.
Li H (2011) Learning to rank for information retrieval and natural language processing. Synth Lect Hum Lang Technol 4(1):1–113

- 3.
Liu TY (2011) Learning to rank for information retrieval. Springer, Berlin

- 4.
Ibrahim M, Murshed M (2016) From tf-idf to learning-to-rank: an overview. In: Handbook of research on innovations in information retrieval, analysis, and management. IGI Global, USA, pp 62–109

- 5.
Karatzoglou A, Baltrunas L, Shi Y (2013) Learning to rank for recommender systems. In: Proceedings of the 7th ACM conference on recommender systems, ACM, pp 493–494

- 6.
Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237

- 7.
Santos RL, Macdonald C, Ounis I (2013) Learning to rank query suggestions for adhoc and diversity search. Inf Retr 16(4):429–451

- 8.
Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083

- 9.
Li Z, Tang J, He X (2018) Robust structured nonnegative matrix factorization for image representation. IEEE Trans Neural Netw Learn Syst 29(5):1947–1960

- 10.
Dang V, Bendersky M, Croft WB (2013) Two-stage learning to rank for information retrieval. In: Advances in information retrieval. Springer, pp 423–434

- 11.
Macdonald C, Santos RL, Ounis I (2013) The whens and hows of learning to rank for web search. Inf Retr 16(5):584–628

- 12.
Aslam JA, Kanoulas E, Pavlu V, Savev S, Yilmaz E (2009) Document selection methodologies for efficient and effective learning-to-rank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 468–475

- 13.
Pavlu V (2008) Large scale IR evaluation. ProQuest LLC, Ann Arbor

- 14.
Qin T, Liu TY, Xu J, Li H (2010) Letor: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13(4):346–374

- 15.
Ibrahim M, Carman M (2016) Comparing pointwise and listwise objective functions for random-forest-based learning-to-rank. ACM TOIS 34(4):20

- 16.
Breiman L (2001) Random forests. Mach Learn 45(1):5–32

- 17.
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York. https://doi.org/10.1007/978-0-387-84858-7

- 18.
Banfield RE, Hall LO, Bowyer KW, Kegelmeyer WP (2007) A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell 29(1):173–180

- 19.
Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 161–168

- 20.
Criminisi A (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7(2–3):81–227

- 21.
Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

- 22.
Biau G (2012) Analysis of a random forests model. J Mach Learn Res 98888:1063–1095

- 23.
Cui Z, Chen W, He Y, Chen Y (2015) Optimal action extraction for random forests and boosted trees. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 179–188

- 24.
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3

- 25.
Dong Y, Zhang Y, Yue J, Hu Z (2016) Comparison of random forest, random ferns and support vector machine for eye state classification. Multimed Tools Appl 75(19):11763–11783

- 26.
Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300

- 27.
Chapelle O, Chang Y (2011) Yahoo! learning to rank challenge overview. J Mach Learn Res Proc Track 14:1–24

- 28.
Geurts P, Louppe G (2011) Learning to rank with extremely randomized trees. In: JMLR: workshop and conference proceedings, vol 14

- 29.
Mohan A, Chen Z, Weinberger KQ (2011) Web-search ranking with initialized gradient boosted regression trees. J Mach Learn Res Proc Track 14:77–89

- 30.
Han X, Lei S (2018) Feature selection and model comparison on Microsoft learning-to-rank data sets. arXiv preprint arXiv:1803.05127

- 31.
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 41–48

- 32.
Chapelle O, Metlzer D, Zhang Y, Grinspan P (2009) Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM). ACM, pp 621–630

- 33.
Li P, Wu Q, Burges C (2007) McRank: learning to rank using classification and gradient boosting. Adv Neural Inf Process Syst 20:897–904

- 34.
Cossock D, Zhang T (2006) Subset ranking using regression. Learning Theory, pp 605–619

- 35.
Robnik-Šikonja M (2004) Improving random forests. In: Machine learning: ECML 2004. Springer, pp 359–370

- 36.
Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42

- 37.
Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15(1):1625–1651

- 38.
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

- 39.
Zaidi N, Webb G, Carman M, Petitjean F (2015) Deep broad learning—big models for big data. arXiv preprint arXiv:1509.01346

- 40.
Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Min ASA Data Sci J 6(6):496–505

- 41.
Bernard S, Heutte L, Adam S (2009) On the selection of decision trees in random forests. In: International joint conference on neural networks, 2009. IJCNN 2009. IEEE, pp 302–307

- 42.
Li HB, Wang W, Ding HW, Dong J (2010) Trees weighting random forest method for classifying high-dimensional noisy data. In: 2010 IEEE 7th international conference on e-business engineering (ICEBE). IEEE, pp 160–163

- 43.
Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a random forest? In: MLDM. Springer, pp 154–168

- 44.
Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969

- 45.
Tax N, Bockting S, Hiemstra D (2015) A cross-benchmark comparison of 87 learning to rank methods. Inf Process Manag 51(6):757–772

- 46.
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 133–142

- 47.
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 391–398

- 48.
Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retr 13(3):254–270

- 49.
Metzler D, Croft WB (2007) Linear feature-based models for information retrieval. Inf Retr 10(3):257–274

- 50.
Ibrahim M (2019) Sampling non-relevant documents of training sets for learning-to-rank algorithms. Int J Mach Learn Comput 10(2) **(to appear)**

- 51.
Ibrahim M (2019) Reducing correlation of random forest-based learning-to-rank algorithms using subsample size. Comput Intell 35(2):1–25

- 52.
Ibrahim M, Carman M (2014) Improving scalability and performance of random forest based learning-to-rank algorithms by aggressive subsampling. In: Proceedings of the 12th Australasian data mining conference, pp 91–99

- 53.
He B, Macdonald C, Ounis I (2008) Retrieval sensitivity under training using different measures. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 67–74

- 54.
Robertson S (2008) On the optimisation of evaluation metrics. In: Keynote, SIGIR 2008 workshop learning to rank for information retrieval (LR4IR)

- 55.
Donmez P, Svore KM, Burges CJ (2009) On the local optimality of lambdarank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 460–467

- 56.
Yilmaz E, Robertson S (2010) On the choice of effectiveness measures for learning to rank. Inf Retr 13(3):271–290

- 57.
Hanbury A, Lupu M (2013) Toward a model of domain-specific search. In: Proceedings of the 10th conference on open research areas in information retrieval, Le Centre De Hautes Etudes Internationales D’Informatique Documentaire, pp 33–36

- 58.
Hawking D (2004) Challenges in enterprise search. In: Proceedings of the 15th Australasian database conference, vol 27. Australian Computer Society, Inc., pp 15–24

- 59.
McCallum A, Nigam K, Rennie J, Seymore K (1999) A machine learning approach to building domain-specific search engines. In: IJCAI, vol 99. Citeseer, pp 662–667

- 60.
Owens L, Brown M, Poore K, Nicolson N (2008) The Forrester Wave: enterprise search, Q2 2008. For information and knowledge management professionals

- 61.
Yan X, Lau RY, Song D, Li X, Ma J (2011) Toward a semantic granularity model for domain-specific information retrieval. ACM TOIS 29(3):15

- 62.
Szummer M, Yilmaz E (2011) Semi-supervised learning to rank with preference regularization. In: Proceedings of the 20th ACM international conference on information and knowledge management (CIKM). ACM, pp 269–278

- 63.
Tyree S, Weinberger KQ, Agrawal K, Paykin J (2011) Parallel boosted regression trees for web search ranking. In: Proceedings of the 20th international conference on world wide web. ACM, pp 387–396

- 64.
Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory. Springer, pp 23–37

- 65.
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232 (English summary)

- 66.
Quoc C, Le V (2007) Learning to rank with nonsmooth cost functions. Proc Adv Neural Inf Process Syst 19:193–200

- 67.
Ganjisaffar Y, Caruana R, Lopes CV (2011) Bagging gradient-boosted trees for high precision, low variance ranking models. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 85–94

- 68.
Ganjisaffar Y, Debeauvais T, Javanmardi S, Caruana R, Lopes CV (2011) Distributed tuning of machine learning algorithms using mapreduce clusters. In: Proceedings of the third workshop on large scale data mining: theory and applications. ACM, p 2


## Appendix: Implementations and parameter settings of baseline algorithms


The description of different baseline algorithms along with their parameter settings is given as follows.

RankSVM [46] is an SVM-inspired pairwise LtR algorithm which has been used as a baseline in a large number of works on LtR. In our experiments, we use a publicly available implementation of it.^{Footnote 19}

RankBoost [44] is an AdaBoost-inspired [64] pairwise algorithm which, instead of the standard exponential loss, uses a pairwise loss. RankLib^{Footnote 20} is a popular package implementing a number of LtR algorithms; we use its RankBoost implementation and set the number of trees in the ensemble to 500.

AdaRank [47] also uses the AdaBoost framework but unlike RankBoost it adopts a listwise approach. We use the implementation in RankLib with the number of base learners set to 500 and NDCG@10 as the optimization metric for learning.

The CoorAsc algorithm [49] uses the coordinate ascent method in a listwise manner. We use its RankLib implementation.

Mart [33] is a popular pointwise LtR algorithm based on an ensemble of gradient-boosted regression trees [65]. We use its RankLib implementation with the following changes to the default parameter settings: number of trees = 500, number of leaves per tree = 7 (according to [17, Ch. 10], any value between 4 and 8 is likely to work well).
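As an illustration of the gradient-boosted regression tree idea behind Mart, the sketch below fits squared-loss residuals stage by stage, with single-split regression stumps standing in for the 7-leaf trees. The data and function names are illustrative assumptions, not the RankLib implementation.

```python
# Gradient boosting with squared loss: each stage fits a weak learner to the
# residuals of the current ensemble, and its (shrunken) prediction is added
# to the model. Regression stumps stand in for Mart's 7-leaf trees.
def fit_stump(xs, residuals):
    """Single-feature threshold split minimizing squared error."""
    best, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if err < best_err:
            best_err, best = err, (t, lm, rm)
    return best

def boost(xs, ys, n_rounds=50, lr=0.1):
    """Return the additive model: a list of (threshold, left_val, right_val)."""
    pred = [0.0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]   # negative gradient
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        pred = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, pred)]
    return stumps

def predict(stumps, x, lr=0.1):
    return sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)
```

In the pointwise LtR use of this framework, `ys` would be the (possibly mapped) relevance labels and documents would be ranked by `predict`; the learning rate `lr` is the shrinkage parameter that boosting frameworks typically expose.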

One of the most popular LtR algorithms is LambdaMart [48]. It blends the ingenious approximated-gradient idea of its predecessor, LambdaRank [66], with the gradient boosting framework [65]. In the Yahoo LtR Challenge [27], a variant of LambdaMart topped the winning list. We use an open-source implementation mentioned in [67].^{Footnote 21} The parameter settings we maintain are as follows: number of trees = 500, number of leaves per tree = 31 (Ganjisaffar et al. [68] report that a value close to this works well for MSLR-WEB10K). On the smaller datasets, in order to mitigate overfitting, we set the number of leaves, as for Mart, to 7. The rest of the parameters are kept unchanged. We note that during training, all six baselines make use of the validation sets.

## About this article

### Cite this article

Ibrahim, M. An empirical comparison of random forest-based and other learning-to-rank algorithms.
*Pattern Anal Applic* **23**, 1133–1155 (2020). https://doi.org/10.1007/s10044-019-00856-6


### Keywords

- Learning-to-rank
- Random forest
- Decision tree
- Parameter settings