An empirical comparison of random forest-based and other learning-to-rank algorithms

Ibrahim, Muhammad

doi:10.1007/s10044-019-00856-6

Theoretical advances
Published: 28 October 2019

Volume 23, pages 1133–1155, (2020)

Pattern Analysis and Applications

Abstract

Random forest (RF)-based pointwise learning-to-rank (LtR) algorithms use surrogate loss functions to minimize the ranking error. Despite performing competitively with other state-of-the-art LtR algorithms, these algorithms, unlike other frameworks such as boosting and neural networks, have not been thoroughly investigated in the literature so far. In the first part of this study, we aim to better understand and improve RF-based pointwise LtR algorithms. When working with such an algorithm, one currently needs to choose among a number of options, such as (1) a classification versus a regression setting, (2) absolute relevance judgements versus mapped labels, (3) the number of features from which a split point for the data is chosen, and (4) a weighted versus an unweighted average of the predictions of multiple base learners (i.e., trees). We conduct a thorough study of these four aspects as well as of a pairwise objective function for RF-based rank-learners. Experimental results on several benchmark LtR datasets demonstrate that performance can be significantly improved by exploring these aspects. In the second part of this paper, guided by our investigation of RF-based rank-learners, we conduct an extensive comparison between these and state-of-the-art rank-learning algorithms. This comparison reveals some interesting and insightful findings about LtR algorithms, including the finding that RF-based LtR algorithms are among the most robust techniques across datasets with diverse properties.
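Aspect (4) concerns how the predictions of the individual trees are combined into a single document score. The Python sketch below illustrates only that step, assuming each tree has already produced a real-valued relevance estimate for every document. The tree weights are a hypothetical input (for example, some per-tree validation score) and are not necessarily the weighting scheme studied in this work; the function names are likewise illustrative.

```python
from typing import List, Optional, Sequence


def unweighted_average(tree_scores: Sequence[float]) -> float:
    """Standard RF aggregation: every tree's prediction counts equally."""
    return sum(tree_scores) / len(tree_scores)


def weighted_average(tree_scores: Sequence[float], tree_weights: Sequence[float]) -> float:
    """Each tree contributes in proportion to its (hypothetical) weight."""
    if len(tree_scores) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    total = sum(tree_weights)
    if total <= 0:
        raise ValueError("tree weights must have a positive sum")
    return sum(s * w for s, w in zip(tree_scores, tree_weights)) / total


def rank_documents(
    per_doc_tree_scores: Sequence[Sequence[float]],
    tree_weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return document indices sorted by descending aggregated score."""
    if tree_weights is None:
        scores = [unweighted_average(s) for s in per_doc_tree_scores]
    else:
        scores = [weighted_average(s, tree_weights) for s in per_doc_tree_scores]
    return sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)


# Two documents, each scored by the same three trees.
docs = [[2.0, 0.0, 0.0], [0.5, 1.0, 1.0]]
print(rank_documents(docs))                   # unweighted: [1, 0]
print(rank_documents(docs, [3.0, 1.0, 1.0]))  # first tree up-weighted: [0, 1]
```

The toy example shows why the choice matters: the same per-tree predictions yield opposite document orders depending on whether the trees are weighted.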

Notes

1. www.google.com.
2. Manning et al. [1] explain these models in detail.
3. http://research.microsoft.com/en-us/um/beijing/projects/letor/.
4. http://research.microsoft.com/en-us/projects/mslr/.
5. For details of these metrics, see Järvelin and Kekäläinen [31], Chapelle et al. [32] and Ibrahim and Murshed [4].
6. This algorithm is also used by Ibrahim and Carman [15].
7. Although several functions can be used as the splitting criterion, Robnik-Šikonja [35] shows that this choice causes little, if any, variation in performance.
8. While most tree implementations in the literature use a depth-first (i.e., recursive) exploration of the nodes, the implementation shown here uses a breadth-first exploration, mainly because we consider it a more systematic way of exploring the nodes. For an entropy-based objective function, the node exploration strategy does not affect the tree structure, i.e., the data partitions [15].
9. For all experiments in this section, for the two larger datasets (MSLR-WEB10K and Yahoo), figures in bold italic and in bold indicate that the best performance is significant with a p value less than 0.01 and 0.05, respectively. For the smaller datasets, an average over five independent runs is reported (each run being the result of fivefold cross-validation), and the winning value is given in italic font.
10. Recall that the usual practice has been to use the average relevance of the instances as the score.
11. Since the properties of the HP2004 and NP2004 datasets are similar to those of TD2004, we do not conduct further experiments on them.
12. Although the features of the Yahoo dataset are not disclosed, the index of the feature corresponding to BM25 is given.
13. This is intuitive: since navigational queries have very few relevant documents, a higher value of k helps the base ranker(s) put those few relevant documents into the training set of the rank-learning phase.
14. Ibrahim and Carman [15] report results only for RF-rand, RF-point (classification setting) and RF-list.
15. We perform a pairwise significance test on the comparatively larger (in terms of number of queries) MQ2007 and MQ2008 datasets. Since the remaining datasets have few queries, significance test results on them may not be reliable.
16. As explained earlier, since the HP2004 and NP2004 datasets contain navigational queries, MAP may not be an effective choice for evaluating this type of information need [1, Sec. 8.4]. That is why we chose NDCG@10 for the overall comparison in Table 12.
17. We did, however, run a pilot experiment comparing RF-point and RF-hybrid with $K \in \{23, 50, 80, 130\}$ (23 being the value used for $\sqrt{M}$), and observed no improvement of RF-hybrid over RF-point for any of these settings. We thus conclude that K is unlikely to play a significant role in the relative performance of these two systems on the Yahoo dataset.
18. The inventors of this technique [63] acknowledge a concern that accuracy may degrade, albeit slightly.
19. http://research.microsoft.com/en-us/um/beijing/projects/letor//Baselines/RankSVM-Primal.html.
20. https://people.cs.umass.edu/~vdang/ranklib.html.
21. https://code.google.com/p/jforests/.
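The notes above mention breadth-first exploration of tree nodes and the number K of randomly chosen candidate features per split. The following Python sketch illustrates both under simplifying assumptions: a regression setting with variance reduction as the splitting criterion (rather than entropy), a size-based stopping rule, and hypothetical function names. It is not the implementation used in the experiments. With K = M, each node's split depends only on the instances it holds, so breadth-first and depth-first growth produce the same leaves; with K < M the two orders consume the random draws in a different sequence, so for the same seed they may grow different (equally valid) random trees.

```python
import random
from collections import deque
from typing import List, Optional, Sequence, Tuple

Split = Tuple[int, float]  # (feature index, threshold)


def sse(labels: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (regression impurity)."""
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)


def best_split(X, y, idx, K, rng) -> Optional[Split]:
    """Draw K of the M features at random and return the (feature, threshold)
    with the largest impurity reduction, or None if no split helps."""
    features = sorted(rng.sample(range(len(X[0])), K))
    parent = sse([y[i] for i in idx])
    best, best_gain = None, 0.0
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best_gain + 1e-12:
                best, best_gain = (f, t), gain
    return best


def partition(X, idx, split: Split):
    f, t = split
    return [i for i in idx if X[i][f] <= t], [i for i in idx if X[i][f] > t]


def grow_breadth_first(X, y, K, min_split_size=2, seed=0) -> List[List[int]]:
    """Expand nodes level by level with a FIFO queue; return the leaves
    as lists of training-instance indices."""
    rng = random.Random(seed)
    leaves, queue = [], deque([list(range(len(X)))])
    while queue:
        idx = queue.popleft()
        split = best_split(X, y, idx, K, rng) if len(idx) >= min_split_size else None
        if split is None:
            leaves.append(idx)
        else:
            queue.extend(partition(X, idx, split))
    return leaves


def grow_depth_first(X, y, K, min_split_size=2, seed=0) -> List[List[int]]:
    """The same procedure with recursive (depth-first) expansion."""
    rng = random.Random(seed)
    leaves: List[List[int]] = []

    def visit(idx):
        split = best_split(X, y, idx, K, rng) if len(idx) >= min_split_size else None
        if split is None:
            leaves.append(idx)
            return
        left, right = partition(X, idx, split)
        visit(left)
        visit(right)

    visit(list(range(len(X))))
    return leaves


X = [[0.1, 5.0], [0.4, 3.0], [0.35, 1.0], [0.8, 2.0], [0.9, 4.0], [0.2, 0.5]]
y = [0, 1, 0, 2, 2, 1]
# With K = M (= 2) every node sees all features, so both orders give the same leaves.
print(sorted(grow_breadth_first(X, y, K=2)) == sorted(grow_depth_first(X, y, K=2)))
```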

References

[1] Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge
[2] Li H (2011) Learning to rank for information retrieval and natural language processing. Synth Lect Hum Lang Technol 4(1):1–113
[3] Liu TY (2011) Learning to rank for information retrieval. Springer, Berlin
[4] Ibrahim M, Murshed M (2016) From tf-idf to learning-to-rank: an overview. In: Handbook of research on innovations in information retrieval, analysis, and management. IGI Global, USA, pp 62–109
[5] Karatzoglou A, Baltrunas L, Shi Y (2013) Learning to rank for recommender systems. In: Proceedings of the 7th ACM conference on recommender systems. ACM, pp 493–494
[6] Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237
[7] Santos RL, Macdonald C, Ounis I (2013) Learning to rank query suggestions for adhoc and diversity search. Inf Retr 16(4):429–451
[8] Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083
[9] Li Z, Tang J, He X (2018) Robust structured nonnegative matrix factorization for image representation. IEEE Trans Neural Netw Learn Syst 29(5):1947–1960
[10] Dang V, Bendersky M, Croft WB (2013) Two-stage learning to rank for information retrieval. In: Advances in information retrieval. Springer, pp 423–434
[11] Macdonald C, Santos RL, Ounis I (2013) The whens and hows of learning to rank for web search. Inf Retr 16(5):584–628
[12] Aslam JA, Kanoulas E, Pavlu V, Savev S, Yilmaz E (2009) Document selection methodologies for efficient and effective learning-to-rank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 468–475
[13] Pavlu V (2008) Large scale IR evaluation. ProQuest LLC, Ann Arbor
[14] Qin T, Liu TY, Xu J, Li H (2010) LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13(4):346–374
[15] Ibrahim M, Carman M (2016) Comparing pointwise and listwise objective functions for random-forest-based learning-to-rank. ACM TOIS 34(4):20
[16] Breiman L (2001) Random forests. Mach Learn 45(1):5–32
[17] Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York. https://doi.org/10.1007/978-0-387-84858-7
[18] Banfield RE, Hall LO, Bowyer KW, Kegelmeyer WP (2007) A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell 29(1):173–180
[19] Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 161–168
[20] Criminisi A (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7(2–3):81–227
[21] Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
[22] Biau G (2012) Analysis of a random forests model. J Mach Learn Res 13:1063–1095
[23] Cui Z, Chen W, He Y, Chen Y (2015) Optimal action extraction for random forests and boosted trees. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 179–188
[24] Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3
[25] Dong Y, Zhang Y, Yue J, Hu Z (2016) Comparison of random forest, random ferns and support vector machine for eye state classification. Multimed Tools Appl 75(19):11763–11783
[26] Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300
[27] Chapelle O, Chang Y (2011) Yahoo! learning to rank challenge overview. J Mach Learn Res Proc Track 14:1–24
[28] Geurts P, Louppe G (2011) Learning to rank with extremely randomized trees. In: JMLR: workshop and conference proceedings, vol 14
[29] Mohan A, Chen Z, Weinberger KQ (2011) Web-search ranking with initialized gradient boosted regression trees. J Mach Learn Res Proc Track 14:77–89
[30] Han X, Lei S (2018) Feature selection and model comparison on Microsoft learning-to-rank data sets. arXiv preprint arXiv:1803.05127
[31] Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 41–48
[32] Chapelle O, Metzler D, Zhang Y, Grinspan P (2009) Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM). ACM, pp 621–630
[33] Li P, Wu Q, Burges C (2007) McRank: learning to rank using classification and gradient boosting. Adv Neural Inf Process Syst 20:897–904
[34] Cossock D, Zhang T (2006) Subset ranking using regression. In: Learning theory, pp 605–619
[35] Robnik-Šikonja M (2004) Improving random forests. In: Machine learning: ECML 2004. Springer, pp 359–370
[36] Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
[37] Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15(1):1625–1651
[38] Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
[39] Zaidi N, Webb G, Carman M, Petitjean F (2015) Deep broad learning - big models for big data. arXiv preprint arXiv:1509.01346
[40] Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Min ASA Data Sci J 6(6):496–505
[41] Bernard S, Heutte L, Adam S (2009) On the selection of decision trees in random forests. In: International joint conference on neural networks (IJCNN 2009). IEEE, pp 302–307
[42] Li HB, Wang W, Ding HW, Dong J (2010) Trees weighting random forest method for classifying high-dimensional noisy data. In: 2010 IEEE 7th international conference on e-business engineering (ICEBE). IEEE, pp 160–163
[43] Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a random forest? In: MLDM. Springer, pp 154–168
[44] Freund Y, Iyer R, Schapire RE, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
[45] Tax N, Bockting S, Hiemstra D (2015) A cross-benchmark comparison of 87 learning to rank methods. Inf Process Manag 51(6):757–772
[46] Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 133–142
[47] Xu J, Li H (2007) AdaRank: a boosting algorithm for information retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 391–398
[48] Wu Q, Burges CJ, Svore KM, Gao J (2010) Adapting boosting for information retrieval measures. Inf Retr 13(3):254–270
[49] Metzler D, Croft WB (2007) Linear feature-based models for information retrieval. Inf Retr 10(3):257–274
[50] Ibrahim M (2019) Sampling non-relevant documents of training sets for learning-to-rank algorithms. Int J Mach Learn Comput 10(2) (to appear)
[51] Ibrahim M (2019) Reducing correlation of random forest-based learning-to-rank algorithms using subsample size. Comput Intell 35(2):1–25
[52] Ibrahim M, Carman M (2014) Improving scalability and performance of random forest based learning-to-rank algorithms by aggressive subsampling. In: Proceedings of the 12th Australasian data mining conference, pp 91–99
[53] He B, Macdonald C, Ounis I (2008) Retrieval sensitivity under training using different measures. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 67–74
[54] Robertson S (2008) On the optimisation of evaluation metrics. In: Keynote, SIGIR 2008 workshop learning to rank for information retrieval (LR4IR)
[55] Donmez P, Svore KM, Burges CJ (2009) On the local optimality of LambdaRank. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 460–467
[56] Yilmaz E, Robertson S (2010) On the choice of effectiveness measures for learning to rank. Inf Retr 13(3):271–290
[57] Hanbury A, Lupu M (2013) Toward a model of domain-specific search. In: Proceedings of the 10th conference on open research areas in information retrieval, Le Centre De Hautes Etudes Internationales D’Informatique Documentaire, pp 33–36
[58] Hawking D (2004) Challenges in enterprise search. In: Proceedings of the 15th Australasian database conference, vol 27. Australian Computer Society, Inc., pp 15–24
[59] McCallum A, Nigam K, Rennie J, Seymore K (1999) A machine learning approach to building domain-specific search engines. In: IJCAI, vol 99. Citeseer, pp 662–667
[60] Owens L, Brown M, Poore K, Nicolson N (2008) The Forrester Wave: enterprise search, Q2 2008. For information and knowledge management professionals
[61] Yan X, Lau RY, Song D, Li X, Ma J (2011) Toward a semantic granularity model for domain-specific information retrieval. ACM TOIS 29(3):15
[62] Szummer M, Yilmaz E (2011) Semi-supervised learning to rank with preference regularization. In: Proceedings of the 20th ACM international conference on information and knowledge management (CIKM). ACM, pp 269–278
[63] Tyree S, Weinberger KQ, Agrawal K, Paykin J (2011) Parallel boosted regression trees for web search ranking. In: Proceedings of the 20th international conference on world wide web. ACM, pp 387–396
[64] Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory. Springer, pp 23–37
[65] Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
[66] Burges CJ, Ragno R, Le QV (2007) Learning to rank with nonsmooth cost functions. Proc Adv Neural Inf Process Syst 19:193–200
[67] Ganjisaffar Y, Caruana R, Lopes CV (2011) Bagging gradient-boosted trees for high precision, low variance ranking models. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 85–94
[68] Ganjisaffar Y, Debeauvais T, Javanmardi S, Caruana R, Lopes CV (2011) Distributed tuning of machine learning algorithms using MapReduce clusters. In: Proceedings of the third workshop on large scale data mining: theory and applications. ACM, p 2

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Dhaka, Dhaka, 1000, Bangladesh
Muhammad Ibrahim

Corresponding author

Correspondence to Muhammad Ibrahim.

Appendix: Implementations and parameter settings of baseline algorithms

The baseline algorithms and their parameter settings are described as follows.

RankSVM [46] is an SVM-inspired pairwise LtR algorithm that has been used as a baseline in a large number of LtR studies. In our experiments, we use a publicly available implementation of it (footnote 19).
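The core idea behind RankSVM is to reduce ranking to binary classification over document pairs of the same query. The Python sketch below shows that reduction and the primal objective it leads to (regularizer plus hinge loss on the pairwise differences). It is a minimal illustration with hypothetical function names, not the RankSVM-Primal code used in the experiments; the optimizer that would minimize the objective is omitted.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pairwise_transform(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    query_ids: Sequence[int],
) -> List[Tuple[List[float], int]]:
    """Turn pointwise LtR data into pairwise examples (x_i - x_j, sign).

    Only documents of the same query with different relevance labels are
    paired; sign is +1 if document i is more relevant than document j.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if query_ids[i] != query_ids[j] or labels[i] == labels[j]:
            continue
        diff = [a - b for a, b in zip(features[i], features[j])]
        pairs.append((diff, 1 if labels[i] > labels[j] else -1))
    return pairs


def pairwise_hinge_loss(w: Sequence[float], pairs, C: float = 1.0) -> float:
    """Primal RankSVM objective: 0.5 * ||w||^2 + C * sum of pairwise hinge losses."""
    reg = 0.5 * sum(wk * wk for wk in w)
    hinge = sum(
        max(0.0, 1.0 - sign * sum(wk * dk for wk, dk in zip(w, diff)))
        for diff, sign in pairs
    )
    return reg + C * hinge


X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
pairs = pairwise_transform(X, labels=[2, 0, 1, 1], query_ids=[1, 1, 1, 2])
print(len(pairs), pairwise_hinge_loss([2.0, -2.0], pairs))
```

Once a weight vector w is learned, documents are ranked by the linear score w·x, so a pair is ordered correctly exactly when w·(x_i - x_j) has the pair's sign.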

RankBoost [44] is an AdaBoost-inspired [64] pairwise algorithm which, instead of AdaBoost's exponential loss over individual training examples, applies an exponential loss to pairs of documents. RankLib (footnote 20) is a popular package that implements a number of LtR algorithms; we use its implementation of RankBoost. We set the number of trees of the ensemble to 500.
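To make the pairwise loss concrete, the sketch below evaluates the exponential loss over "crucial pairs" (same query, different relevance) that RankBoost's boosting rounds drive down. It assumes a uniform initial distribution over all crucial pairs in the training set and uses hypothetical function names; RankBoost's reweighting of pairs and selection of weak rankers are not shown, and this is not RankLib's code.

```python
import math
from typing import List, Sequence, Tuple


def crucial_pairs(labels: Sequence[int], query_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) of the same query where document i should rank above j."""
    n = len(labels)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if query_ids[i] == query_ids[j] and labels[i] > labels[j]
    ]


def rankboost_loss(scores: Sequence[float], labels: Sequence[int], query_ids: Sequence[int]) -> float:
    """Pairwise exponential loss: sum over crucial pairs of D(i, j) * exp(s_j - s_i),
    with D assumed uniform over all crucial pairs."""
    pairs = crucial_pairs(labels, query_ids)
    if not pairs:
        return 0.0
    weight = 1.0 / len(pairs)
    return sum(weight * math.exp(scores[j] - scores[i]) for i, j in pairs)


print(crucial_pairs([2, 0, 1, 1], [1, 1, 1, 2]))
print(rankboost_loss([2.0, 0.0], [1, 0], [1, 1]))  # correctly ordered pair: small loss
print(rankboost_loss([0.0, 2.0], [1, 0], [1, 1]))  # mis-ordered pair: loss above 1
```

Each term decays exponentially as the more relevant document's score moves above the less relevant one, and grows exponentially when the pair is mis-ordered.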

AdaRank [47] also uses the AdaBoost framework but unlike RankBoost it adopts a listwise approach. We use the implementation in RankLib with the number of base learners set to 500 and NDCG@10 as the optimization metric for learning.
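Since AdaRank is configured to optimize NDCG@10, a per-query sketch of that metric may help. The version below uses the exponential gain 2^rel - 1 and the log2(position + 1) discount commonly used in LtR evaluation; treating queries with no relevant document as NDCG = 0 is an assumption of this sketch, and implementations (including RankLib's) may differ in such details. The function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG@k with gain 2^rel - 1 and discount log2(position + 1), positions 1-based."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based here
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances: Sequence[int], k: int = 10) -> float:
    """NDCG@k for one query: DCG of the given ranking over DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumed convention for queries with no relevant document
    return dcg_at_k(ranked_relevances, k) / ideal


def mean_ndcg(rankings: Sequence[Sequence[int]], k: int = 10) -> float:
    """Average NDCG@k over queries, the usual way results are reported."""
    return sum(ndcg_at_k(r, k) for r in rankings) / len(rankings)


# Relevance labels of the documents in the order a model ranked them, one list per query.
print(mean_ndcg([[2, 1, 0], [0, 1, 2]]))
```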

The CoorAsc [49] algorithm applies the coordinate ascent method in a listwise manner. We use its RankLib implementation.
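The idea of coordinate ascent for a linear ranking model can be sketched as follows: optimize one feature weight at a time while holding the others fixed, keeping any change that increases the target metric. This is a simplified variant with a small fixed set of step sizes (Metzler and Croft's method and RankLib's implementation involve further details such as line search), and the names are hypothetical. In LtR, `objective` would be, for example, the mean NDCG@10 of the scores w·x on the training queries; a toy concave objective stands in for it here.

```python
from typing import Callable, List, Sequence, Tuple


def coordinate_ascent(
    objective: Callable[[List[float]], float],
    num_features: int,
    steps: Sequence[float] = (-1.0, -0.5, -0.1, 0.1, 0.5, 1.0),
    max_rounds: int = 10,
) -> Tuple[List[float], float]:
    """Greedy coordinate ascent on the weights of a linear model.

    Each round visits every coordinate in turn, tries a few step sizes while
    all other weights stay fixed, and keeps any step that increases the
    objective. Stops when a full round brings no improvement.
    """
    weights = [1.0 / num_features] * num_features
    best = objective(weights)
    for _ in range(max_rounds):
        improved = False
        for f in range(num_features):
            for step in steps:
                candidate = list(weights)
                candidate[f] += step
                value = objective(candidate)
                if value > best + 1e-12:
                    weights, best, improved = candidate, value, True
        if not improved:
            break
    return weights, best


# Toy objective with its maximum at (2, -1); the search ends close to [2.0, -1.0].
w, best = coordinate_ascent(lambda v: -(v[0] - 2.0) ** 2 - (v[1] + 1.0) ** 2, num_features=2)
print(w, best)
```

Because the objective is only evaluated, never differentiated, the same loop works for non-smooth listwise metrics such as NDCG or MAP, which is the main appeal of this method for LtR.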

Mart [33] is a popular pointwise LtR algorithm based on gradient-boosted regression tree ensembles [65]. We use its RankLib implementation with the following changes to the default parameter settings: number of trees = 500 and number of leaves per tree = 7 (according to [17, Ch. 10], any value between 4 and 8 is likely to work well).
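In the pointwise setting with squared loss, gradient boosting fits each new tree to the negative gradient of the loss, which is simply the current residual y - F(x). The sketch below shows that loop with 2-leaf stumps instead of the 7-leaf trees used here, a fixed learning rate, and no validation-based model selection; it is an illustration under those simplifying assumptions with hypothetical names, not RankLib's Mart.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: Sequence[Sequence[float]], targets: Sequence[float]) -> Stump:
    """Least-squares regression stump (a 2-leaf tree) fitted to the targets."""
    mean = sum(targets) / len(targets)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: constant prediction
    best_err = sum((r - mean) ** 2 for r in targets)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, targets) if row[f] <= t]
            right = [r for row, r in zip(X, targets) if row[f] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err - 1e-12:
                best, best_err = (f, t, lv, rv), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value


def gradient_boost(X, y, n_trees=500, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the negative
    gradient of 0.5 * (y - F)^2, i.e., to the current residuals y - F(x)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def score(model, row: Sequence[float]) -> float:
    """Relevance score of one document; documents are ranked by this value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 2.0]
model = gradient_boost(X, y, n_trees=200)
print([round(score(model, row), 3) for row in X])
```

The number of leaves per tree controls how many feature interactions a single tree can capture; stumps are the extreme case of one split per tree.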

One of the most popular LtR algorithms is LambdaMart [48]. It blends the ingenious idea of approximated gradients from its predecessor, LambdaRank [66], with the gradient boosting framework [65]. In the Yahoo LtR Challenge [27], a variant of LambdaMart ranked first. We use an open-source implementation of it mentioned in [67] (footnote 21). We use the following parameter settings: number of trees = 500 and number of leaves per tree = 31 (Ganjisaffar et al. [68] report that a value close to this works well for MSLR-WEB10K). On the smaller datasets, to mitigate overfitting, we set the number of leaves to 7, as for Mart. The remaining parameters are kept at their default values. We note that all six baselines make use of validation sets during training.
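The "approximated gradients" (lambdas) that LambdaMart inherits from LambdaRank can be computed per query as below: every pair whose labels differ contributes a RankNet-style logistic term scaled by the change in NDCG that swapping the two documents would cause. This is a sketch under stated assumptions: NDCG is computed over the full list (no @k truncation), a positive lambda is taken to mean "push this document up", and the function name is hypothetical. In LambdaMart, the next regression tree would be fitted to these lambdas (Newton-step leaf values are omitted here); this is not the jforests implementation used in the experiments.

```python
import math
from typing import List, Sequence


def lambda_gradients(scores: Sequence[float], labels: Sequence[int], sigma: float = 1.0) -> List[float]:
    """LambdaRank-style lambdas for the documents of one query.

    A positive lambda means the document's score should increase. Each pair
    (i, j) with labels[i] > labels[j] contributes
    sigma * |delta NDCG_ij| / (1 + exp(sigma * (s_i - s_j))) to i and the
    negative of that to j, where delta NDCG_ij is the change in (untruncated)
    NDCG if i and j swapped positions in the current ranking.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda d: scores[d], reverse=True)
    position = [0] * n
    for pos, doc in enumerate(order):
        position[doc] = pos  # 0-based position under the current scores
    gains = [2 ** y - 1 for y in labels]
    ideal_dcg = sum(g / math.log2(p + 2) for p, g in enumerate(sorted(gains, reverse=True)))
    lambdas = [0.0] * n
    if ideal_dcg == 0:
        return lambdas  # no relevant document: nothing to reorder
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue
            discount_gap = 1 / math.log2(position[i] + 2) - 1 / math.log2(position[j] + 2)
            delta_ndcg = abs((gains[i] - gains[j]) * discount_gap) / ideal_dcg
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * rho * delta_ndcg
            lambdas[i] += lam
            lambdas[j] -= lam
    return lambdas


# The relevant document (label 1) is currently ranked below the non-relevant one.
print(lambda_gradients([0.0, 1.0], [1, 0]))
```

The NDCG factor is what makes the method listwise in spirit: pairs whose swap would change NDCG a lot (for example, near the top of the list) receive larger lambdas.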

About this article

Cite this article

Ibrahim, M. An empirical comparison of random forest-based and other learning-to-rank algorithms. Pattern Anal Applic 23, 1133–1155 (2020). https://doi.org/10.1007/s10044-019-00856-6

Received: 10 July 2018
Accepted: 18 October 2019
Published: 28 October 2019
Issue Date: August 2020
DOI: https://doi.org/10.1007/s10044-019-00856-6
