Information Retrieval, Volume 13, Issue 3, pp 201–215

Efficient algorithms for ranking with SVMs

  • O. Chapelle
  • S. S. Keerthi
Learning to rank for information retrieval

Abstract

RankSVM (Herbrich et al. in Advances in large margin classifiers. MIT Press, Cambridge, MA, 2000; Joachims in Proceedings of the ACM conference on knowledge discovery and data mining (KDD), 2002) is a pairwise method for designing ranking models. SVMLight is the only publicly available software for RankSVM. It is slow and, because training with it was incomplete, previous evaluations show RankSVM to have inferior ranking performance. We propose new methods based on the primal Newton method to speed up RankSVM training and show that they are 5 orders of magnitude faster than SVMLight. Evaluation on the Letor benchmark datasets after complete training using these methods shows that the performance of RankSVM is excellent.
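
As a rough illustration of what "pairwise" means here, the sketch below builds preference pairs within each query and minimizes a regularized squared hinge loss on the score differences of a linear model. It is a minimal sketch under stated assumptions, not the authors' implementation: the squared hinge loss, the function names, the parameters `C`, `step` and `iters`, and the toy data are all illustrative choices, and plain gradient descent is swapped in for the primal Newton method named in the abstract.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def preference_pairs(relevance: List[int], query_ids: List[int]) -> List[Tuple[int, int]]:
    """All (i, j) within the same query where item i is more relevant than item j."""
    pairs = []
    n = len(relevance)
    for i in range(n):
        for j in range(n):
            if query_ids[i] == query_ids[j] and relevance[i] > relevance[j]:
                pairs.append((i, j))
    return pairs


def objective_and_gradient(w: Vector, X: List[Vector],
                           pairs: List[Tuple[int, int]], C: float) -> Tuple[float, Vector]:
    """0.5*||w||^2 + C * sum over pairs of max(0, 1 - w.(x_i - x_j))^2, and its gradient."""
    obj = 0.5 * dot(w, w)
    grad = list(w)  # gradient of the regularizer
    for i, j in pairs:
        diff = [a - b for a, b in zip(X[i], X[j])]
        margin = 1.0 - dot(w, diff)
        if margin > 0.0:  # only violated pairs contribute
            obj += C * margin * margin
            for k in range(len(w)):
                grad[k] -= 2.0 * C * margin * diff[k]
    return obj, grad


def train(X: List[Vector], relevance: List[int], query_ids: List[int],
          C: float = 1.0, step: float = 0.01, iters: int = 500) -> Vector:
    """Gradient descent on the pairwise objective (assumption: NOT the Newton method)."""
    pairs = preference_pairs(relevance, query_ids)
    w = [0.0] * len(X[0])
    for _ in range(iters):
        _, grad = objective_and_gradient(w, X, pairs, C)
        w = [wk - step * gk for wk, gk in zip(w, grad)]
    return w


# Hypothetical toy data: one query, three documents, two features each.
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
relevance = [2, 1, 0]
query_ids = [0, 0, 0]

w = train(X, relevance, query_ids)
scores = [dot(w, x) for x in X]
```

Sorting documents by `scores` in decreasing order gives the predicted ranking. The number of pairs grows quadratically with the number of documents per query, so the cost of each pass over the pairs is what a faster training method has to address.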

Keywords

Ranking, Support vector machines, AUC optimization

References

  1. Aizerman, M., Braverman, E., & Rozonoer, L. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25, 821–837.
  2. Barrett, R., & Romine, C. (1994). Templates for the solution of linear systems: Building blocks for iterative methods. Society for Industrial Mathematics.
  3. Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 161–168).
  4. Burges, C. J., Le, Q. V., & Ragno, R. (2007). Learning to rank with nonsmooth cost functions. In B. Schölkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (Vol. 19).
  5. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., et al. (2005). Learning to rank using gradient descent. In Proceedings of the international conference on machine learning.
  6. Cao, Y., Xu, J., Liu, T. Y., Li, H., Huang, Y., & Hon, H. W. (2006). Adapting ranking SVM to document retrieval. In SIGIR.
  7. Cao, Z., Qin, T., Liu, T. Y., Tsai, M. F., & Li, H. (2007). Learning to rank: From pairwise approach to listwise approach. In International conference on machine learning.
  8. Chapelle, O. (2007a). Optimization techniques for support vector machines. Talk at the workshop on Numerical tools and fast algorithms for massive data mining, search engines and applications, UCLA, http://www.ipam.ucla.edu/publications/sews2/sews2_7130.pdf
  9. Chapelle, O. (2007b). Training a support vector machine in the primal. Neural Computation, 19(5), 1155–1178.
  10. Cossock, D., & Zhang, T. (2006). Subset ranking using regression. In Proceedings of the 19th annual conference on learning theory. Lecture notes in computer science (Vol. 4005, pp. 605–619). Berlin: Springer.
  11. Dembo, R., & Steihaug, T. (1983). Truncated-Newton algorithms for large-scale unconstrained optimization. Mathematical Programming, 26(2), 190–212.
  12. Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
  13. Grinspan, P. (2007). A connection between DCG and a pairwise-defined loss function. Internal Yahoo! memo.
  14. Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large margin rank boundaries for ordinal regression. In B. Smola & S. Schoelkopf (Eds.), Advances in large margin classifiers. Cambridge, MA: MIT Press.
  15. Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the ACM conference on knowledge discovery and data mining (KDD), ACM.
  16. Joachims, T. (2005). A support vector method for multivariate performance measures. In International conference on machine learning (ICML), pp. 377–384.
  17. Joachims, T. (2006). Training linear SVMs in linear time. In ACM SIGKDD International conference on knowledge discovery and data mining (KDD), pp. 217–226.
  18. Keerthi, S. S., & DeCoste, D. M. (2005). A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6, 341–361.
  19. Keerthi, S. S., & Shevade, S. (2007). A fast tracking algorithm for generalized LARS/LASSO. IEEE Transactions on Neural Networks, 18(6), 1826–1830.
  20. Kimeldorf, G. S., & Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics, 41, 495–502.
  21. Lewis, D., Yang, Y., Rose, T., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397.
  22. Liu, T. Y., Xu, J., Qin, T., Xiong, W., & Li, H. (2007). Letor: Benchmark dataset for research on learning to rank for information retrieval. In LR4IR 2007, in conjunction with SIGIR 2007.
  23. MSR. (2008). Ranking SVM on LETOR. Microsoft Research Asia, http://www.research.microsoft.com/en-us/um/beijing/projects/letor/Baselines/RankSVM.htm.
  24. Schölkopf, B., & Smola, A. (2002). Learning with Kernels. Cambridge, MA: MIT Press.
  25. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the international conference on machine learning.
  26. Shewchuk, J. R. (1994). An introduction to the conjugate gradient method without the agonizing pain. Tech. Rep. CMU-CS-94-125, School of Computer Science, Carnegie Mellon University.
  27. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
  28. Zheng, Z., Zha, H., Zhang, T., Chapelle, O., Chen, K., & Sun, G. (2008). A general boosting method and its application to learning ranking functions for web search. In Advances in neural information processing systems (Vol. 20, pp. 1697–1704). MIT Press.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Yahoo! Research, Santa Clara, USA
