Abstract
Ordinal regression can be seen as a special case of preference learning in which the class labels associated with data instances take values from an ordered finite set. In such a setting, the classes usually carry a linguistic interpretation, attached by humans to subdivide the data into a number of preference bins. In this chapter, we give a general survey of ordinal regression from a machine learning point of view. In particular, we elaborate on some important connections with ROC analysis that have been introduced recently by the present authors. First, we discuss the important role of an underlying ranking function in ordinal regression models, as well as its impact on the performance evaluation of such models. Subsequently, we describe a new ROC-based performance measure that directly evaluates the underlying ranking function, and we place it in the more general context of ROC analysis as the volume under an r-dimensional ROC surface (VUS) for r classes in general. Furthermore, we discuss the scalability of this measure and show that it can be computed very efficiently for large samples. Finally, we present a kernel-based learning algorithm that optimizes VUS as a specific case of structured support vector machines.
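To make the VUS measure mentioned in the abstract concrete, the following minimal Python sketch estimates it as the fraction of r-tuples, one instance drawn from each class, that the ranking function orders correctly. It is an illustration under stated assumptions, not necessarily the algorithm presented in the chapter: class labels are assumed to be encoded as integers 0 to r-1, a higher score is assumed to indicate a higher class, tied scores are counted as incorrectly ordered, and the function names `vus_bruteforce` and `vus_sorted` are hypothetical. The first function enumerates all tuples; the second obtains the same value from a single pass over the score-sorted sample, which hints at why the measure can scale to large samples.

```python
from itertools import groupby, product


def _num_tuples(labels, r):
    """Number of r-tuples that take exactly one instance from each class."""
    total = 1
    for k in range(r):
        total *= sum(1 for y in labels if y == k)
    if total == 0:
        raise ValueError("every class needs at least one instance")
    return total


def vus_bruteforce(scores, labels, r):
    """Enumerate all r-tuples and count those with strictly increasing scores."""
    total = _num_tuples(labels, r)
    by_class = [[s for s, y in zip(scores, labels) if y == k] for k in range(r)]
    correct = sum(
        all(t[i] < t[i + 1] for i in range(r - 1)) for t in product(*by_class)
    )
    return correct / total


def vus_sorted(scores, labels, r):
    """Same value, computed by counting increasing chains over the sorted sample."""
    total = _num_tuples(labels, r)
    # chains[k]: number of strictly increasing chains covering classes 0..k
    # that can be built from the instances processed so far.
    chains = [0] * r
    ordered = sorted(zip(scores, labels))
    for _, tied in groupby(ordered, key=lambda pair: pair[0]):
        tied_counts = [0] * r
        for _, y in tied:
            tied_counts[y] += 1
        # Update from the highest class downwards, so that instances with
        # the same score never extend each other's chains.
        for k in range(r - 1, 0, -1):
            chains[k] += tied_counts[k] * chains[k - 1]
        chains[0] += tied_counts[0]
    return chains[r - 1] / total


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
    labels = [0, 0, 1, 1, 2, 2]
    print(vus_bruteforce(scores, labels, 3), vus_sorted(scores, labels, 3))
```

For r = 2 this quantity coincides with the area under the ROC curve when no scores are tied. After sorting, the second function does a constant amount of work per instance plus at most r updates per distinct score value.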
Notes
- 1.
- 2.
The MOSEK package can be freely downloaded for noncommercial use from www.mosek.com.
Acknowledgements
Willem Waegeman has been supported by a grant from the “Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen)”. He is currently supported by the Research Foundation Flanders.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Waegeman, W., Baets, B.D. (2010). A Survey on ROC-based Ordinal Regression. In: Fürnkranz, J., Hüllermeier, E. (eds) Preference Learning. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14125-6_7
DOI: https://doi.org/10.1007/978-3-642-14125-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14124-9
Online ISBN: 978-3-642-14125-6
eBook Packages: Computer Science, Computer Science (R0)