# Maximization of AUC and Buffered AUC in binary classification

## Abstract

In binary classification, performance metrics defined as the probability that some error exceeds a threshold are numerically difficult to optimize directly, and they hide potentially important information about the magnitude of errors larger than the threshold. Defining similar metrics using Buffered Probability of Exceedance (bPOE) instead generates counterpart metrics that resolve both of these issues. We apply this approach to AUC, the Area Under the ROC curve, and define Buffered AUC (bAUC). We show that bAUC can provide insights into classifier performance not revealed by AUC, while being closely related to it as the tightest concave lower bound and representable as the area under a modified ROC curve. Additionally, while AUC is numerically difficult to optimize directly, we show that bAUC optimization often reduces to convex or linear programming. Extending these results, we show that AUC and bAUC are special cases of Generalized bAUC and that popular Support Vector Machine (SVM) formulations for approximately maximizing AUC are equivalent to direct maximization of Generalized bAUC. We also prove bAUC generalization bounds for these SVMs. As a central component of these results, we provide an important, novel formula for calculating bPOE, the inverse of Conditional Value-at-Risk. Using this formula, we show that particular bPOE minimization problems reduce to convex and linear programming.
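To make the relationship between AUC and bAUC concrete, here is a minimal Python sketch. It rests on two assumptions that the abstract does not spell out: that bPOE at threshold z can be computed as the minimum over a ≥ 0 of E[a(X − z) + 1]⁺, a representation known from the bPOE literature, and that bAUC is one minus the bPOE, at threshold 0, of the pairwise ranking error (by analogy with AUC being one minus the probability that this error is nonnegative). The function names `bpoe` and `auc_and_bauc` are hypothetical.

```python
from itertools import product


def bpoe(samples, z):
    """Empirical bPOE at threshold z, assuming the representation
    min over a >= 0 of mean([a * (x - z) + 1]^+)."""
    n = len(samples)

    def objective(a):
        return sum(max(0.0, a * (x - z) + 1.0) for x in samples) / n

    # The objective is convex and piecewise linear in a, so its minimum over
    # a >= 0 is attained at a = 0 or at a kink a = 1 / (z - x) with x < z.
    candidates = [0.0] + [1.0 / (z - x) for x in samples if x < z]
    return min(objective(a) for a in candidates)


def auc_and_bauc(pos_scores, neg_scores):
    """Empirical AUC and (assumed definition of) bAUC from classifier scores."""
    # Ranking error of each (positive, negative) pair: it is negative exactly
    # when the positive example is scored above the negative one.
    errors = [sn - sp for sp, sn in product(pos_scores, neg_scores)]
    auc = sum(e < 0 for e in errors) / len(errors)
    bauc = 1.0 - bpoe(errors, 0.0)
    return auc, bauc


if __name__ == "__main__":
    # Illustrative scores only, not data from the paper.
    auc, bauc = auc_and_bauc([2.0, 1.0, 0.5], [0.0, 0.8])
    print(f"AUC = {auc:.3f}, bAUC = {bauc:.3f}")
```

Because [a(x − z) + 1]⁺ is at least 1 whenever x ≥ z and at least 0 otherwise, the bPOE computed this way is never smaller than the plain exceedance probability, so under these assumptions the sketch always returns bAUC ≤ AUC, in line with the lower-bound relationship stated in the abstract.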

## Keywords

Buffered Probability of Exceedance, Conditional Value-at-Risk, AUC, ROC curve, Buffered AUC, Support Vector Machine, Convex programming, Classification performance metric, Generalization

## Mathematics Subject Classification

90C15, 90C25, 68T10

## Notes

### Acknowledgements

The authors would like to thank Prof. R.T. Rockafellar and Dr. Alexander Mafusalov for their valuable comments and suggestions. This work was partially supported by the USA Air Force Office of Scientific Research grants “Design and Redesign of Engineering Systems” (FA9550-12-1-0427) and “New Developments in Uncertainty: Linking Risk Management, Reliability, Statistics and Stochastic Optimization” (FA9550-11-1-0258), as well as the DARPA grant “Risk-Averse Optimization of Large-Scale Multiphysics Systems.”
