Goal scoring, coherent loss and applications to machine learning

Abstract

Motivated by the binary classification problem in machine learning, we study a class of decision problems in which the decision maker has a list of goals and aims to attain as many of them as possible. In binary classification, this essentially means seeking a prediction rule that achieves the lowest probability of misclassification; computationally, it involves minimizing a difficult, non-convex 0–1 loss function. To address this intractability, previous methods minimize the cumulative loss, that is, the sum of convex surrogates of the 0–1 loss of each goal. We revisit this paradigm and instead develop an axiomatic framework: we propose a set of salient properties for goal-scoring functions and then introduce the coherent loss approach, which gives a tractable upper bound on the loss over the entire set of goals. We show that the proposed approach yields a strictly tighter approximation to the total loss (i.e., the number of missed goals) than any convex cumulative loss approach while preserving the convexity of the underlying optimization problem. Moreover, when applied to binary classification, this approach also has a robustness interpretation that builds a connection to robust SVMs.
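As a minimal illustration of the cumulative-loss paradigm mentioned above (not of the coherent loss itself, whose definition is not given in this abstract), the sketch below compares the 0–1 loss, i.e., the number of missed goals, with the sum of hinge surrogates for a fixed linear classifier. The data points, the classifier `(w, b)`, the helper names, and the convention that a zero margin counts as a missed goal are all assumptions made for the example.

```python
# Sketch: 0-1 loss (missed goals) versus a cumulative hinge surrogate.
# All data and names below are hypothetical, chosen only for illustration.

def margins_for(w, b, points, labels):
    """Signed margin y * (w . x + b) of each labelled point."""
    return [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        for x, y in zip(points, labels)
    ]

def zero_one_loss(margins):
    """Number of missed goals; a non-positive margin counts as a miss."""
    return sum(1 for m in margins if m <= 0)

def cumulative_hinge_loss(margins):
    """Sum of the convex hinge surrogates max(0, 1 - margin)."""
    return sum(max(0.0, 1.0 - m) for m in margins)

points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, 0.5), (0.5, -0.5)]
labels = [1, 1, -1, -1, -1]
w, b = (1.0, 0.0), 0.0  # assumed classifier, not a fitted one

margins = margins_for(w, b, points, labels)
missed = zero_one_loss(margins)
surrogate = cumulative_hinge_loss(margins)

print("margins:", margins)
print("missed goals (0-1 loss):", missed)
print("cumulative hinge loss:", surrogate)
```

Because max(0, 1 - m) is at least 1 whenever m is non-positive, the hinge sum always upper-bounds the count of missed goals; the gap between the two is the slack that a tighter convex bound would aim to reduce.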

Notes

  1. The margin a is introduced to ensure that Theorem 5 holds. Notice that the hinge-loss approximation with or without the margin leads to the same formulation of the standard SVM.

Author information

Corresponding author

Correspondence to Wenzhuo Yang.

Cite this article

Yang, W., Sim, M. & Xu, H. Goal scoring, coherent loss and applications to machine learning. Math. Program. 182, 103–140 (2020). https://doi.org/10.1007/s10107-019-01387-y

Keywords

  • Satisficing
  • Goal
  • Robust optimization
  • Classification
  • SVM
  • Coherent loss

Mathematics Subject Classification

  • 90C29