Goal scoring, coherent loss and applications to machine learning

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

Motivated by the binary classification problem in machine learning, we study a class of decision problems in which the decision maker has a list of goals and aims to attain as many of them as possible. In binary classification, this essentially means seeking a prediction rule that achieves the lowest probability of misclassification; computationally, it involves minimizing a (difficult) non-convex 0–1 loss function. To address this intractability, previous methods minimize the cumulative loss, that is, the sum of convex surrogates of the 0–1 loss of each goal. We revisit this paradigm and instead develop an axiomatic framework: we propose a set of salient properties for goal-scoring functions and then introduce the coherent loss approach, which is a tractable upper bound of the loss over the entire set of goals. We show that the proposed approach yields a strictly tighter approximation to the total loss (i.e., the number of missed goals) than any convex cumulative loss approach while preserving the convexity of the underlying optimization problem. Moreover, this approach, applied to binary classification, also has a robustness interpretation that builds a connection to robust SVMs.
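To make the two baseline quantities in the abstract concrete, the following minimal sketch computes the total 0–1 loss (the number of missed goals) and a cumulative convex surrogate for a fixed linear prediction rule. It does not implement the coherent loss proposed in the paper. The hinge surrogate, the toy data, and all function names are assumptions chosen purely for illustration.

```python
def margins_for(w, b, samples):
    """Margins y * (w . x + b) of a linear rule on labelled samples (labels in {-1, +1})."""
    return [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in samples]


def zero_one_loss(margins):
    """Total 0-1 loss: a goal counts as missed when its margin is not positive."""
    return sum(1 for m in margins if m <= 0)


def cumulative_hinge_loss(margins):
    """Cumulative loss with the hinge surrogate max(0, 1 - m), summed over all goals.

    The hinge is an illustrative choice of convex surrogate, not the paper's method.
    """
    return sum(max(0.0, 1.0 - m) for m in margins)


# Hypothetical toy data: (feature vector, label) pairs.
samples = [
    ((2.0, 1.0), 1),
    ((1.0, 3.0), 1),
    ((-1.0, -2.0), -1),
    ((0.5, -0.2), -1),
    ((-0.2, 0.1), 1),
]
w, b = (1.0, 1.0), 0.0  # an arbitrary fixed prediction rule

margins = margins_for(w, b, samples)
missed = zero_one_loss(margins)
surrogate = cumulative_hinge_loss(margins)
print(missed, surrogate)
```

Because max(0, 1 - m) is at least 1 whenever m is not positive, the cumulative hinge loss can never fall below the number of missed goals. The gap between the two values is the looseness that the abstract says a coherent loss reduces relative to convex cumulative losses.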

Notes

  1. The margin \(a\) is introduced to ensure that Theorem 5 holds. Notice that the hinge-loss approximation with or without the margin leads to the same formulation of the standard SVM.

Author information

Corresponding author

Correspondence to Wenzhuo Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, W., Sim, M. & Xu, H. Goal scoring, coherent loss and applications to machine learning. Math. Program. 182, 103–140 (2020). https://doi.org/10.1007/s10107-019-01387-y

  • DOI: https://doi.org/10.1007/s10107-019-01387-y
