Optimization Basics: A Machine Learning View

A chapter in Linear Algebra and Optimization for Machine Learning

Abstract

Many machine learning models are cast as continuous optimization problems in multiple variables. The simplest example is least-squares regression, which is also a fundamental problem in linear algebra, because solving a (consistent) system of equations is a special case of least-squares regression. In least-squares regression, one finds the best-fit solution to a system of equations that may or may not be consistent, and the loss is the aggregate squared error of that fit. The special case of a consistent system yields a loss of 0. Least-squares regression occupies a special place in linear algebra, optimization, and machine learning, because it is a foundational problem in all three disciplines. Historically, least-squares regression preceded the classification problem in machine learning, and optimization models for classification were often motivated as modifications of the least-squares regression model. The main difference between least-squares regression and classification is that the predicted target variable is numerical in the former, whereas it is discrete (typically binary) in the latter. The optimization model for linear regression therefore needs to be “repaired” to make it usable for discrete target variables. This chapter makes a special effort to show why least-squares regression is foundational to machine learning.
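
To make the connection concrete, the sketch below (a minimal illustration assuming NumPy; the matrices and labels are invented for this example, not taken from the chapter) solves a consistent system, where the least-squares loss is exactly zero, and then an inconsistent one, where the best fit leaves a strictly positive aggregate squared error. It also hints at the classical “repair” for discrete targets: encode the two classes as +1/-1, fit by least squares, and predict with the sign of the linear score.

```python
import numpy as np

# A 3x2 system A x = b: three equations, two unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Consistent case: b is constructed to lie in the column space of A,
# so the best-fit loss ||A x - b||^2 is (numerically) zero.
x_true = np.array([2.0, -1.0])
b = A @ x_true
x_fit, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.sum((A @ x_fit - b) ** 2))      # ~0.0

# Inconsistent case: perturb the last entry so b leaves the column
# space; the least-squares solution now has positive aggregate error.
b_bad = b + np.array([0.0, 0.0, 0.5])
x_fit, *_ = np.linalg.lstsq(A, b_bad, rcond=None)
print(np.sum((A @ x_fit - b_bad) ** 2))  # > 0

# Discrete (binary) targets: encode the classes as +1/-1, fit the same
# least-squares model, and threshold the linear score at zero -- the
# simple modification of regression that early classifiers built on.
y = np.array([1.0, -1.0, 1.0])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.sign(A @ w))                    # recovers [ 1. -1.  1.]
```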

Notes

  1. It is possible to construct pathological counter-examples where this is not true.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Aggarwal, C.C. (2020). Optimization Basics: A Machine Learning View. In: Linear Algebra and Optimization for Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-40344-7_4

  • DOI: https://doi.org/10.1007/978-3-030-40344-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40343-0

  • Online ISBN: 978-3-030-40344-7

  • eBook Packages: Computer Science, Computer Science (R0)
