Optimization Formulation

Abstract

Based on the description of the statistical model in the previous section, we formulate the problems we need to solve from two angles: one from the field of optimization, the other from sampling a probability distribution. Practically, from the viewpoint of efficient computer algorithms, the first angle is represented by the expectation–maximization (EM) algorithm. The EM algorithm finds (locally) maximum-likelihood parameters of a statistical model in scenarios where the likelihood equations cannot be solved directly. These models involve latent variables in addition to unknown parameters and known data observations; that is, either missing values may exist among the data, or the model can be formulated more simply by assuming the existence of unobserved data points. A mixture model can be described in simple terms by assuming that each observed data point has a corresponding unobserved data point, or latent variable, specifying the mixture component to which it belongs.
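
To make the role of the latent variables concrete, the following sketch (not taken from the chapter; the function name, starting values, and synthetic data are illustrative assumptions) runs EM on a two-component one-dimensional Gaussian mixture. The E-step computes each point's posterior probability of having been generated by the first component, and the M-step re-estimates the mixing weight, means, and variances in closed form; iterating the two steps monotonically increases the (local) likelihood.

import numpy as np

def em_gaussian_mixture(x, n_iter=100, seed=0):
    # Illustrative sketch, not the authors' implementation: EM for a
    # two-component 1-D Gaussian mixture. The latent variable z_i indicates
    # which component generated observation x_i.
    rng = np.random.default_rng(seed)
    pi = 0.5                                   # mixing weight of component 1
    mu = rng.choice(x, size=2, replace=False)  # initial means (hypothetical choice)
    var = np.array([x.var(), x.var()])         # initial variances

    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point, i.e., the
        # posterior probability of z_i under the current parameters.
        p1 = pi * np.exp(-0.5 * (x - mu[0]) ** 2 / var[0]) / np.sqrt(2 * np.pi * var[0])
        p2 = (1 - pi) * np.exp(-0.5 * (x - mu[1]) ** 2 / var[1]) / np.sqrt(2 * np.pi * var[1])
        r = p1 / (p1 + p2)

        # M-step: closed-form updates that maximize the expected
        # complete-data log-likelihood given the responsibilities.
        pi = r.mean()
        mu = np.array([(r * x).sum() / r.sum(),
                       ((1 - r) * x).sum() / (1 - r).sum()])
        var = np.array([(r * (x - mu[0]) ** 2).sum() / r.sum(),
                        ((1 - r) * (x - mu[1]) ** 2).sum() / (1 - r).sum()])
    return pi, mu, var

# Usage on synthetic data drawn from two Gaussians.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 0.5, 500)])
print(em_gaussian_mixture(data))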

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Shi, B., Iyengar, S.S. (2020). Optimization Formulation. In: Mathematical Theories of Machine Learning - Theory and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-17076-9_3

  • DOI: https://doi.org/10.1007/978-3-030-17076-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17075-2

  • Online ISBN: 978-3-030-17076-9
