Abstract
Based on the description of the statistical model in the previous section, we formulate the problems to be solved from two angles: one from the field of optimization, the other from sampling from a probability distribution. In practice, from the viewpoint of efficient computer algorithms, the first angle is represented by the expectation–maximization (EM) algorithm. The EM algorithm finds (local) maximum likelihood parameters of a statistical model in scenarios where the likelihood equations cannot be solved directly. Such models involve latent variables along with unknown parameters and observed data; that is, either some values are missing from the data, or the model can be formulated more simply by assuming the existence of unobserved data points. A mixture model can be described in these terms by assuming that each observed data point has a corresponding unobserved latent variable specifying the mixture component to which that data point belongs.
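To make the latent-variable view concrete, the following is a minimal sketch, not taken from the chapter, of EM fitting a one-dimensional two-component Gaussian mixture with NumPy; the function name em_gmm_1d and all parameter choices are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, n_iters=100, seed=0):
    """Illustrative EM for a 1-D Gaussian mixture: alternate E and M steps."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize mixing weights, means, and variances.
    pi = np.full(n_components, 1.0 / n_components)
    mu = rng.choice(x, size=n_components, replace=False)
    var = np.full(n_components, np.var(x))

    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point;
        # the latent variable is the unobserved component assignment.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the soft assignments.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Usage: data drawn from two Gaussians; EM recovers the two components.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gmm_1d(x))
```

Each iteration provably does not decrease the observed-data likelihood, which is why EM converges to a local maximum rather than a global one.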