Abstract
As mentioned in Chap. 3, gradient descent (GD) and its variants provide the core optimization methodology for machine learning problems. Given a \(C^1\) or \(C^2\) function \(f: \mathbb {R}^{n} \rightarrow \mathbb {R}\) with unconstrained variable \(x \in \mathbb {R}^{n}\), GD uses the following update rule:
\[x_{t+1} = x_t - h_t \nabla f(x_t),\]
where \(h_t\) is the step size, which may be either fixed or vary across iterations. When f is convex, \(h_t < \frac {2}{L}\) is a necessary and sufficient condition to guarantee the (worst-case) convergence of GD, where L is the Lipschitz constant of the gradient of f. By contrast, far less is understood about GD for non-convex problems: for general smooth non-convex problems, GD is only known to converge to a stationary point (i.e., a point with zero gradient).
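A minimal sketch of this update rule, assuming (purely for illustration, not from the chapter) a quadratic objective \(f(x) = \tfrac{1}{2} x^{\top} A x\) with symmetric positive semidefinite A, so that \(\nabla f(x) = Ax\) and L is the largest eigenvalue of A:

```python
import numpy as np

def gradient_descent(grad, x0, L, num_iters=200):
    """Gradient descent with a fixed step size h < 2/L (illustrative sketch)."""
    h = 1.0 / L                # any fixed h < 2/L satisfies the convergence condition for convex f
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - h * grad(x)    # x_{t+1} = x_t - h_t * grad f(x_t)
    return x

# Hypothetical example: f(x) = 0.5 * x^T A x, grad f(x) = A x,
# and L = largest eigenvalue of A (Lipschitz constant of grad f).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_f = lambda x: A @ x
L = np.linalg.eigvalsh(A).max()
print(gradient_descent(grad_f, x0=[1.0, -2.0], L=L))  # approaches the minimizer at the origin
```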
Part of this chapter appears in the paper titled "Gradient Descent Converges to Minimizers: Optimal and Adaptive Step Size Rules" by Bin Shi et al. (2018), currently under review for publication in the INFORMS Journal on Optimization.
Notes
- 1. For the purpose of this paper, strict saddle points include local maximizers.
- 2. \(f^{n}(x)\) denotes the n-fold application of f to x.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Shi, B., Iyengar, S.S. (2020). Gradient Descent Converges to Minimizers: Optimal and Adaptive Step-Size Rules. In: Mathematical Theories of Machine Learning - Theory and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-17076-9_7
DOI: https://doi.org/10.1007/978-3-030-17076-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17075-2
Online ISBN: 978-3-030-17076-9
eBook Packages: Engineering (R0)