Abstract
This paper addresses the question of whether it can be beneficial for an optimization algorithm to follow directions of negative curvature. Although prior work has established convergence results for algorithms that integrate both descent and negative curvature steps, there has not yet been extensive numerical evidence showing that such methods offer consistent performance improvements. In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches. The aspect that distinguishes our approaches from ones previously proposed is that they make algorithmic decisions based on (estimated) upper-bounding models of the objective function. A consequence of this aspect is that our frameworks can, in theory, employ fixed stepsizes, which makes the methods readily translatable from deterministic to stochastic settings. For deterministic problems, we show that instances of our dynamic framework yield gains in performance compared to related methods that only follow descent steps. We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress.
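The dynamic-step idea described above — taking either a descent step or a negative-curvature step depending on which one's upper-bounding model predicts the larger objective decrease, with fixed stepsizes tied to Lipschitz constants — can be illustrated with a minimal sketch. This is not the authors' exact framework; it assumes standard quadratic and cubic upper bounds with a gradient-Lipschitz constant `L` and a Hessian-Lipschitz constant `M`, and the helper `dynamic_step` is a hypothetical name.

```python
import numpy as np

def dynamic_step(x, grad, hess, L, M):
    """One illustrative dynamic step: compare the decrease predicted by
    the quadratic upper-bounding model along -grad(x) with the decrease
    predicted by the cubic upper-bounding model along a direction of
    negative curvature, and take whichever step predicts more progress."""
    g = grad(x)
    eigvals, eigvecs = np.linalg.eigh(hess(x))
    lam, v = eigvals[0], eigvecs[:, 0]   # leftmost eigenpair
    if g @ v > 0:
        v = -v                           # orient v as a non-ascent direction
    # Quadratic bound along -g with fixed stepsize 1/L predicts a
    # decrease of at least ||g||^2 / (2L).
    dec_grad = (g @ g) / (2.0 * L)
    # Cubic bound along v with fixed stepsize 2|lam|/M predicts a
    # decrease of at least (2/3)|lam|^3 / M^2 when lam < 0.
    dec_nc = (2.0 / 3.0) * abs(lam) ** 3 / M ** 2 if lam < 0 else 0.0
    if dec_grad >= dec_nc:
        return x - g / L
    return x + (2.0 * abs(lam) / M) * v

# Example: f(x, y) = x^2 - y^2 has a saddle at the origin, where the
# gradient is tiny but the negative-curvature model predicts large decrease.
f = lambda x: x[0] ** 2 - x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])

x = np.array([1e-8, 1e-8])               # near the saddle point
x_new = dynamic_step(x, grad, hess, L=2.0, M=1.0)
```

Near the saddle, a gradient-only method would make slow progress (the predicted gradient decrease is on the order of the squared gradient norm), while the negative-curvature model predicts, and obtains, a much larger decrease — the behavior the paper's frameworks are designed to exploit.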
Additional information
Frank E. Curtis was supported in part by the U.S. Department of Energy under Grant No. DE-SC0010615 and by the U.S. National Science Foundation under Grant No. CCF-1618717. Daniel P. Robinson was funded by the U.S. National Science Foundation under grant No. 1704458.
Cite this article
Curtis, F.E., Robinson, D.P. Exploiting negative curvature in deterministic and stochastic optimization. Math. Program. 176, 69–94 (2019). https://doi.org/10.1007/s10107-018-1335-8
Keywords
- Nonconvex optimization
- Second-order methods
- Modified Newton methods
- Negative curvature
- Stochastic optimization
- Machine learning