
Exploiting negative curvature in deterministic and stochastic optimization

  • Full Length Paper
  • Series B
Mathematical Programming

Abstract

This paper addresses the question of whether it can be beneficial for an optimization algorithm to follow directions of negative curvature. Although prior work has established convergence results for algorithms that integrate both descent and negative curvature steps, there has not yet been extensive numerical evidence showing that such methods offer consistent performance improvements. In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches. The aspect that distinguishes our approaches from ones previously proposed is that they make algorithmic decisions based on (estimated) upper-bounding models of the objective function. A consequence of this aspect is that our frameworks can, in theory, employ fixed stepsizes, which makes the methods readily translatable from deterministic to stochastic settings. For deterministic problems, we show that instances of our dynamic framework yield gains in performance compared to related methods that only follow descent steps. We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress.
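
A minimal sketch of the "dynamic step" idea described above, assuming the standard gradient- and Hessian-Lipschitz upper-bounding models: at each iterate, compare the decrease predicted by a fixed-stepsize gradient step against that predicted by a fixed-stepsize negative-curvature step, and take whichever the models favor. The constants L and sigma, the stepsizes 1/L and -2*lam_min/sigma, the eigendecomposition-based curvature direction, and the test function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dynamic_step(x, grad, hess, L, sigma):
    """One model-based choice between a descent step and a negative-curvature
    step, both with fixed stepsizes derived from Lipschitz constants L, sigma."""
    g = grad(x)
    H = hess(x)
    eigvals, eigvecs = np.linalg.eigh(H)   # eigenpairs in ascending order
    lam_min, d = eigvals[0], eigvecs[:, 0]
    if g @ d > 0:                          # orient the curvature direction downhill
        d = -d

    # Decrease predicted by the gradient-Lipschitz upper bound at stepsize 1/L:
    #   f(x - g/L) <= f(x) - ||g||^2 / (2L)
    dec_grad = (g @ g) / (2.0 * L)

    # Decrease predicted by the Hessian-Lipschitz (cubic) upper bound along d at
    # stepsize a = -2*lam_min/sigma (only informative when lam_min < 0):
    #   f(x + a*d) <= f(x) + a*g'd + (a^2/2)*lam_min + (sigma/6)*a^3
    dec_curv = 0.0
    if lam_min < 0.0:
        a = -2.0 * lam_min / sigma
        dec_curv = -(a * (g @ d) + 0.5 * a**2 * lam_min + (sigma / 6.0) * a**3)

    # Take whichever step the upper-bounding models predict reduces f more.
    return x - g / L if dec_grad >= dec_curv else x + a * d

# Usage on a hypothetical nonconvex test function with a saddle point at the origin:
# f(x, y) = 0.5*x^2 - 0.5*y^2 + 0.25*y^4
grad = lambda z: np.array([z[0], -z[1] + z[1] ** 3])
hess = lambda z: np.array([[1.0, 0.0], [0.0, -1.0 + 3.0 * z[1] ** 2]])

z = np.array([1.0, 0.0])                   # gradient steps alone stall at the saddle
for _ in range(50):
    z = dynamic_step(z, grad, hess, L=2.0, sigma=6.0)
print(z)                                   # approaches a minimizer near (0, ±1)
```

The negative-curvature step is taken only when its model predicts more progress than the gradient step, which is what permits fixed stepsizes and, in turn, a direct translation to settings where g and lam_min are replaced by stochastic estimates.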


Notes

  1. https://www.tensorflow.org/.


Author information

Correspondence to Daniel P. Robinson.

Additional information

Frank E. Curtis was supported in part by the U.S. Department of Energy under Grant No. DE-SC0010615 and by the U.S. National Science Foundation under Grant No. CCF-1618717. Daniel P. Robinson was funded by the U.S. National Science Foundation under Grant No. 1704458.

About this article

Cite this article

Curtis, F.E., Robinson, D.P. Exploiting negative curvature in deterministic and stochastic optimization. Math. Program. 176, 69–94 (2019). https://doi.org/10.1007/s10107-018-1335-8

