Abstract
This paper addresses the question of whether it can be beneficial for an optimization algorithm to follow directions of negative curvature. Although prior work has established convergence results for algorithms that integrate both descent and negative curvature steps, there has not yet been extensive numerical evidence showing that such methods offer consistent performance improvements. In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches. The aspect that distinguishes our approaches from ones previously proposed is that they make algorithmic decisions based on (estimated) upper-bounding models of the objective function. A consequence of this aspect is that our frameworks can, in theory, employ fixed stepsizes, which makes the methods readily translatable from deterministic to stochastic settings. For deterministic problems, we show that instances of our dynamic framework yield gains in performance compared to related methods that only follow descent steps. We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress.
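The dynamic-step idea described above — taking either a descent step or a negative-curvature step depending on which one's upper-bounding model predicts the larger objective decrease, with fixed stepsizes tied to Lipschitz constants — can be illustrated with a minimal sketch. This is not the authors' exact framework; it assumes standard quadratic and cubic upper bounds with a gradient-Lipschitz constant `L` and a Hessian-Lipschitz constant `M`, and the helper `dynamic_step` is a hypothetical name.

```python
import numpy as np

def dynamic_step(x, grad, hess, L, M):
    """One illustrative dynamic step: compare the decrease predicted by
    the quadratic upper-bounding model along -grad(x) with the decrease
    predicted by the cubic upper-bounding model along a direction of
    negative curvature, and take whichever step predicts more progress."""
    g = grad(x)
    eigvals, eigvecs = np.linalg.eigh(hess(x))
    lam, v = eigvals[0], eigvecs[:, 0]   # leftmost eigenpair
    if g @ v > 0:
        v = -v                           # orient v as a non-ascent direction
    # Quadratic bound along -g with fixed stepsize 1/L predicts a
    # decrease of at least ||g||^2 / (2L).
    dec_grad = (g @ g) / (2.0 * L)
    # Cubic bound along v with fixed stepsize 2|lam|/M predicts a
    # decrease of at least (2/3)|lam|^3 / M^2 when lam < 0.
    dec_nc = (2.0 / 3.0) * abs(lam) ** 3 / M ** 2 if lam < 0 else 0.0
    if dec_grad >= dec_nc:
        return x - g / L
    return x + (2.0 * abs(lam) / M) * v

# Example: f(x, y) = x^2 - y^2 has a saddle at the origin, where the
# gradient is tiny but the negative-curvature model predicts large decrease.
f = lambda x: x[0] ** 2 - x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])

x = np.array([1e-8, 1e-8])               # near the saddle point
x_new = dynamic_step(x, grad, hess, L=2.0, M=1.0)
```

Near the saddle, a gradient-only method would make slow progress (the predicted gradient decrease is on the order of the squared gradient norm), while the negative-curvature model predicts, and obtains, a much larger decrease — the behavior the paper's frameworks are designed to exploit.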
Additional information
Frank E. Curtis was supported in part by the U.S. Department of Energy under Grant No. DE-SC0010615 and by the U.S. National Science Foundation under Grant No. CCF-1618717. Daniel P. Robinson was funded by the U.S. National Science Foundation under grant No. 1704458.
Cite this article
Curtis, F.E., Robinson, D.P. Exploiting negative curvature in deterministic and stochastic optimization. Math. Program. 176, 69–94 (2019). https://doi.org/10.1007/s10107-018-1335-8
Keywords
- Nonconvex optimization
- Second-order methods
- Modified Newton methods
- Negative curvature
- Stochastic optimization
- Machine learning