
A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization

  • Clément W. Royer
  • Michael O’Neill
  • Stephen J. Wright
Full Length Paper · Series A

Abstract

We consider minimization of a smooth nonconvex objective function using an iterative algorithm based on Newton’s method and the linear conjugate gradient algorithm, with explicit detection and use of negative curvature directions for the Hessian of the objective function. The algorithm closely tracks the Newton-conjugate gradient procedures developed in the 1980s, but includes enhancements that allow worst-case complexity results to be proved for convergence to points that satisfy approximate first-order and second-order optimality conditions. The complexity results match the best known results in the literature for second-order methods.
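To make the idea concrete, the sketch below is a minimal illustration (not the authors' actual capped-CG procedure, which includes additional safeguards and damping) of a Newton-CG step: linear conjugate gradient is applied to H d = -g, the curvature p^T H p of each search direction is monitored, and the loop exits early with a negative-curvature direction when one is detected. The function names, the tolerances eps_H and tol, and the indefinite quadratic in the usage example are illustrative assumptions.

```python
# Minimal sketch of a Newton-CG step with negative-curvature detection.
# Assumes only numpy; not the exact algorithm analyzed in the paper.
import numpy as np

def newton_cg_step(g, hess_vec, eps_H=1e-6, tol=1e-8, max_iter=100):
    """Approximately solve H d = -g by conjugate gradient.

    Returns (d, "newton") if CG runs to (near) completion, or
    (p, "neg_curvature") if a direction p with p^T H p <= eps_H * ||p||^2
    is encountered; such a p can then be used as a negative curvature step.
    """
    n = g.shape[0]
    d = np.zeros(n)
    r = g.copy()            # residual of H d + g (d = 0 initially)
    p = -r                  # first CG search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Hp = hess_vec(p)
        curvature = p @ Hp
        if curvature <= eps_H * (p @ p):
            # Negative (or insufficient) curvature detected: return it.
            return p, "neg_curvature"
        alpha = rs_old / curvature
        d += alpha * p
        r += alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol:
            break
        p = -r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d, "newton"

# Usage example on a quadratic with one negative Hessian eigenvalue:
# CG detects the negative curvature and returns that direction.
if __name__ == "__main__":
    H = np.diag([5.0, 1.0, -0.5])
    g = np.array([1.0, -2.0, 0.5])
    direction, kind = newton_cg_step(g, lambda v: H @ v)
    print(kind, direction)
```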

Keywords

Smooth nonconvex optimization · Newton’s method · Conjugate gradient method · Optimality conditions · Worst-case complexity

Mathematics Subject Classification

49M05 · 49M15 · 65F10 · 65F15 · 90C06 · 90C60

Notes

Acknowledgements

We sincerely thank the associate editor and the two referees, whose comments led us to improve the presentation and to derive stronger results.

Funding

Funding was provided by the National Science Foundation (Grant Nos. 1447449, 1628384, 1634597, 1740707), the Air Force Office of Scientific Research (Grant No. FA9550-13-1-0138), Argonne National Laboratory (Grant Nos. 3F-30222, 8F-30039), and DARPA (Grant No. N660011824020).


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Wisconsin Institute for Discovery, University of Wisconsin, Madison, USA
  2. Computer Sciences Department, University of Wisconsin, Madison, USA
