Efficient first-order methods for convex minimization: a constructive approach

  • Yoel Drori
  • Adrien B. Taylor
Full Length Paper, Series A


We describe a novel constructive technique for devising efficient first-order methods for a wide range of large-scale convex minimization settings, including smooth, non-smooth, and strongly convex minimization. The technique builds upon a certain variant of the conjugate gradient method to construct a family of methods such that (a) all methods in the family share the same worst-case guarantee as the base conjugate gradient method, and (b) the family includes a fixed-step first-order method. We demonstrate the effectiveness of the approach by deriving optimal methods for the smooth and non-smooth cases, including new methods that forego knowledge of the problem parameters at the cost of a one-dimensional line search per iteration, and a universal method for the union of these classes that requires a three-dimensional search per iteration. In the strongly convex case, we show how numerical tools can be used to perform the construction, and show that the resulting method offers an improved worst-case bound compared to Nesterov’s celebrated fast gradient method.
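
As a rough illustration of the distinction drawn above between a fixed-step first-order method and one that performs a one-dimensional line search per iteration, the following sketch runs both variants of plain gradient descent on a small convex quadratic. It is not the method constructed in the paper. The test function, the names `fixed_step_gd` and `line_search_gd`, and the iteration count are assumptions chosen for illustration only.

```python
# Illustrative only: plain gradient descent on f(x) = 0.5 * sum(d_i * x_i^2),
# a smooth convex quadratic with a diagonal Hessian. Not the paper's method.

def f(x, d):
    return 0.5 * sum(di * xi * xi for di, xi in zip(d, x))

def grad(x, d):
    return [di * xi for di, xi in zip(d, x)]

def fixed_step_gd(x0, d, n_iters):
    """Gradient descent with the fixed step 1/L, where L = max(d)."""
    L = max(d)  # smoothness constant; must be known in advance
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x, d)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return x

def line_search_gd(x0, d, n_iters):
    """Gradient descent with an exact one-dimensional line search.

    For a quadratic, the minimizer of t -> f(x - t*g) has the closed form
    t = (g.g) / (g.Hg), so the step needs no knowledge of L.
    """
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x, d)
        gg = sum(gi * gi for gi in g)
        gHg = sum(di * gi * gi for di, gi in zip(d, g))
        if gHg == 0.0:  # zero gradient: already at a minimizer
            break
        t = gg / gHg
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical test instance (assumption, not from the paper).
d = [1.0, 10.0, 100.0]
x0 = [1.0, 1.0, 1.0]
f_fixed = f(fixed_step_gd(x0, d, 50), d)
f_search = f(line_search_gd(x0, d, 50), d)
print(f_fixed, f_search)
```

The fixed-step variant relies on the problem parameter `L`, whereas the line-search variant trades that knowledge for a one-dimensional minimization at every iteration, which is the kind of trade-off the abstract refers to.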

Mathematics Subject Classification

90C60 90C25 90C22 68Q25 




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Google LLC, Mountain View, USA
  2. INRIA, Département d’Informatique de l’ENS, École Normale Supérieure, CNRS, PSL Research University, Paris, France
