
New analysis of linear convergence of gradient-type methods via unifying error bound conditions

  • Hui Zhang
Full Length Paper, Series A

Abstract

This paper reveals that a residual measure operator plays a common and central role in many error bound (EB) conditions and in a variety of gradient-type methods. On one hand, by linking this operator with other optimality measures, we define a group of abstract EB conditions and then analyze the interplay between them; on the other hand, by using this operator as an ascent direction, we propose an abstract gradient-type method and then derive EB conditions that are necessary and sufficient for its linear convergence. The former provides a unified framework that not only allows us to find new connections between many existing EB conditions, but also paves the way to constructing new ones. The latter allows us to identify the weakest conditions guaranteeing linear convergence for a number of fundamental algorithms, including the gradient method, the proximal point algorithm, and the forward–backward splitting algorithm. In addition, we show linear convergence of the proximal alternating linearized minimization algorithm under a group of equivalent EB conditions, which are strictly weaker than the traditional strong convexity condition. Moreover, by defining a new EB condition, we show Q-linear convergence of Nesterov’s accelerated forward–backward algorithm without strong convexity. Finally, we verify EB conditions for a class of dual objective functions.
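
For orientation, here is a minimal sketch of the kind of objects the abstract refers to, stated for the standard composite model \(\min_x F(x)=f(x)+g(x)\) with \(f\) convex and \(L\)-smooth and \(g\) proper, closed, and convex; this concrete instantiation is an illustrative assumption, not the paper's more abstract framework. The forward–backward splitting iteration with step size \(\gamma\in(0,1/L]\) reads
\[
  x_{k+1}=\operatorname{prox}_{\gamma g}\bigl(x_k-\gamma\nabla f(x_k)\bigr),\qquad
  \operatorname{prox}_{\gamma g}(y)=\operatorname*{arg\,min}_{u}\Bigl\{g(u)+\tfrac{1}{2\gamma}\|u-y\|^{2}\Bigr\},
\]
and the associated prox-gradient residual is \(R_{\gamma}(x)=x-\operatorname{prox}_{\gamma g}\bigl(x-\gamma\nabla f(x)\bigr)\), which vanishes exactly at the minimizers of \(F\). A typical error bound condition then takes the form
\[
  \operatorname{dist}(x,X^{*})\le\kappa\,\|R_{\gamma}(x)\|\quad\text{for all }x\text{ in a region of interest},
\]
where \(X^{*}\) denotes the solution set and \(\kappa>0\). Conditions of this type are implied by, but strictly weaker than, strong convexity, and they are the kind of hypothesis under which linear convergence of gradient-type methods is established.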

Keywords

Residual measure operator · Error bound · Gradient descent · Linear convergence · Proximal point algorithm · Forward–backward splitting algorithm · Proximal alternating linearized minimization · Nesterov’s acceleration · Dual objective function

Mathematics Subject Classification

90C25 · 90C60 · 65K10 · 49M29

Notes

Acknowledgements

I am grateful to the anonymous referees, the associate editor, and the coeditor Prof. Adrian S. Lewis for many useful comments, which allowed me to significantly improve the original presentation. I would like to thank Prof. Zaiwen Wen for his invitation and hospitality during my visit to the Beijing International Center for Mathematical Research, and Prof. Dmitriy Drusvyatskiy for a careful reading of an early draft of this manuscript and for valuable comments and suggestions. I also thank Profs. Chao Ding, Bin Dong, Lei Guo, Yongjin Liu, Deren Han, Mark Schmidt, Anthony Man-Cho So, and Wotao Yin for their time and many helpful discussions. Further thanks are due to my cousin Boya Ouyang, who helped me with my English writing, and to the PhD students Ke Guo, Wei Peng, Ziyang Yuan, and Xiaoya Zhang, who looked over the manuscript and corrected several typos. While visiting the Chinese Academy of Sciences, I was particularly fortunate to become acquainted with Prof. Florian Jarre, who carefully read and polished this paper. This work was supported by the National Science Foundation of China (Nos. 11501569 and 61571008).


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Department of Mathematics, National University of Defense Technology, Changsha, People’s Republic of China
