Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria

Abstract

We consider optimization algorithms that successively minimize simple Taylor-like models of the objective function. Methods of Gauss–Newton type for minimizing the composition of a convex function and a smooth map are common examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the function at a nearby point. Consequently, we (1) show that the step-sizes can be reliably used to terminate the algorithm, (2) prove that as long as the step-sizes tend to zero, every limit point of the iterates is stationary, and (3) show that conditions akin to classical quadratic growth imply that the step-sizes linearly bound the distance of the iterates to the solution set. The latter so-called error bound property is typically used to establish linear (or faster) convergence guarantees. Analogous results hold when the step-size is replaced by the square root of the decrease in the model’s value. We conclude with extensions to the setting in which the models are minimized only inexactly.
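
As a rough illustration of termination by step-size, the sketch below runs a proximal gradient method, one of the simplest schemes that minimizes a Taylor-like model at each iteration, on a small l1-regularized least-squares problem and stops once the step length falls below a tolerance. The problem data `c`, the weight `lam`, the step parameter `t`, the tolerance, and all function names are assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gradient(grad_g, x0, lam, t, tol=1e-8, max_iter=10_000):
    """Minimize g(x) + lam * ||x||_1 by repeatedly minimizing the model
    g(x_k) + <grad g(x_k), y - x_k> + ||y - x_k||^2 / (2 t) + lam * ||y||_1.
    Terminates when the step length ||x_{k+1} - x_k|| drops below tol."""
    x = np.asarray(x0, dtype=float)
    step = np.inf
    iters = 0
    for _ in range(max_iter):
        x_next = soft_threshold(x - t * grad_g(x), t * lam)
        step = np.linalg.norm(x_next - x)
        x = x_next
        iters += 1
        if step <= tol:  # step-size based stopping rule
            break
    return x, step, iters

# Hypothetical test problem: g(x) = 0.5 * ||x - c||^2, so grad g(x) = x - c,
# and the exact minimizer of g + lam * ||.||_1 is soft_threshold(c, lam).
c = np.array([3.0, -0.2, 1.5])
lam = 1.0
x_star = soft_threshold(c, lam)

x_hat, last_step, iters = prox_gradient(lambda x: x - c, np.zeros(3), lam, t=0.5)
print("iterations:", iters)
print("last step length:", last_step)
print("distance to solution:", np.linalg.norm(x_hat - x_star))
```

On this toy problem the distance to the solution at termination is of the same order as the final step length, which is the kind of behavior an error bound property describes.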

Notes

  1.

    Since the first version of this work [22], a number of new algorithms have been developed that build on our viewpoint. For example, [16] analyze stochastic subgradient methods, [35, 54] consider algorithms for adversarial learning and saddle-point problems, and [49] discuss generic line-search procedures using Taylor-like models built from Bregman divergences.

  2.

    One such univariate example is \(\min _x f(x)=|\frac{1}{2}x^2+x|\). The prox-linear algorithm for convex composite minimization [23, Algorithm 5.1] initiated to the right of the origin—a minimizer of f—will generate a sequence \(x_k\rightarrow 0\) with \(|f'(x_k)|\rightarrow 1\).

  3.

    By stationary, we mean that zero is a limiting subgradient of the function at the point.
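
The univariate example in Note 2 can be checked numerically. The sketch below applies a prox-linear step to f(x) = |c(x)| with c(x) = x^2/2 + x, using a closed-form solution of the one-dimensional subproblem. The prox parameter t = 1, the starting point x0 = 1, the iteration count, and the function names are assumptions made for illustration; this is not a transcription of [23, Algorithm 5.1].

```python
import numpy as np

def c(x):
    """Smooth inner map; f(x) = |c(x)|."""
    return 0.5 * x**2 + x

def c_prime(x):
    return x + 1.0

def prox_linear_step(x, t=1.0):
    """Minimize |c(x) + c'(x) * d| + d^2 / (2 t) over d and return x + d.
    In one dimension the minimizer is d = -b * clip(a / b^2, -t, t)
    with a = c(x), b = c'(x); b != 0 is assumed (true for x > -1)."""
    a, b = c(x), c_prime(x)
    d = -b * np.clip(a / b**2, -t, t)
    return x + d

# Start to the right of the origin (assumed starting point x0 = 1).
xs = [1.0]
for _ in range(5):
    xs.append(prox_linear_step(xs[-1]))

# Step lengths |x_{k+1} - x_k| and |f'(x_k)| at each iterate (c(x_k) != 0 here).
steps = [abs(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
slopes = [abs(np.sign(c(x)) * c_prime(x)) for x in xs]

for k, x in enumerate(xs):
    print(k, x, slopes[k])
```

In this run the step lengths shrink rapidly while the derivative at the iterates stays near 1, so the slope at the iterates alone would not reveal that the nearby point 0 is a minimizer.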

References

  1.

    Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)

  2.

    Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2, Ser. A), 91–129 (2013)

  3.

    Bai, Y., Duchi, J., Mei, S.: Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs (2019). Preprint arXiv:1903.00184

  4.

    Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  5.

    Bolte, J., Daniilidis, A., Lewis, A.S., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  6.

    Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

  7.

    Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)

  8.

    Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2, Ser. A), 459–494 (2014)

  9.

    Burke, J.V.: Descent methods for composite nondifferentiable optimization problems. Math. Program. 33(3), 260–279 (1985)

  10.

    Burke, J.V., Ferris, M.C.: A Gauss–Newton method for convex composite optimization. Math. Program. 71(2), 179–194 (1995)

  11.

    Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)

  12.

    Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for l-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)

  13.

    Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)

  14.

    Charisopoulos, V., Davis, D., Díaz, M., Drusvyatskiy, D.: Composite optimization for robust blind deconvolution (2019). arXiv preprint arXiv:1901.01624

  15.

    Clarke, F.H., Ledyaev, Y., Stern, R.I., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Graduate Texts in Mathematics, vol. 178. Springer, New York (1998)

  16.

    Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)

  17.

    De Giorgi, E., Marino, A., Tosques, M.: Problemi di evoluzione in spazi metrici e curve di massima pendenza. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 68, 180–187 (1980)

  18.

    Drusvyatskiy, D.: Slope and geometry in variational mathematics. PhD thesis, Cornell University (2013)

  19.

    Drusvyatskiy, D., Ioffe, A.D.: Quadratic growth and critical point stability of semi-algebraic functions. Math. Program. 153(2, Ser. A), 635–653 (2015)

  20.

    Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Curves of descent. SIAM J. Control Optim. 53(1), 114–138 (2015)

  21.

    Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Transversality and alternating projections for nonconvex sets. Found. Comput. Math. 15(6), 1637–1651 (2015)

  22.

    Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria (2016). arXiv:1610.03446 (Ver. 1)

  23.

    Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)

  24.

    Drusvyatskiy, D., Mordukhovich, B.S., Nghia, T.T.A.: Second-order growth, tilt stability, and metric regularity of the subdifferential. J. Convex Anal. 21(4), 1165–1192 (2014)

  25.

    Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. (2016). https://doi.org/10.1007/s1010, arXiv:1605.00125

  26.

    Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inf. Infer. J. IMA 8(3), 471–529 (2018)

  27.

    Ekeland, I.: On the variational principle. J. Math. Anal. Appl. 47, 324–353 (1974)

  28.

    Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization (Lexington, Ky., 1980). Mathematical Programming Studies, vol. 17, pp. 67–76. Springer, Berlin (1982)

  29.

    Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization, pp. 67–76. Springer, Berlin (1982)

  30.

    Geiping, J., Moeller, M.: Composite optimization by nonconvex majorization–minimization. SIAM J. Imaging Sci. 11(4), 2494–2528 (2018)

  31.

    Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2, Ser. A), 59–99 (2016)

  32.

    Goldstein, A.A.: Optimization of Lipschitz continuous functions. Math. Program. 13(1), 14–22 (1977)

  33.

    Ioffe, A.D.: Metric regularity and subdifferential calculus. Uspekhi Mat. Nauk 55(3(333)), 103–162 (2000)

  34.

    Ioffe, A.D.: Variational Analysis of Regular Mappings. Springer Monographs in Mathematics. Springer, Berlin (2017)

  35.

    Jin, C., Netrapalli, P., Jordan, M.I.: Minmax optimization: stable limit points of gradient descent ascent are locally optimal (2019). arXiv preprint arXiv:1902.00618

  36.

    Klatte, D., Kummer, B.: Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications. Nonconvex Optimization and Its Applications, vol. 60. Kluwer Academic Publishers, Dordrecht (2002)

  37.

    Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier (Grenoble) 48(3), 769–783 (1998)

  38.

    Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Program. 158, 1–46 (2015)

  39.

    Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46/47(1–4), 157–178 (1993). Degeneracy in optimization problems

  40.

    Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4(Ser. R–3), 154–158 (1970)

  41.

    Martinet, B.: Détermination approchée d’un point fixe d’une application pseudo-contractante. Cas de l’application prox. C. R. Acad. Sci. Paris Sér. A-B 274, A163–A165 (1972)

  42.

    Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1, Ser. A), 177–205 (2006)

  43.

    Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)

  44.

    Nesterov, Y.: Modified Gauss–Newton scheme with worst case guarantees for global performance. Optim. Methods Softw. 22(3), 469–483 (2007)

  45.

    Nesterov, Y.: Accelerating the cubic regularization of Newton’s method on convex problems. Math. Program. 112(1, Ser. B), 159–181 (2008)

  46.

    Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1, Ser. B), 125–161 (2013)

  47.

    Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)

  48.

    Noll, D., Prot, O., Rondepierre, A.: A proximity control algorithm to minimize nonsmooth and nonconvex functions. Pac. J. Optim. 4(3), 571–604 (2008)

  49.

    Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181, 1–35 (2017)

  50.

    Poliquin, R.A., Rockafellar, R.T.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)

  51.

    Polyak, B.T.: Gradient methods for the minimisation of functionals. USSR Comput. Math. Math. Phys. 3(4), 864–878 (1963)

  52.

    Powell, M.J.D.: General algorithms for discrete nonlinear approximation calculations. In: Chui, C.K., Schumaker, L.L., Ward, J.D. (eds.) Approximation Theory, IV (College Station, Tex., 1983), pp. 187–218. Academic Press, New York (1983)

  53.

    Powell, M.J.D.: On the global convergence of trust region algorithms for unconstrained minimization. Math. Program. 29(3), 297–303 (1984)

  54.

    Rafique, H., Liu, M., Lin, Q., Yang, T.: Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv preprint arXiv:1810.02060

  55.

    Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  56.

    Rockafellar, R.T.: Proximal subgradients, marginal values, and augmented Lagrangians in nonconvex optimization. Math. Oper. Res. 6(3), 424–436 (1981)

  57.

    Rockafellar, R.T., Dontchev, A.L.: Implicit Functions and Solution Mappings. Monographs in Mathematics. Springer, Berlin (2009)

  58.

    Scheinberg, K., Tang, X.: Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis. Mathematical Programming, pp. 1–35. Springer, Berlin (2016)

  59.

    Wild, S.M.: Solving Derivative-Free Nonlinear Least Squares Problems with POUNDERS. Argonne National Lab, Lemont (2014)

  60.

    Wright, S.J.: Convergence of an inexact algorithm for composite nonsmooth optimization. IMA J. Numer. Anal. 10(3), 299–321 (1990)

  61.

    Yuan, Y.: On the superlinear convergence of a trust region algorithm for nonsmooth optimization. Math. Program. 31(3), 269–285 (1985)

  62.

    Zhang, R., Treiman, J.: Upper-Lipschitz multifunctions and inverse subdifferentials. Nonlinear Anal. 24(2), 273–286 (1995)

Acknowledgements

We thank the two anonymous referees and the Associate Editor for their insightful comments, which have improved the exposition of this work.

Author information

Corresponding author

Correspondence to D. Drusvyatskiy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Research of Drusvyatskiy was partially supported by the AFOSR YIP award FA9550-15-1-0237. Research of Lewis was supported in part by National Science Foundation Grant DMS-1208338. Research of all three authors was supported in part by the US-Israel Binational Science Foundation Grant 2014241.

About this article

Cite this article

Drusvyatskiy, D., Ioffe, A.D. & Lewis, A.S. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Math. Program. 185, 357–383 (2021). https://doi.org/10.1007/s10107-019-01432-w

Keywords

  • Taylor-like model
  • Error-bound
  • Slope
  • Subregularity
  • Kurdyka–Łojasiewicz inequality
  • Ekeland’s principle

Mathematics Subject Classification

  • 65K05
  • 90C30
  • 49M37
  • 65K10