Convergence Rate of Proximal Inertial Algorithms Associated with Moreau Envelopes of Convex Functions

Abstract

In a Hilbert space setting \({\mathcal H}\), we develop new inertial proximal-based algorithms that aim to rapidly minimize a convex lower-semicontinuous proper function \(\varPhi : \mathcal H \rightarrow {\mathbb R} \cup \{+\infty \}\). The guiding idea is to use an accelerated proximal scheme where, at each step, Φ is replaced by its Moreau envelope, with varying approximation parameter. This leads us to consider a Relaxed Inertial Proximal Algorithm (RIPA) with variable parameters which take into account the effects of inertia, relaxation, and approximation. (RIPA) was first introduced to solve general maximally monotone inclusions, in which case a judicious adjustment of the parameters makes it possible to obtain the convergence of the iterates towards the equilibria. In the case of convex minimization problems, convergence analysis of (RIPA) was initially addressed by Attouch and Cabot, based on its formulation as an inertial gradient method with varying potential functions. We propose a new approach to this algorithm, along with further developments, based on its formulation as a proximal algorithm associated with varying Moreau envelopes. For convenient choices of the parameters, we show the fast optimization property of the function values, with the order \(o(k^{-2})\), and the weak convergence of the iterates. This is in line with the recent studies of Su-Boyd-Candès, Chambolle-Dossal, and Attouch-Peypouquet. We study the impact of geometric assumptions on the convergence rates, and the stability of the results with respect to perturbations and errors. Finally, in the case of structured minimization problems (smooth + nonsmooth), based on this approach, we introduce new proximal-gradient inertial algorithms for which similar convergence rates are shown.
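To fix ideas, here is a minimal Python sketch of an inertial proximal step. This is an illustration of ours, not the chapter's exact (RIPA) scheme, whose inertial, relaxation, and approximation parameters all vary along the iterations; we take the model function Φ(x) = |x|, whose proximal mapping is soft-thresholding, and the Nesterov-type extrapolation coefficient k/(k + α) studied in the works cited above. The function names are hypothetical.

```python
def prox_abs(x, lam):
    # Proximal mapping of lam*|.|: soft-thresholding
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def inertial_prox(x0, lam=1.0, alpha=3.0, n_iter=100):
    # y_k = x_k + k/(k+alpha) * (x_k - x_{k-1});  x_{k+1} = prox_{lam*Phi}(y_k)
    x_prev, x = x0, x0
    for k in range(1, n_iter + 1):
        y = x + (k / (k + alpha)) * (x - x_prev)
        x_prev, x = x, prox_abs(y, lam)
    return x

x = inertial_prox(5.0)  # Phi(x) = |x| is minimized at 0
```

With α = 3, the iterates reach the minimizer 0 of |·| after a handful of steps; the chapter's analysis quantifies the decay of the function values for schemes of this type.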


References

  1. Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38, 1102–1119 (2000)

  2. Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81, 747–779 (2002)

  3. Apidopoulos, V., Aujol, J.-F., Dossal, Ch.: Convergence rate of inertial Forward-Backward algorithm beyond Nesterov’s rule. Math. Prog. (Ser. A), 1–20 (2018)

  4. Attouch, H.: Variational Analysis for Functions and Operators. Pitman (1984)

  5. Attouch, H., Bolte, J., Redont, P.: Optimizing properties of an inertial dynamical system with geometric damping. Link with proximal methods. Control Cybernet. 31, 643–657 (2002)

  6. Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differential Equations 263, 5412–5458 (2017)

  7. Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms. SIAM J. Optim. 28, 849–874 (2018)

  8. Attouch, H., Cabot, A.: Convergence of damped inertial dynamics governed by regularized maximally monotone operators. J. Differential Equations, to appear. HAL-01648383v2 (2018)

  9. Attouch, H., Cabot, A.: Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. HAL-01708905 (2018)

  10. Attouch, H., Cabot, A.: Convergence rate of a relaxed inertial proximal algorithm for convex minimization. HAL-01807041 (2018)

  11. Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12, 273–306 (2002)

  12. Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Accelerated forward-backward algorithms with perturbations. Application to Tikhonov regularization. J. Optim. Th. Appl. 179, 1–36 (2018)

  13. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Prog. (Ser. B) 168, 123–175 (2018)

  14. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3. ESAIM: COCV 25 (2019)

  15. Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26, 1824–1834 (2016)

  16. Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Prog. 174, 319–432 (2019)

  17. Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differential Equations 261, 5734–5783 (2016)

  18. Aujol, J.-F., Dossal, Ch.: Stability of over-relaxations for the Forward-Backward algorithm, application to FISTA. SIAM J. Optim. 25, 2408–2433 (2015)

  19. Baillon, J.-B.: Un exemple concernant le comportement asymptotique de la solution du problème \(\frac {du}{dt} + \partial \phi (u) \ni 0\). J. Functional Anal. 28, 369–376 (1978)

  20. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert spaces. Springer (2011)

  21. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  22. Beck, A., Teboulle, M.: Gradient-Based Algorithms with Applications in Signal Recovery Problems. In Convex Optimization in Signal Processing and Communications, D. Palomar and Y. Eldar Eds., Cambridge University Press, 33–88 (2010)

  23. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Lojasiewicz Inequalities and Applications. Trans. AMS 362, 3319–3363 (2010)

  24. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Prog. 165, 471–507 (2017)

  25. Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. North Holland, (1972)

  26. Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the Fast Iterative Shrinkage Thresholding Algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)

  27. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Prog. (Ser. A) 145, 451–482 (2014)

  28. Güler, O.: On the convergence of the proximal point algorithm for convex optimization. SIAM J. Control Optim. 29, 403–419 (1991)

  29. Imbert, C.: Convex analysis techniques for Hopf-Lax formulae in Hamilton-Jacobi equations. J. Nonlinear Convex Anal. 2, 333–343 (2001)

  30. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Prog., to appear. DOI 10.1007/s10107-015-0949-3

  31. Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. Advances in Neural Information Processing Systems, 1970–1978 (2014)

  32. May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turkish J. Math. 41, 681–685 (2017)

  33. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Soviet Math. Doklady 27, 372–376 (1983)

  34. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA (2004)

  35. Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1, 123–231 (2013)

  36. Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer (2015)

  37. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23, 1607–1633 (2013)

  38. Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Information Processing Systems 27, 2510–2518 (2014)

Acknowledgements

This work was supported by FONDECYT Grant 1181179 and CMM-Conicyt PIA AFB170001.

Author information

Correspondence to Hedy Attouch.

Appendix

1.1.1 Some Properties of the Moreau Envelope

For a detailed presentation of the Moreau envelope, we refer the reader to [20, 25, 35, 36]. We merely point out the following properties, which will be of constant use here:

  1. (i)

    the function \(\lambda \in \,]0, +\infty[\, \mapsto \varPhi_\lambda(x)\) is nonincreasing for each \(x\in {\mathcal H}\);

  2. (ii)

    the equality \(\inf _{\mathcal H}\varPhi =\inf _{\mathcal H}\varPhi _\lambda \) holds in \({\mathbb R}\cup \{-\infty \}\) for all λ > 0;

  3. (iii)

    \( \operatorname *{\mbox{arg min}}\varPhi = \operatorname *{\mbox{arg min}}\varPhi _\lambda \) for all λ > 0.

It will be convenient to consider the Moreau envelope as a function of the two variables \(x \in {\mathcal H}\) and \(\lambda \in \,]0, +\infty[\). Its differentiability with respect to (x, λ) plays a crucial role in our analysis.

1.1.1.1 a

Let us first recall some classical facts concerning the differentiability of the function \(x \mapsto \varPhi_\lambda(x)\) for fixed λ > 0. The infimum in (1.2) is attained at a unique point

$$\displaystyle \begin{aligned} \mbox{prox}_{\lambda\varPhi}(x) = \mbox{{argmin}}_{ \xi \in \mathcal H}\left\lbrace \varPhi (\xi) + \frac{1}{2\lambda}\| x -\xi \|{}^2\right\rbrace, \end{aligned} $$
(1.68)

which gives

$$\displaystyle \begin{aligned} \varPhi_{\lambda}(x) = \varPhi (\mbox{prox}_{\lambda\varPhi} (x)) + \frac{1}{2\lambda} \|x - \mbox{prox}_{\lambda\varPhi}(x) \|{}^2 . \end{aligned} $$
(1.69)

Writing the optimality condition for (1.68), we get \( \mbox{prox}_{\lambda \varPhi }(x) + \lambda \partial \varPhi \left ( \mbox{prox}_{\lambda \varPhi }(x) \right ) \ni x,\) that is

$$\displaystyle \begin{aligned}\mbox{prox}_{\lambda\varPhi}(x) = \left( I + \lambda\partial \varPhi \right)^{-1} (x). \end{aligned}$$

Thus, \(\mbox{prox}_{\lambda\varPhi}\) is the resolvent of index λ > 0 of the maximal monotone operator ∂Φ. As a consequence, the mapping \(\mbox{prox}_{\lambda \varPhi }: {\mathcal H} \to {\mathcal H}\) is firmly nonexpansive. The function \(x \mapsto \varPhi_{\lambda}(x)\) is continuously differentiable, with

$$\displaystyle \begin{aligned} \nabla \varPhi_{\lambda}(x) = \frac{1}{\lambda} \left( x- \mbox{prox}_{\lambda\varPhi}(x) \right) . \end{aligned} $$
(1.70)

Equivalently

$$\displaystyle \begin{aligned} \nabla \varPhi_{\lambda} = \frac{1}{\lambda} \left( I- \left( I + \lambda\partial \varPhi \right)^{-1} \right)= \left( \partial \varPhi \right)_{\lambda} \end{aligned} $$
(1.71)

which is the Yosida approximation of the maximal monotone operator ∂Φ. As such, ∇Φ λ is Lipschitz continuous, with Lipschitz constant \(\frac {1}{\lambda } \), and \(\varPhi _{\lambda } \in \mathcal C^{1,1}\).
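As a concrete check of (1.70), take Φ(x) = |x| on the real line: its Moreau envelope is the Huber function, \(\varPhi_\lambda(x) = x^2/(2\lambda)\) if |x| ≤ λ, and |x| − λ/2 otherwise. The following Python sketch (function names are ours) compares formula (1.70) with a centered finite difference:

```python
def prox_abs(x, lam):
    # Proximal mapping of lam*|.| (soft-thresholding)
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_abs(x, lam):
    # Moreau envelope of |.|: the Huber function
    return x * x / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2

def grad_moreau(x, lam):
    # Formula (1.70): grad Phi_lam(x) = (x - prox_{lam Phi}(x)) / lam
    return (x - prox_abs(x, lam)) / lam

x, lam, h = 2.3, 0.7, 1e-6
fd = (moreau_abs(x + h, lam) - moreau_abs(x - h, lam)) / (2 * h)
assert abs(fd - grad_moreau(x, lam)) < 1e-5
```

The 1/λ-Lipschitz bound is also visible here: |grad_moreau| never exceeds max(|x|/λ, 1) and its slope is at most 1/λ.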

1.1.1.2 b

A less-known result is the \({\mathcal C}^1\)-regularity of the function \(\lambda \mapsto \varPhi_\lambda(x)\), for each \(x \in {\mathcal H}\). Its derivative is given by

$$\displaystyle \begin{aligned} \frac{d}{d\lambda}\varPhi_\lambda(x)=-\frac{1}{2}\|\nabla\varPhi_{\lambda}(x)\|{}^2 . \end{aligned} $$
(1.72)

Formula (1.72) expresses that \(u(\lambda, x) := \varPhi_\lambda(x)\) solves the first-order Hamilton-Jacobi equation \(\partial_\lambda u + \frac{1}{2}\|\nabla_x u\|^2 = 0\); the corresponding representation of the solution is known as the Lax-Hopf formula, see [4, Remark 3.2; Lemma 3.27], [8, Lemma A.1], and [29].

Lemma 1.7

For each \(x\in {\mathcal H}\), the real-valued function \(\lambda \mapsto \varPhi_\lambda(x)\) is continuously differentiable on \(]0, +\infty[\), with

$$\displaystyle \begin{aligned} \frac{d}{d\lambda}\varPhi_\lambda(x)=-\frac{1}{2}\|\nabla\varPhi_{\lambda}(x)\|{}^2. \end{aligned} $$
(1.73)

As a consequence, for any \(x \in {\mathcal H}\), λ > 0 and μ > 0,

$$\displaystyle \begin{aligned} (\varPhi_\lambda)_{\mu}(x)= \varPhi_{(\lambda +\mu)}(x). \end{aligned} $$
(1.74)

Indeed, (1.74) is the semi-group property satisfied by the orbits of the autonomous evolution equation (1.72). Differentiating (1.74) with respect to x, and using (1.71) gives the classical resolvent equation

$$\displaystyle \begin{aligned} (A_\lambda)_{\mu}= A_{(\lambda +\mu)}, \end{aligned} $$
(1.75)

where A = ∂Φ. Indeed, (1.75) is valid for a general maximally monotone operator A, see, for example, [20, Proposition 23.6] or [25, Proposition 2.6].
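Both (1.73) and (1.74) can be verified numerically on the model case Φ(x) = x²/2, for which \(\varPhi_\lambda(x) = x^2/(2(1+\lambda))\) in closed form. In the Python sketch below (ours), the inner infimum defining \((\varPhi_\lambda)_\mu\) is approximated by a crude grid search:

```python
# Model case: Phi(x) = x^2/2, with Moreau envelope
# Phi_lam(x) = x^2/(2*(1+lam)) and grad Phi_lam(x) = x/(1+lam).

def env(x, lam):
    return x * x / (2.0 * (1.0 + lam))

def grad_env(x, lam):
    return x / (1.0 + lam)

x, lam, mu, h = 1.7, 0.4, 0.9, 1e-6

# (1.73): d/dlam Phi_lam(x) = -(1/2) * |grad Phi_lam(x)|^2
dlam = (env(x, lam + h) - env(x, lam - h)) / (2 * h)
assert abs(dlam + 0.5 * grad_env(x, lam) ** 2) < 1e-6

# (1.74): (Phi_lam)_mu(x) = Phi_{lam+mu}(x); the inner infimum is
# approximated by a grid search over xi in [-4, 4]
grid = [i * 1e-3 for i in range(-4000, 4001)]
lhs = min(env(xi, lam) + (x - xi) ** 2 / (2 * mu) for xi in grid)
rhs = env(x, lam + mu)
assert abs(lhs - rhs) < 1e-5
```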

1.1.2 Auxiliary Results

Theorem 1.6.2

Let \(\lambda : [t_0, +\infty[ \,\to\, ]0, +\infty[\) be continuous and nondecreasing. Let \(\varPhi :{\mathcal H}\to \mathbb R\cup \{+\infty \}\) be convex, lower-semicontinuous, and proper. Then, given any \(x_0\) and \(v_0\) in \({\mathcal H}\), system (1.1) has a unique twice continuously differentiable global solution \(x:[t_0,+\infty [\to {\mathcal H}\) satisfying \(x(t_0) = x_0\), \(\dot x(t_0)=v_0\).

Proof

The assertion follows from the most elementary form of the Cauchy-Lipschitz theorem, and hinges on the continuity of \((t,x) \mapsto \nabla \varPhi_{\lambda(t)}(x)\) together with the Lipschitz continuity of \(\nabla \varPhi_{\lambda(t)}\) with respect to x, uniformly in t.

Indeed, for \(t \in [t_0, +\infty[\) and \((x,x')\in {\mathcal H}\times {\mathcal H}\) we have

$$\displaystyle \begin{aligned} \|\nabla\varPhi_{\lambda(t)}(x')-\nabla\varPhi_{\lambda(t)}(x)\|\leq \frac{1}{\lambda(t)}\|x'-x\|\leq \frac{1}{\lambda(t_0)}\|x'-x\|. \end{aligned}$$

Next, the continuity of \(\nabla \varPhi _{\lambda (t)}(x)=\frac {1}{\lambda (t)}(x-\mbox{prox}_{\lambda (t)\varPhi }x)\) boils down to the continuity of the mapping \((t,x)\in [t_0,+\infty [\times {\mathcal H} \mapsto \mbox{prox}_{\lambda (t)\varPhi }x\in {\mathcal H}\). For (t, x) and (t′, x′) in \([t_0,+\infty [\times {\mathcal H}\) we have

$$\displaystyle \begin{aligned} \|\mbox{prox}_{\lambda(t')\varPhi}x'-\mbox{prox}_{\lambda(t)\varPhi}x\|\leq \|\mbox{prox}_{\lambda(t')\varPhi}x'-\mbox{prox}_{\lambda(t')\varPhi}x\|+\|\mbox{prox}_{\lambda(t')\varPhi}x-\mbox{prox}_{\lambda(t)\varPhi}x\|. \end{aligned}$$

But, since \(\mbox{prox}_{\lambda(t')\varPhi}\) is nonexpansive,

$$\displaystyle \begin{aligned} \|\mbox{prox}_{\lambda(t')\varPhi}x'-\mbox{prox}_{\lambda(t')\varPhi}x\|\leq\|x'-x\|; \end{aligned}$$

and also (see [20, Prop. 23.28(iii)])

$$\displaystyle \begin{aligned} \|\mbox{prox}_{\lambda(t')\varPhi}x-\mbox{prox}_{\lambda(t)\varPhi}x\|\leq \left|\frac{\lambda(t')}{\lambda(t)}-1\right|\|\mbox{prox}_{\lambda(t)\varPhi}x-x\|. \end{aligned}$$

Therefore

$$\displaystyle \begin{aligned} \|\mbox{prox}_{\lambda(t')\varPhi}x'-\mbox{prox}_{\lambda(t)\varPhi}x\|\leq \|x'-x\|+|\lambda(t')-\lambda(t)|\|\nabla\varPhi_{\lambda(t)}(x)\|, \end{aligned}$$

which proves the continuity of \((t,x)\mapsto \mbox{prox}_{\lambda (t)\varPhi }x\) at the point (t, x).

Let us state the discrete version of Opial’s lemma.

Lemma 1.8

Let S be a nonempty subset of \(\mathcal H\) , and (x k) a sequence of elements of \(\mathcal H\) . Assume that

  1. (i)

    for every \(z \in S\), \(\lim_{k\to +\infty }\|x_k - z\|\) exists;

  2. (ii)

    every weak sequential cluster point of \((x_k)\), as k → +∞, belongs to S.

Then \((x_k)\) converges weakly, as k → +∞, to a point in S.

We shall also make use of the following discrete version of the Gronwall lemma:

Lemma 1.9

Let (a k) be a sequence of nonnegative numbers such that, for all \(k\in \mathbb N\)

$$\displaystyle \begin{aligned}a_k^2 \leq c^2 + \sum_{j=1}^k \beta_j a_j,\end{aligned} $$

where (β j) is a summable sequence of nonnegative numbers, and c ≥ 0. Then, \(\displaystyle a_k \leq c + \sum _{j=1}^{\infty } \beta _j\) for all \(k\in \mathbb N\).

Proof

For \(k\in \mathbb N\), set \(A_k := \max_{1\leq m\leq k} a_m\). Then, for 1 ≤ m ≤ k, we have

$$\displaystyle \begin{aligned}a_m^2 \leq c^2 + \sum_{j=1}^m \beta_j a_j \leq c^2 + A_k \sum_{j=1}^{\infty} \beta_j.\end{aligned} $$

Taking the maximum over 1 ≤ m ≤ k, we obtain

$$\displaystyle \begin{aligned}A_k^2 \leq c^2 + A_k \sum_{j=1}^{\infty} \beta_j. \end{aligned}$$

Solving this quadratic inequality in \(A_k\), and setting \(S:=\sum_{j=1}^{\infty} \beta_j\), we obtain \(A_k \leq \frac{1}{2}\big(S + \sqrt{S^2 + 4c^2}\big) \leq c + S\), which gives the result.
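The proof can be exercised numerically: the Python sketch below (ours) builds a sequence saturating the hypothesis with β_j = 1/j² and c = 1, and checks the conclusion against the partial sum of the β_j, which is smaller than the full series and hence a stronger test:

```python
import math

c = 1.0
N = 1000
beta = [1.0 / j ** 2 for j in range(1, N + 1)]  # summable nonnegative weights

# Build a_k so that a_k^2 = c^2 + sum_{j<k} beta_j * a_j, which in
# particular satisfies the hypothesis a_k^2 <= c^2 + sum_{j<=k} beta_j * a_j.
a, acc = [], 0.0
for k in range(N):
    ak = math.sqrt(c * c + acc)
    a.append(ak)
    acc += beta[k] * ak

# Conclusion of Lemma 1.9: a_k <= c + sum_j beta_j
bound = c + sum(beta)
assert max(a) <= bound
```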

The next lemma provides an estimate of the convergence rate of a sequence that is summable with respect to weights.

Lemma 1.10 ([7, Lemma 22])

Let (τ k) be a nonnegative sequence such that \(\sum _{k=1}^{+\infty } \tau _{k}=+\infty \) . Assume that (𝜖 k) is a nonnegative and nonincreasing sequence satisfying \(\sum _{k=1}^{+\infty } \tau _{k}\,\epsilon _k<+\infty \) . Then we have \(\epsilon _k=o\left (\frac {1}{\sum _{i=1}^k \tau _i}\right ) \quad \mathit{\mbox{as }} k\to +\infty .\)


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Attouch, H., Peypouquet, J. (2019). Convergence Rate of Proximal Inertial Algorithms Associated with Moreau Envelopes of Convex Functions. In: Bauschke, H., Burachik, R., Luke, D. (eds) Splitting Algorithms, Modern Operator Theory, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-25939-6_1
