Abstract
In a Hilbert space setting \({\mathcal H}\), we develop new inertial proximal-based algorithms that aim to rapidly minimize a convex, lower-semicontinuous, proper function \(\varPhi : \mathcal H \rightarrow {\mathbb R} \cup \{+\infty \}\). The guiding idea is to use an accelerated proximal scheme where, at each step, Φ is replaced by its Moreau envelope, with varying approximation parameter. This leads us to consider a Relaxed Inertial Proximal Algorithm (RIPA) with variable parameters which take into account the effects of inertia, relaxation, and approximation. (RIPA) was first introduced to solve general maximally monotone inclusions, in which case a judicious adjustment of the parameters makes it possible to obtain the convergence of the iterates towards the equilibria. In the case of convex minimization problems, the convergence analysis of (RIPA) was initially addressed by Attouch and Cabot, based on its formulation as an inertial gradient method with varying potential functions. We propose a new approach to this algorithm, along with further developments, based on its formulation as a proximal algorithm associated with varying Moreau envelopes. For convenient choices of the parameters, we show the fast optimization property of the function values, with rate \(o(k^{-2})\), and the weak convergence of the iterates. This is in line with the recent studies of Su-Boyd-Candès, Chambolle-Dossal, and Attouch-Peypouquet. We study the impact of geometric assumptions on the convergence rates, and the stability of the results with respect to perturbations and errors. Finally, in the case of structured minimization problems of the form smooth + nonsmooth, based on this approach, we introduce new proximal-gradient inertial algorithms for which similar convergence rates are shown.
References
Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38, 1102–1119 (2000)
Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81, 747–779 (2002)
Apidopoulos, V., Aujol, J.-F., Dossal, Ch.: Convergence rate of inertial Forward-Backward algorithm beyond Nesterov’s rule. Math. Prog. (Ser. A), 1–20 (2018)
Attouch, H.: Variational Analysis for Functions and Operators. Pitman (1984)
Attouch, H., Bolte, J., Redont, P.: Optimizing properties of an inertial dynamical system with geometric damping. Link with proximal methods. Control Cybernet. 31, 643–657 (2002)
Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differential Equations 263, 5412–5458 (2017)
Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms. SIAM J. Optim. 28, 849–874 (2018)
Attouch, H., Cabot, A.: Convergence of damped inertial dynamics governed by regularized maximally monotone operators. J. Differential Equations to appear. HAL-01648383v2 (2018)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. HAL-01708905 (2018)
Attouch, H., Cabot, A.: Convergence rate of a relaxed inertial proximal algorithm for convex minimization. HAL-01807041 (2018)
Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12, 273–306 (2002)
Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Accelerated forward-backward algorithms with perturbations. Application to Tikhonov regularization. J. Optim. Th. Appl. 179, 1–36 (2018)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Prog. (Ser. B) 168, 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3, ESAIM: COCV 25 (2019)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1∕k 2. SIAM J. Optim. 26, 1824–1834 (2016)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Prog. 174, 319–432 (2019)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differential Equations 261, 5734–5783 (2016)
Aujol, J.-F., Dossal, Ch.: Stability of over-relaxations for the Forward-Backward algorithm, application to FISTA. SIAM J. Optim. 25, 2408–2433 (2015)
Baillon, J.-B.: Un exemple concernant le comportement asymptotique de la solution du problème \(\frac {du}{dt} + \partial \phi (u) \ni 0\). J. Functional Anal. 28, 369–376 (1978)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert spaces. Springer (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Beck, A., Teboulle, M.: Gradient-Based Algorithms with Applications in Signal Recovery Problems. In Convex Optimization in Signal Processing and Communications, D. Palomar and Y. Eldar Eds., Cambridge University Press, 33–88 (2010)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Lojasiewicz Inequalities and Applications. Trans. AMS 362, 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Prog. 165, 471–507 (2017)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. North Holland, (1972)
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the Fast Iterative Shrinkage Thresholding Algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Prog. (Ser. A) 145, 451–482 (2014)
Güler, O.: On the convergence of the proximal point algorithm for convex optimization. SIAM J. Control Optim. 29, 403–419 (1991)
Imbert, C.: Convex Analysis techniques for Hopf-Lax formulae in Hamilton-Jacobi equations. J. of Nonlinear Convex Anal. 2, 333–343 (2001)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Prog. to appear. DOI 10.1007/s10107-015-0949-3
Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. Advances in Neural Information Processing Systems, 1970–1978 (2014)
May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turkish J. Math. 41, 681–685 (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1∕k 2). Soviet Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA (2004)
Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1, 123–231 (2013)
Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer (2015)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23, 1607–1633 (2013)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Information Processing Systems 27, 2510–2518 (2014)
Acknowledgements
This work was supported by FONDECYT Grant 1181179 and CMM-Conicyt PIA AFB170001.
Appendix
1.1.1 Some Properties of the Moreau Envelope
For a detailed presentation of the Moreau envelope, we refer the reader to [20, 25, 35, 36]. We merely point out the following properties, of constant use here:
(i) the function \(\lambda \in \,]0, +\infty [\, \mapsto \varPhi _\lambda (x)\) is nonincreasing for each \(x\in {\mathcal H}\);
(ii) the equality \(\inf _{\mathcal H}\varPhi =\inf _{\mathcal H}\varPhi _\lambda \) holds in \({\mathbb R}\cup \{-\infty \}\) for all λ > 0;
(iii) \( \operatorname *{\mbox{arg min}}\varPhi = \operatorname *{\mbox{arg min}}\varPhi _\lambda \) for all λ > 0.
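As a quick numerical sanity check, consider Φ(x) = |x| on the real line, whose Moreau envelope is the classical Huber function; properties (i)–(iii) can be verified directly (a sketch using the standard closed form of the envelope):

```python
import numpy as np

def moreau_env_abs(x, lam):
    # Moreau envelope of Phi(x) = |x|: the Huber function
    # Phi_lam(x) = x^2/(2*lam) if |x| <= lam, else |x| - lam/2
    return np.where(np.abs(x) <= lam, x ** 2 / (2 * lam), np.abs(x) - lam / 2)

x = np.linspace(-3, 3, 601)
e1 = moreau_env_abs(x, 0.5)
e2 = moreau_env_abs(x, 1.0)

# (i) lam -> Phi_lam(x) is nonincreasing: the larger lam, the smaller the value
assert np.all(e2 <= e1 + 1e-12)
# (ii) both envelopes share inf Phi = 0
assert abs(e1.min()) < 1e-12 and abs(e2.min()) < 1e-12
# (iii) both are minimized at x = 0, the unique minimizer of |x|
assert abs(x[np.argmin(e1)]) < 1e-9 and abs(x[np.argmin(e2)]) < 1e-9
```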
It will be convenient to consider the Moreau envelope as a function of the two variables \(x \in {\mathcal H}\) and λ ∈ ]0, +∞[. Its differentiability with respect to (x, λ) plays a crucial role in our analysis.
1.1.1.1 a
Let us first recall some classical facts concerning the differentiability of the function \(x\mapsto \varPhi _\lambda (x)\) for fixed λ > 0. The infimum in (1.2) is attained at a unique point
$$ \mbox{prox}_{\lambda \varPhi }(x) := \operatorname *{\mbox{arg min}}_{\xi \in {\mathcal H}} \left \{ \varPhi (\xi ) + \frac {1}{2\lambda }\|x-\xi \|{}^2 \right \}, \qquad (1.68)$$
which gives
$$ \varPhi _\lambda (x) = \varPhi \left ( \mbox{prox}_{\lambda \varPhi }(x)\right ) + \frac {1}{2\lambda }\left \| x - \mbox{prox}_{\lambda \varPhi }(x)\right \|{}^2. $$
Writing the optimality condition for (1.68), we get \( \mbox{prox}_{\lambda \varPhi }(x) + \lambda \partial \varPhi \left ( \mbox{prox}_{\lambda \varPhi }(x) \right ) \ni x,\) that is
$$ \mbox{prox}_{\lambda \varPhi }(x) = \left ( I + \lambda \partial \varPhi \right )^{-1}(x). $$
Thus, \(\mbox{prox}_{\lambda \varPhi }\) is the resolvent of index λ > 0 of the maximal monotone operator ∂Φ. As a consequence, the mapping \(\mbox{prox}_{\lambda \varPhi }: {\mathcal H} \to {\mathcal H}\) is firmly nonexpansive. The function \(x\mapsto \varPhi _\lambda (x)\) is continuously differentiable, with
$$ \nabla \varPhi _\lambda (x) = \frac {1}{\lambda }\left ( x - \mbox{prox}_{\lambda \varPhi }(x) \right ). \qquad (1.71)$$
Equivalently,
$$ \nabla \varPhi _\lambda = \frac {1}{\lambda }\left ( I - (I + \lambda \partial \varPhi )^{-1} \right ) = (\partial \varPhi )_\lambda , $$
which is the Yosida approximation of the maximal monotone operator ∂Φ. As such, \(\nabla \varPhi _\lambda \) is Lipschitz continuous, with Lipschitz constant \(\frac {1}{\lambda } \), and \(\varPhi _{\lambda } \in \mathcal C^{1,1}\).
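In the model case Φ(x) = |x|, the proximal mapping is the soft-thresholding (shrinkage) operator, so the gradient formula and the 1∕λ-Lipschitz bound can be checked numerically (a sketch; the helper names are ours):

```python
import numpy as np

def soft_threshold(x, lam):
    # prox_{lam*|.|}(x) = sign(x) * max(|x| - lam, 0): the shrinkage operator
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def grad_moreau_abs(x, lam):
    # gradient of the envelope via the prox: (x - prox_{lam Phi}(x)) / lam
    return (x - soft_threshold(x, lam)) / lam

lam = 0.7
x = np.linspace(-2, 2, 401)
g = grad_moreau_abs(x, lam)

# outside [-lam, lam] the gradient saturates at sign(x): the Yosida
# approximation of the sign (subdifferential of |.|)
mask = np.abs(x) > lam
assert np.allclose(g[mask], np.sign(x[mask]))
# difference quotients never exceed the Lipschitz constant 1/lam
slopes = np.abs(np.diff(g)) / np.abs(np.diff(x))
assert np.all(slopes <= 1 / lam + 1e-9)
```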
1.1.1.2 b
A less known result is the \({\mathcal C}^1\)-regularity of the function \(\lambda \mapsto \varPhi _\lambda (x)\), for each \(x \in {\mathcal H}\). Its derivative is given by
$$ \frac {d}{d\lambda }\varPhi _\lambda (x) = -\frac {1}{2}\left \| \nabla \varPhi _\lambda (x) \right \|{}^2, $$
that is, \((\lambda ,x)\mapsto \varPhi _\lambda (x)\) solves the first-order Hamilton-Jacobi equation
$$ \frac {\partial \varPhi _\lambda }{\partial \lambda }(x) + \frac {1}{2}\left \| \nabla _x \varPhi _\lambda (x) \right \|{}^2 = 0. \qquad (1.72)$$
The Moreau envelope thus provides the Lax-Hopf formula for this first-order Hamilton-Jacobi equation, see [4, Remark 3.2; Lemma 3.27], [8, Lemma A.1], and [29].
Lemma 1.7
For each \(x\in {\mathcal H}\), the real-valued function \(\lambda \mapsto \varPhi _\lambda (x)\) is continuously differentiable on ]0, +∞[, with
$$ \frac {d}{d\lambda }\varPhi _\lambda (x) = -\frac {1}{2}\left \| \nabla \varPhi _\lambda (x) \right \|{}^2. $$
As a consequence, for any \(x \in {\mathcal H}\), λ > 0 and μ > 0,
$$ (\varPhi _\lambda )_\mu (x) = \varPhi _{\lambda +\mu }(x). \qquad (1.74)$$
Indeed, (1.74) is the semi-group property satisfied by the orbits of the autonomous evolution equation (1.72). Differentiating (1.74) with respect to x, and using (1.71), gives the classical resolvent equation
$$ J_{\lambda +\mu }^A = J_\mu ^A \circ \left ( \frac {\mu }{\lambda +\mu } I + \frac {\lambda }{\lambda +\mu } J_{\lambda +\mu }^A \right ), \qquad (1.75)$$
where A = ∂Φ. Indeed, (1.75) is valid for a general maximally monotone operator A, see, for example, [20, Proposition 23.6] or [25, Proposition 2.6].
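The semigroup property of the Moreau envelope can be tested numerically for the quadratic Φ(y) = y²∕2, whose envelope has the closed form \(\varPhi _\lambda (x) = x^2/(2(1+\lambda ))\) (a sketch; the outer envelope is computed by brute-force minimization over a fine grid):

```python
import numpy as np

# Phi(y) = y^2/2 has the closed-form envelope Phi_lam(x) = x^2 / (2*(1+lam))
lam, mu, x = 0.8, 1.3, 1.5
grid = np.linspace(-5.0, 5.0, 200001)

inner = grid ** 2 / (2 * (1 + lam))                 # Phi_lam evaluated on the grid
lhs = np.min(inner + (x - grid) ** 2 / (2 * mu))    # (Phi_lam)_mu(x), numerically
rhs = x ** 2 / (2 * (1 + lam + mu))                 # Phi_{lam+mu}(x), closed form

# semigroup property: envelope of the envelope is the envelope of the sum
assert abs(lhs - rhs) < 1e-6
```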
1.1.2 Auxiliary Results
Theorem 1.6.2
Let \(\lambda : [t_0, +\infty [\,\to \,]0, +\infty [\) be continuous and nondecreasing. Let \(\varPhi :{\mathcal H}\to \mathbb R\cup \{+\infty \}\) be convex, lower-semicontinuous, and proper. Then, given any \(x_0\) and \(v_0\) in \({\mathcal H}\), system (1.1) has a unique twice continuously differentiable global solution \(x:[t_0,+\infty [\to {\mathcal H}\) satisfying \(x(t_0)=x_0\), \(\dot x(t_0)=v_0\).
Proof
The assertion appeals to the most elementary form of the Cauchy-Lipschitz theorem (see any textbook) and hinges on the (t, x)-continuity of \(\nabla \varPhi _{\lambda (t)}\) and on its Lipschitz continuity with respect to x, uniform with respect to t.
Indeed, for \(t \in [t_0, +\infty [\) and \((x,x')\in {\mathcal H}\times {\mathcal H}\) we have
$$ \left \| \nabla \varPhi _{\lambda (t)}(x) - \nabla \varPhi _{\lambda (t)}(x') \right \| \leq \frac {1}{\lambda (t)}\|x-x'\| \leq \frac {1}{\lambda (t_0)}\|x-x'\|, $$
since \(\nabla \varPhi _\lambda \) is \(\frac {1}{\lambda }\)-Lipschitz continuous and λ(⋅) is nondecreasing.
Next, the continuity of \(\nabla \varPhi _{\lambda (t)}(x)=\frac {1}{\lambda (t)}(x-\mbox{prox}_{\lambda (t)\varPhi }x)\) boils down to the continuity of the mapping \((t,x)\in [t_0,+\infty [\times {\mathcal H}\mapsto \mbox{prox}_{\lambda (t)\varPhi }x\in {\mathcal H}\). For (t, x) and (t′, x′) in \([t_0,+\infty [\times {\mathcal H}\) we have
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t')\varPhi }x' \right \| \leq \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t)\varPhi }x' \right \| + \left \| \mbox{prox}_{\lambda (t)\varPhi }x' - \mbox{prox}_{\lambda (t')\varPhi }x' \right \|. $$
But, since \(\mbox{prox}_{\lambda (t)\varPhi }\) is nonexpansive,
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t)\varPhi }x' \right \| \leq \|x-x'\|, $$
and also (see [20, Prop. 23.28(iii)])
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x' - \mbox{prox}_{\lambda (t')\varPhi }x' \right \| \leq |\lambda (t)-\lambda (t')|\, \left \| \nabla \varPhi _{\lambda (t_0)}(x') \right \|, $$
where we use that \(\lambda \mapsto \|\nabla \varPhi _\lambda (x')\|\) is nonincreasing, together with \(\lambda (t),\lambda (t')\geq \lambda (t_0)\).
Therefore
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t')\varPhi }x' \right \| \leq \|x-x'\| + |\lambda (t)-\lambda (t')|\, \left \| \nabla \varPhi _{\lambda (t_0)}(x') \right \|, $$
which, in view of the continuity of λ(⋅), proves the continuity of \((t,x)\mapsto \mbox{prox}_{\lambda (t)\varPhi }x\) at the point (t, x).
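For Φ(x) = |x|, where \(\|\nabla \varPhi _\mu (x)\| \leq 1\), the parameter-dependence of the prox reduces to \(|\mbox{prox}_{\lambda \varPhi }x - \mbox{prox}_{\mu \varPhi }x| \leq |\lambda -\mu |\), which is easy to verify numerically (a sketch; `soft_threshold` is the prox of λ|⋅|):

```python
import numpy as np

def soft_threshold(x, lam):
    # prox of lam * |.|
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.linspace(-4, 4, 801)
for lam, mu in [(0.3, 0.9), (1.0, 1.05), (0.1, 2.0)]:
    gap = np.abs(soft_threshold(x, lam) - soft_threshold(x, mu))
    # |nabla Phi_mu(x)| <= 1 for Phi = |.|, so the bound reduces to |lam - mu|
    assert np.all(gap <= abs(lam - mu) + 1e-12)
```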
Let us state the discrete version of Opial’s lemma.
Lemma 1.8
Let S be a nonempty subset of \(\mathcal H\), and \((x_k)\) a sequence of elements of \(\mathcal H\). Assume that
(i) for every z ∈ S, \(\lim _{k\to +\infty }\|x_k - z\|\) exists;
(ii) every weak sequential cluster point of \((x_k)\), as k →∞, belongs to S.
Then \((x_k)\) converges weakly as k →∞ to a point in S.
We shall also make use of the following discrete version of the Gronwall lemma:
Lemma 1.9
Let \((a_k)\) be a sequence of nonnegative numbers such that, for all \(k\in \mathbb N\),
$$ a_k^2 \leq c^2 + \sum _{j=1}^{k} \beta _j a_j, $$
where \((\beta _j)\) is a summable sequence of nonnegative numbers, and c ≥ 0. Then, \(\displaystyle a_k \leq c + \sum _{j=1}^{\infty } \beta _j\) for all \(k\in \mathbb N\).
Proof
For \(k\in \mathbb N\), set \(A_k :=\max _{1\leq m\leq k} a_m\). Then, for 1 ≤ m ≤ k, we have
$$ a_m^2 \leq c^2 + \sum _{j=1}^{m} \beta _j a_j \leq c^2 + A_k \sum _{j=1}^{k} \beta _j. $$
Taking the maximum over 1 ≤ m ≤ k, we obtain
$$ A_k^2 \leq c^2 + A_k \sum _{j=1}^{k} \beta _j. $$
Bounding \(A_k\) by the largest root of the corresponding quadratic equation \(X^2 - \left (\sum _{j=1}^{k}\beta _j\right ) X - c^2 = 0\), we obtain the result.
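Assuming the classical quadratic form of the hypothesis, \(a_k^2 \leq c^2 + \sum _{j\leq k} \beta _j a_j\), the bound can be stress-tested numerically on the extremal sequence that saturates it at every step (a sketch; the data c and β are arbitrary choices):

```python
import numpy as np

# Extremal sequence for the discrete Gronwall lemma: at each step take a_k
# as large as the hypothesis a_k^2 <= c^2 + sum_{j<=k} beta_j * a_j allows.
c = 2.0
beta = 1.0 / np.arange(1, 51) ** 2       # summable nonnegative weights
a = np.zeros(50)
s = 0.0                                   # running value of sum_{j<k} beta_j * a_j
for k in range(50):
    # positive root of t^2 - beta_k * t - (c^2 + s) = 0
    a[k] = (beta[k] + np.sqrt(beta[k] ** 2 + 4 * (c ** 2 + s))) / 2
    s += beta[k] * a[k]

# conclusion of the lemma: a_k <= c + sum_j beta_j, even for the extremal sequence
assert np.all(a <= c + beta.sum())
```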
The next lemma provides an estimate of the convergence rate of a sequence that is summable with respect to weights.
Lemma 1.10 ([7, Lemma 22])
Let \((\tau _k)\) be a nonnegative sequence such that \(\sum _{k=1}^{+\infty } \tau _{k}=+\infty \). Assume that \((\epsilon _k)\) is a nonnegative and nonincreasing sequence satisfying \(\sum _{k=1}^{+\infty } \tau _{k}\,\epsilon _k<+\infty \). Then we have \(\epsilon _k=o\left (\frac {1}{\sum _{i=1}^k \tau _i}\right ) \mbox{ as } k\to +\infty .\)
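A concrete instance, with the weights \(\tau _k = k\) that produce the \(o(k^{-2})\) rates mentioned in the abstract (a numerical sketch; the choice \(\epsilon _k = 1/k^3\) is ours):

```python
import numpy as np

# tau_k = k and eps_k = 1/k^3: then sum(tau_k * eps_k) = sum 1/k^2 < +inf,
# sum_{i<=k} tau_i = k(k+1)/2 ~ k^2/2, so the lemma predicts eps_k = o(1/k^2).
K = 10000
k = np.arange(1, K + 1, dtype=float)
tau, eps = k, 1.0 / k ** 3

assert np.sum(tau * eps) < 2.0           # partial sums of 1/k^2 stay below pi^2/6 < 2
ratio = eps * np.cumsum(tau)             # eps_k * sum_{i<=k} tau_i, should tend to 0
assert ratio[-1] < ratio[99] / 50        # the ratio has decayed by a large factor
```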
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Attouch, H., Peypouquet, J. (2019). Convergence Rate of Proximal Inertial Algorithms Associated with Moreau Envelopes of Convex Functions. In: Bauschke, H., Burachik, R., Luke, D. (eds) Splitting Algorithms, Modern Operator Theory, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-25939-6_1
Print ISBN: 978-3-030-25938-9
Online ISBN: 978-3-030-25939-6