Abstract
In a Hilbert space setting \({\mathcal H}\), we develop new inertial proximal-based algorithms that aim to rapidly minimize a convex, lower-semicontinuous, proper function \(\varPhi : \mathcal H \rightarrow {\mathbb R} \cup \{+\infty \}\). The guiding idea is to use an accelerated proximal scheme where, at each step, Φ is replaced by its Moreau envelope, with varying approximation parameter. This leads us to consider a Relaxed Inertial Proximal Algorithm (RIPA) with variable parameters which take into account the effects of inertia, relaxation, and approximation. (RIPA) was first introduced to solve general maximally monotone inclusions, in which case a judicious adjustment of the parameters makes it possible to obtain the convergence of the iterates towards the equilibria. In the case of convex minimization problems, the convergence analysis of (RIPA) was initially addressed by Attouch and Cabot, based on its formulation as an inertial gradient method with varying potential functions. We propose a new approach to this algorithm, along with further developments, based on its formulation as a proximal algorithm associated with varying Moreau envelopes. For convenient choices of the parameters, we show the fast optimization property of the function values, with rate \(o(k^{-2})\), and the weak convergence of the iterates. This is in line with the recent studies of Su-Boyd-Candès, Chambolle-Dossal, and Attouch-Peypouquet. We study the impact of geometric assumptions on the convergence rates, and the stability of the results with respect to perturbations and errors. Finally, in the case of structured minimization problems of the form smooth + nonsmooth, based on this approach, we introduce new proximal-gradient inertial algorithms for which similar convergence rates are shown.
References
Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38, 1102–1119 (2000)
Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81, 747–779 (2002)
Apidopoulos, V., Aujol, J.-F., Dossal, Ch.: Convergence rate of inertial Forward-Backward algorithm beyond Nesterov’s rule. Math. Prog. (Ser. A), 1–20 (2018)
Attouch, H.: Variational Analysis for Functions and Operators. Pitman (1984)
Attouch, H., Bolte, J., Redont, P.: Optimizing properties of an inertial dynamical system with geometric damping. Link with proximal methods. Control Cybernet. 31, 643–657 (2002)
Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differential Equations 263, 5412–5458 (2017)
Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms. SIAM J. Optim. 28, 849–874 (2018)
Attouch, H., Cabot, A.: Convergence of damped inertial dynamics governed by regularized maximally monotone operators. J. Differential Equations to appear. HAL-01648383v2 (2018)
Attouch, H., Cabot, A.: Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. HAL-01708905 (2018)
Attouch, H., Cabot, A.: Convergence rate of a relaxed inertial proximal algorithm for convex minimization. HAL-01807041 (2018)
Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12, 273–306 (2002)
Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Accelerated forward-backward algorithms with perturbations. Application to Tikhonov regularization. J. Optim. Th. Appl. 179, 1–36 (2018)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Prog. (Ser. B) 168, 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3, ESAIM: COCV 25 (2019)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1∕k 2. SIAM J. Optim. 26, 1824–1834 (2016)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Prog. 174, 319–432 (2019)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differential Equations 261, 5734–5783 (2016)
Aujol, J.-F., Dossal, Ch.: Stability of over-relaxations for the Forward-Backward algorithm, application to FISTA. SIAM J. Optim. 25, 2408–2433 (2015)
Baillon, J.-B.: Un exemple concernant le comportement asymptotique de la solution du problème \(\frac {du}{dt} + \partial \phi (u) \ni 0\). J. Functional Anal. 28, 369–376 (1978)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert spaces. Springer (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Beck, A., Teboulle, M.: Gradient-Based Algorithms with Applications in Signal Recovery Problems. In Convex Optimization in Signal Processing and Communications, D. Palomar and Y. Eldar Eds., Cambridge University Press, 33–88 (2010)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Lojasiewicz Inequalities and Applications. Trans. AMS 362, 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Prog. 165, 471–507 (2017)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution. North Holland, (1972)
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the Fast Iterative Shrinkage Thresholding Algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Prog. (Ser. A) 145, 451–482 (2014)
Güler, O.: On the convergence of the proximal point algorithm for convex optimization. SIAM J. Control Optim. 29, 403–419 (1991)
Imbert, C.: Convex Analysis techniques for Hopf-Lax formulae in Hamilton-Jacobi equations. J. of Nonlinear Convex Anal. 2, 333–343 (2001)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Prog. to appear. DOI 10.1007/s10107-015-0949-3
Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. Advances in Neural Information Processing Systems, 1970–1978 (2014)
May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turkish J. Math. 41, 681–685 (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1∕k 2). Soviet Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA (2004)
Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1, 123–231 (2013)
Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer (2015)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23, 1607–1633 (2013)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Information Processing Systems 27, 2510–2518 (2014)
Acknowledgements
This work was supported by FONDECYT Grant 1181179 and CMM-Conicyt PIA AFB170001.
Appendix
1.1.1 Some Properties of the Moreau Envelope
For a detailed presentation of the Moreau envelope, we refer the reader to [20, 25, 35, 36]. We merely point out the following properties, of constant use here:
(i) the function \(\lambda \in \,]0, +\infty [\, \mapsto \varPhi _\lambda (x)\) is nonincreasing for each \(x\in {\mathcal H}\);
(ii) the equality \(\inf _{\mathcal H}\varPhi =\inf _{\mathcal H}\varPhi _\lambda \) holds in \({\mathbb R}\cup \{-\infty \}\) for all λ > 0;
(iii) \( \operatorname *{\mbox{arg min}}\varPhi = \operatorname *{\mbox{arg min}}\varPhi _\lambda \) for all λ > 0.
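As a quick numerical sanity check, consider Φ(x) = |x| on the real line, whose Moreau envelope is the classical Huber function; properties (i)–(iii) can be verified directly (a sketch using the standard closed form of the envelope):

```python
import numpy as np

def moreau_env_abs(x, lam):
    # Moreau envelope of Phi(x) = |x|: the Huber function
    # Phi_lam(x) = x^2/(2*lam) if |x| <= lam, else |x| - lam/2
    return np.where(np.abs(x) <= lam, x ** 2 / (2 * lam), np.abs(x) - lam / 2)

x = np.linspace(-3, 3, 601)
e1 = moreau_env_abs(x, 0.5)
e2 = moreau_env_abs(x, 1.0)

# (i) lam -> Phi_lam(x) is nonincreasing: the larger lam, the smaller the value
assert np.all(e2 <= e1 + 1e-12)
# (ii) both envelopes share inf Phi = 0
assert abs(e1.min()) < 1e-12 and abs(e2.min()) < 1e-12
# (iii) both are minimized at x = 0, the unique minimizer of |x|
assert abs(x[np.argmin(e1)]) < 1e-9 and abs(x[np.argmin(e2)]) < 1e-9
```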
It will be convenient to consider the Moreau envelope as a function of the two variables \(x \in {\mathcal H}\) and λ ∈ ]0, +∞[. Its differentiability with respect to (x, λ) plays a crucial role in our analysis.
1.1.1.1 a
Let us first recall some classical facts concerning the differentiability of the function \(x\mapsto \varPhi _\lambda (x)\) for fixed λ > 0. The infimum in (1.2) is attained at a unique point
$$ \mbox{prox}_{\lambda \varPhi }(x) := \operatorname *{\mbox{arg min}}_{\xi \in {\mathcal H}} \left \{ \varPhi (\xi ) + \frac {1}{2\lambda }\|x-\xi \|{}^2 \right \}, \qquad (1.68)$$
which gives
$$ \varPhi _\lambda (x) = \varPhi \left ( \mbox{prox}_{\lambda \varPhi }(x)\right ) + \frac {1}{2\lambda }\left \| x - \mbox{prox}_{\lambda \varPhi }(x)\right \|{}^2. $$
Writing the optimality condition for (1.68), we get \( \mbox{prox}_{\lambda \varPhi }(x) + \lambda \partial \varPhi \left ( \mbox{prox}_{\lambda \varPhi }(x) \right ) \ni x,\) that is
$$ \mbox{prox}_{\lambda \varPhi }(x) = \left ( I + \lambda \partial \varPhi \right )^{-1}(x). $$
Thus, \(\mbox{prox}_{\lambda \varPhi }\) is the resolvent of index λ > 0 of the maximal monotone operator ∂Φ. As a consequence, the mapping \(\mbox{prox}_{\lambda \varPhi }: {\mathcal H} \to {\mathcal H}\) is firmly nonexpansive. The function \(x\mapsto \varPhi _\lambda (x)\) is continuously differentiable, with
$$ \nabla \varPhi _\lambda (x) = \frac {1}{\lambda }\left ( x - \mbox{prox}_{\lambda \varPhi }(x) \right ). \qquad (1.71)$$
Equivalently,
$$ \nabla \varPhi _\lambda = \frac {1}{\lambda }\left ( I - (I + \lambda \partial \varPhi )^{-1} \right ) = (\partial \varPhi )_\lambda , $$
which is the Yosida approximation of the maximal monotone operator ∂Φ. As such, \(\nabla \varPhi _\lambda \) is Lipschitz continuous, with Lipschitz constant \(\frac {1}{\lambda } \), and \(\varPhi _{\lambda } \in \mathcal C^{1,1}\).
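In the model case Φ(x) = |x|, the proximal mapping is the soft-thresholding (shrinkage) operator, so the gradient formula and the 1∕λ-Lipschitz bound can be checked numerically (a sketch; the helper names are ours):

```python
import numpy as np

def soft_threshold(x, lam):
    # prox_{lam*|.|}(x) = sign(x) * max(|x| - lam, 0): the shrinkage operator
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def grad_moreau_abs(x, lam):
    # gradient of the envelope via the prox: (x - prox_{lam Phi}(x)) / lam
    return (x - soft_threshold(x, lam)) / lam

lam = 0.7
x = np.linspace(-2, 2, 401)
g = grad_moreau_abs(x, lam)

# outside [-lam, lam] the gradient saturates at sign(x): the Yosida
# approximation of the sign (subdifferential of |.|)
mask = np.abs(x) > lam
assert np.allclose(g[mask], np.sign(x[mask]))
# difference quotients never exceed the Lipschitz constant 1/lam
slopes = np.abs(np.diff(g)) / np.abs(np.diff(x))
assert np.all(slopes <= 1 / lam + 1e-9)
```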
1.1.1.2 b
A less known result is the \({\mathcal C}^1\)-regularity of the function \(\lambda \mapsto \varPhi _\lambda (x)\), for each \(x \in {\mathcal H}\). Its derivative is given by
$$ \frac {d}{d\lambda }\varPhi _\lambda (x) = -\frac {1}{2}\left \| \nabla \varPhi _\lambda (x) \right \|{}^2, $$
that is, \((\lambda ,x)\mapsto \varPhi _\lambda (x)\) solves the first-order Hamilton-Jacobi equation
$$ \frac {\partial \varPhi _\lambda }{\partial \lambda }(x) + \frac {1}{2}\left \| \nabla _x \varPhi _\lambda (x) \right \|{}^2 = 0. \qquad (1.72)$$
The Moreau envelope thus provides the Lax-Hopf formula for this first-order Hamilton-Jacobi equation, see [4, Remark 3.2; Lemma 3.27], [8, Lemma A.1], and [29].
Lemma 1.7
For each \(x\in {\mathcal H}\), the real-valued function \(\lambda \mapsto \varPhi _\lambda (x)\) is continuously differentiable on ]0, +∞[, with
$$ \frac {d}{d\lambda }\varPhi _\lambda (x) = -\frac {1}{2}\left \| \nabla \varPhi _\lambda (x) \right \|{}^2. $$
As a consequence, for any \(x \in {\mathcal H}\), λ > 0 and μ > 0,
$$ (\varPhi _\lambda )_\mu (x) = \varPhi _{\lambda +\mu }(x). \qquad (1.74)$$
Indeed, (1.74) is the semi-group property satisfied by the orbits of the autonomous evolution equation (1.72). Differentiating (1.74) with respect to x, and using (1.71), gives the classical resolvent equation
$$ J_{\lambda +\mu }^A = J_\mu ^A \circ \left ( \frac {\mu }{\lambda +\mu } I + \frac {\lambda }{\lambda +\mu } J_{\lambda +\mu }^A \right ), \qquad (1.75)$$
where A = ∂Φ. Indeed, (1.75) is valid for a general maximally monotone operator A, see, for example, [20, Proposition 23.6] or [25, Proposition 2.6].
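The semigroup property of the Moreau envelope can be tested numerically for the quadratic Φ(y) = y²∕2, whose envelope has the closed form \(\varPhi _\lambda (x) = x^2/(2(1+\lambda ))\) (a sketch; the outer envelope is computed by brute-force minimization over a fine grid):

```python
import numpy as np

# Phi(y) = y^2/2 has the closed-form envelope Phi_lam(x) = x^2 / (2*(1+lam))
lam, mu, x = 0.8, 1.3, 1.5
grid = np.linspace(-5.0, 5.0, 200001)

inner = grid ** 2 / (2 * (1 + lam))                 # Phi_lam evaluated on the grid
lhs = np.min(inner + (x - grid) ** 2 / (2 * mu))    # (Phi_lam)_mu(x), numerically
rhs = x ** 2 / (2 * (1 + lam + mu))                 # Phi_{lam+mu}(x), closed form

# semigroup property: envelope of the envelope is the envelope of the sum
assert abs(lhs - rhs) < 1e-6
```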
1.1.2 Auxiliary Results
Theorem 1.6.2
Let \(\lambda : [t_0, +\infty [\,\to \,]0, +\infty [\) be continuous and nondecreasing. Let \(\varPhi :{\mathcal H}\to \mathbb R\cup \{+\infty \}\) be convex, lower-semicontinuous, and proper. Then, given any \(x_0\) and \(v_0\) in \({\mathcal H}\), system (1.1) has a unique twice continuously differentiable global solution \(x:[t_0,+\infty [\to {\mathcal H}\) satisfying \(x(t_0)=x_0\), \(\dot x(t_0)=v_0\).
Proof
The assertion appeals to the most elementary form of the Cauchy-Lipschitz theorem (see any textbook) and hinges on the (t, x)-continuity of \(\nabla \varPhi _{\lambda (t)}\) and on its Lipschitz continuity with respect to x, uniform with respect to t.
Indeed, for \(t \in [t_0, +\infty [\) and \((x,x')\in {\mathcal H}\times {\mathcal H}\) we have
$$ \left \| \nabla \varPhi _{\lambda (t)}(x) - \nabla \varPhi _{\lambda (t)}(x') \right \| \leq \frac {1}{\lambda (t)}\|x-x'\| \leq \frac {1}{\lambda (t_0)}\|x-x'\|, $$
since \(\nabla \varPhi _\lambda \) is \(\frac {1}{\lambda }\)-Lipschitz continuous and λ(⋅) is nondecreasing.
Next, the continuity of \(\nabla \varPhi _{\lambda (t)}(x)=\frac {1}{\lambda (t)}(x-\mbox{prox}_{\lambda (t)\varPhi }x)\) boils down to the continuity of the mapping \((t,x)\in [t_0,+\infty [\times {\mathcal H}\mapsto \mbox{prox}_{\lambda (t)\varPhi }x\in {\mathcal H}\). For (t, x) and (t′, x′) in \([t_0,+\infty [\times {\mathcal H}\) we have
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t')\varPhi }x' \right \| \leq \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t)\varPhi }x' \right \| + \left \| \mbox{prox}_{\lambda (t)\varPhi }x' - \mbox{prox}_{\lambda (t')\varPhi }x' \right \|. $$
But, since \(\mbox{prox}_{\lambda (t)\varPhi }\) is nonexpansive,
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t)\varPhi }x' \right \| \leq \|x-x'\|, $$
and also (see [20, Prop. 23.28(iii)])
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x' - \mbox{prox}_{\lambda (t')\varPhi }x' \right \| \leq |\lambda (t)-\lambda (t')|\, \left \| \nabla \varPhi _{\lambda (t_0)}(x') \right \|, $$
where we use that \(\lambda \mapsto \|\nabla \varPhi _\lambda (x')\|\) is nonincreasing, together with \(\lambda (t),\lambda (t')\geq \lambda (t_0)\).
Therefore
$$ \left \| \mbox{prox}_{\lambda (t)\varPhi }x - \mbox{prox}_{\lambda (t')\varPhi }x' \right \| \leq \|x-x'\| + |\lambda (t)-\lambda (t')|\, \left \| \nabla \varPhi _{\lambda (t_0)}(x') \right \|, $$
which, in view of the continuity of λ(⋅), proves the continuity of \((t,x)\mapsto \mbox{prox}_{\lambda (t)\varPhi }x\) at the point (t, x).
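For Φ(x) = |x|, where \(\|\nabla \varPhi _\mu (x)\| \leq 1\), the parameter-dependence of the prox reduces to \(|\mbox{prox}_{\lambda \varPhi }x - \mbox{prox}_{\mu \varPhi }x| \leq |\lambda -\mu |\), which is easy to verify numerically (a sketch; `soft_threshold` is the prox of λ|⋅|):

```python
import numpy as np

def soft_threshold(x, lam):
    # prox of lam * |.|
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.linspace(-4, 4, 801)
for lam, mu in [(0.3, 0.9), (1.0, 1.05), (0.1, 2.0)]:
    gap = np.abs(soft_threshold(x, lam) - soft_threshold(x, mu))
    # |nabla Phi_mu(x)| <= 1 for Phi = |.|, so the bound reduces to |lam - mu|
    assert np.all(gap <= abs(lam - mu) + 1e-12)
```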
Let us state the discrete version of Opial’s lemma.
Lemma 1.8
Let S be a nonempty subset of \(\mathcal H\), and \((x_k)\) a sequence of elements of \(\mathcal H\). Assume that
(i) for every z ∈ S, \(\lim _{k\to +\infty }\|x_k - z\|\) exists;
(ii) every weak sequential cluster point of \((x_k)\), as k →∞, belongs to S.
Then \((x_k)\) converges weakly as k →∞ to a point in S.
We shall also make use of the following discrete version of the Gronwall lemma:
Lemma 1.9
Let \((a_k)\) be a sequence of nonnegative numbers such that, for all \(k\in \mathbb N\),
$$ a_k^2 \leq c^2 + \sum _{j=1}^{k} \beta _j a_j, $$
where \((\beta _j)\) is a summable sequence of nonnegative numbers, and c ≥ 0. Then, \(\displaystyle a_k \leq c + \sum _{j=1}^{\infty } \beta _j\) for all \(k\in \mathbb N\).
Proof
For \(k\in \mathbb N\), set \(A_k :=\max _{1\leq m\leq k} a_m\). Then, for 1 ≤ m ≤ k, we have
$$ a_m^2 \leq c^2 + \sum _{j=1}^{m} \beta _j a_j \leq c^2 + A_k \sum _{j=1}^{k} \beta _j. $$
Taking the maximum over 1 ≤ m ≤ k, we obtain
$$ A_k^2 \leq c^2 + A_k \sum _{j=1}^{k} \beta _j. $$
Bounding \(A_k\) by the largest root of the corresponding quadratic equation \(X^2 - \left (\sum _{j=1}^{k}\beta _j\right ) X - c^2 = 0\), we obtain the result.
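Assuming the classical quadratic form of the hypothesis, \(a_k^2 \leq c^2 + \sum _{j\leq k} \beta _j a_j\), the bound can be stress-tested numerically on the extremal sequence that saturates it at every step (a sketch; the data c and β are arbitrary choices):

```python
import numpy as np

# Extremal sequence for the discrete Gronwall lemma: at each step take a_k
# as large as the hypothesis a_k^2 <= c^2 + sum_{j<=k} beta_j * a_j allows.
c = 2.0
beta = 1.0 / np.arange(1, 51) ** 2       # summable nonnegative weights
a = np.zeros(50)
s = 0.0                                   # running value of sum_{j<k} beta_j * a_j
for k in range(50):
    # positive root of t^2 - beta_k * t - (c^2 + s) = 0
    a[k] = (beta[k] + np.sqrt(beta[k] ** 2 + 4 * (c ** 2 + s))) / 2
    s += beta[k] * a[k]

# conclusion of the lemma: a_k <= c + sum_j beta_j, even for the extremal sequence
assert np.all(a <= c + beta.sum())
```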
The next lemma provides an estimate of the convergence rate of a sequence that is summable with respect to weights.
Lemma 1.10 ([7, Lemma 22])
Let \((\tau _k)\) be a nonnegative sequence such that \(\sum _{k=1}^{+\infty } \tau _{k}=+\infty \). Assume that \((\epsilon _k)\) is a nonnegative and nonincreasing sequence satisfying \(\sum _{k=1}^{+\infty } \tau _{k}\,\epsilon _k<+\infty \). Then we have \(\epsilon _k=o\left (\frac {1}{\sum _{i=1}^k \tau _i}\right ) \mbox{ as } k\to +\infty .\)
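A concrete instance, with the weights \(\tau _k = k\) that produce the \(o(k^{-2})\) rates mentioned in the abstract (a numerical sketch; the choice \(\epsilon _k = 1/k^3\) is ours):

```python
import numpy as np

# tau_k = k and eps_k = 1/k^3: then sum(tau_k * eps_k) = sum 1/k^2 < +inf,
# sum_{i<=k} tau_i = k(k+1)/2 ~ k^2/2, so the lemma predicts eps_k = o(1/k^2).
K = 10000
k = np.arange(1, K + 1, dtype=float)
tau, eps = k, 1.0 / k ** 3

assert np.sum(tau * eps) < 2.0           # partial sums of 1/k^2 stay below pi^2/6 < 2
ratio = eps * np.cumsum(tau)             # eps_k * sum_{i<=k} tau_i, should tend to 0
assert ratio[-1] < ratio[99] / 50        # the ratio has decayed by a large factor
```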
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Attouch, H., Peypouquet, J. (2019). Convergence Rate of Proximal Inertial Algorithms Associated with Moreau Envelopes of Convex Functions. In: Bauschke, H., Burachik, R., Luke, D. (eds) Splitting Algorithms, Modern Operator Theory, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-25939-6_1
Print ISBN: 978-3-030-25938-9
Online ISBN: 978-3-030-25939-6