Abstract
We study the behavior of the trajectories of a second-order differential equation with vanishing damping, governed by the Yosida regularization of a maximally monotone operator with time-varying index, along with a new Regularized Inertial Proximal Algorithm obtained by means of a convenient finite-difference discretization. These systems are the counterpart to accelerated forward–backward algorithms in the context of maximally monotone operators. A proper tuning of the parameters allows us to prove the weak convergence of the trajectories to zeroes of the operator. Moreover, it is possible to estimate the rate at which the speed and acceleration vanish. We also study the effect of perturbations or computational errors that leave the convergence properties unchanged. In addition, we analyze a growth condition under which strong convergence can be guaranteed. A simple example shows the criticality of the assumptions on the Yosida approximation parameter, and allows us to illustrate the behavior of these systems compared with some of their close relatives.
Notes
The idea of regularizing, by means of Moreau envelopes, an inertial dynamic governed by a nonsmooth operator was already used in the modeling of elastic shocks in [5].
References
Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Álvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1–2), 3–11 (2001)
Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263(9), 5412–5458 (2017)
Attouch, H., Cabot, A.: Convergence rates of inertial forward-backward algorithms, HAL-01453170 (2017)
Attouch, H., Cabot, A., Redont, P.: The dynamics of elastic shocks via epigraphical regularization of a differential inclusion. Adv. Math. Sci. Appl. 12(1), 273–306 (2002)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing damping, to appear in Math. Program. https://doi.org/10.1007/s10107-016-0992-8
Attouch, H., Maingé, P.E.: Asymptotic behavior of second order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calc. Var. 17(3), 836–857 (2011)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward–backward method is actually faster than \(\frac{1}{k^2}\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Attouch, H., Peypouquet, J., Redont, P.: Fast convergence of regularized inertial dynamics for nonsmooth convex optimization, Working paper. (2017)
Attouch, H., Soueycatt, M.: Augmented Lagrangian and proximal alternating direction methods of multipliers in Hilbert spaces. Applications to games, PDE’s and control. Pac. J. Optim. 5(1), 17–37 (2009)
Attouch, H., Wets, R.: Epigraphical processes: laws of large numbers for random LSC functions. Sem. Anal. Convexe Montp. 20, 13–29 (1990)
Attouch, H., Wets, R.: Quantitative stability of variational systems: I, the epigraphical distance. Trans. Am. Math. Soc. 328(2), 695–729 (1991)
Attouch, H., Wets, R.: Quantitative stability of variational systems: II, a framework for nonlinear conditioning. SIAM J. Optim. 3, 359–381 (1993)
Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland, (1972)
Brézis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, Berlin (2011)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
Haraux, A.: Systèmes dynamiques dissipatifs et applications, RMA 17, Masson, (1991)
Jendoubi, M.A., May, R.: Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term. Appl. Anal. 94(2), 436–444 (2015)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016). Ser. A
May, R.: Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)
Matet, S., Rosasco, L., Villa, S., Vu, B.C.: Don’t relax: early stopping for convex regularization. arXiv:1707.05422v1 [math.OC] (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)
Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. With a Foreword by Hedy Attouch. Springer Briefs in Optimization, p. xiv+124. Springer, Cham (2015)
Peypouquet, J., Sorin, S.: Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17(3–4), 1113–1163 (2010)
Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Rockafellar, R.T.: Monotone operators associated with saddle-functions and minimax problems. In: Browder, F.E. (ed.) Nonlinear Operators and Nonlinear Equations of Evolution in Banach Spaces, Part 2. Proceedings of Symposia in Pure Mathematics, vol. 18, pp. 241–250. American Mathematical Society (1976)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1, 97–116 (1976)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Grundlehren der mathematischen Wissenschafte, vol. 317. Springer, Berlin (1998)
Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems (NIPS), (2011)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Neural Inf. Process. Syst. 27, 2510–2518 (2014)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Acknowledgements
The authors thank P. Redont for his careful and constructive reading of the paper.
H. Attouch: Effort sponsored by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under Grant Number F49550-15-1-0500. J. Peypouquet: supported by Fondecyt Grant 1140829, Millenium Nucleus ICM/FIC RC130003 and Basal Project CMM Universidad de Chile.
Auxiliary results
1.1 Yosida regularization of an operator A
Given a maximally monotone operator A and \(\lambda >0\), the resolvent of A with index \(\lambda \) and the Yosida regularization of A with parameter \(\lambda \) are defined by
\[ J_{\lambda A} = \left( I + \lambda A \right)^{-1} \qquad \text{and} \qquad A_{\lambda } = \frac{1}{\lambda }\left( I - J_{\lambda A} \right), \]
respectively. The operator \(J_{\lambda A}: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is everywhere defined and nonexpansive (indeed, it is firmly nonexpansive). Moreover, \(A_{\lambda }\) is \(\lambda \)-cocoercive: for all \(x, y \in {\mathcal {H}}\) we have
\[ \langle A_{\lambda }x - A_{\lambda }y,\, x - y \rangle \ge \lambda \Vert A_{\lambda }x - A_{\lambda }y \Vert ^2. \]
This property immediately implies that \(A_{\lambda }: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\frac{1}{\lambda }\)-Lipschitz continuous. Another property that proves useful is the resolvent equation (see, for example, [16, Proposition 2.6] or [14, Proposition 23.6])
\[ \left( A_{\lambda } \right)_{\mu } = A_{\lambda + \mu }, \]
which is valid for any \(\lambda , \mu >0\). This identity allows one to compute the resolvent of \(A_\lambda \) simply: for any \(\lambda , \mu >0\),
\[ J_{\mu A_{\lambda }}(x) = \frac{\lambda }{\lambda +\mu }\, x + \frac{\mu }{\lambda +\mu }\, J_{(\lambda +\mu ) A}(x). \]
Also note that for any \(x \in {\mathcal {H}}\), and any \(\lambda >0\), we have \(A_{\lambda }x \in A\left( J_{\lambda A}x\right) \).
Finally, for any \(\lambda >0\), A and \(A_{\lambda }\) have the same solution set \(S:=A_{\lambda }^{-1} (0) = A^{-1}(0)\). For a detailed presentation of the properties of maximally monotone operators and the Yosida approximation, the reader can consult [14] or [16].
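These identities are easy to verify numerically in a simple case. In the sketch below (our illustrative choice), \({\mathcal {H}} = {\mathbb {R}}\) and \(A = \partial |\cdot |\), whose resolvent is the soft-thresholding map; the code checks the \(\frac{1}{\lambda }\)-Lipschitz bound and the resolvent equation \((A_{\lambda })_{\mu } = A_{\lambda +\mu }\):

```python
def J(lam, x):
    """Resolvent J_{lam A}(x) = (I + lam A)^{-1}(x) of A = sign: soft-thresholding."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def A_yosida(lam, x):
    """Yosida regularization A_lam = (I - J_{lam A}) / lam."""
    return (x - J(lam, x)) / lam

lam, mu = 0.5, 1.5
for i in range(-50, 51):
    x = i / 10.0
    # 1/lam-Lipschitz continuity of A_lam (against y = 0, where A_lam(0) = 0)
    assert abs(A_yosida(lam, x)) <= abs(x) / lam + 1e-12
    # Resolvent of A_lam: J_{mu A_lam}(x) = lam/(lam+mu) x + mu/(lam+mu) J_{(lam+mu)A}(x)
    J_mu = lam / (lam + mu) * x + mu / (lam + mu) * J(lam + mu, x)
    # Resolvent equation: (A_lam)_mu = A_{lam+mu}
    assert abs((x - J_mu) / mu - A_yosida(lam + mu, x)) < 1e-12
```

The same check runs verbatim for any other one-dimensional maximally monotone operator whose resolvent is available in closed form.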
1.2 Existence and uniqueness of solution in the presence of a source term
Let us first establish the existence and uniqueness of the solution trajectory of the Cauchy problem associated with the continuous regularized dynamic (1) with a source term.
Lemma A.1
Take \(t_0>0\). Let us suppose that \(\lambda : [t_0, +\infty [ \rightarrow {\mathbb {R}}^+\) is a measurable function such that \(\lambda (t) \ge \underline{\lambda }\) for some \(\underline{\lambda }>0\). Suppose that \(f \in L^1 ([t_0, T], {\mathcal {H}})\) for all \(T \ge t_0\). Then, for any \(x_0 \in {\mathcal {H}}, \ v_0 \in {\mathcal {H}} \), there exists a unique strong global solution \(x: [t_0,+\infty [ \rightarrow {\mathcal {H}}\) of the Cauchy problem
\[ \ddot{x}(t) + \frac{\alpha }{t}\, \dot{x}(t) + A_{\lambda (t)}(x(t)) = f(t), \qquad x(t_0) = x_0, \quad \dot{x}(t_0) = v_0. \qquad (85) \]
Proof
The argument is standard, and consists in writing (85) as a first-order system in the phase space. By setting
\[ X(t) = \big( x(t), \dot{x}(t) \big) \qquad \text{and} \qquad F(t, u, v) = \Big( v, \; f(t) - \frac{\alpha }{t}\, v - A_{\lambda (t)}(u) \Big), \]
the system can be written as
\[ \dot{X}(t) = F(t, X(t)), \qquad X(t_0) = (x_0, v_0). \]
Using the Lipschitz continuity of \(A_{\lambda (t)}\), with constant \(\frac{1}{\lambda (t)} \le \frac{1}{\underline{\lambda }}\), one can easily verify that the conditions of the Cauchy–Lipschitz theorem are satisfied. Precisely, we can apply the non-autonomous version of this theorem given in [21, Proposition 6.2.1]. Thus, we obtain a strong solution, that is, \(t\mapsto \dot{x}(t)\) is locally absolutely continuous. If, moreover, the functions \(\lambda (\cdot )\) and f are continuous, then the solution is a classical solution of class \({\mathcal {C}}^2\). \(\square \)
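As a sanity check of this phase-space reduction, the first-order system can be integrated numerically. The sketch below is only illustrative, not the paper's algorithm: we take the vanishing damping coefficient \(\alpha /t\), the operator \(A = I\) (so that \(A_{\lambda }x = x/(1+\lambda )\)), \(\lambda (t) \equiv 1\), \(f \equiv 0\), and a plain forward-Euler discretization:

```python
# Forward-Euler integration of the phase-space system X'(t) = F(t, X(t)).
# Illustrative choices (ours): alpha/t damping, A = I so A_lambda(x) = x/(1+lambda),
# lambda(t) = 1, f = 0.

def simulate(alpha=3.0, t0=1.0, T=100.0, h=0.01, x0=1.0, v0=0.0):
    u, v, t = x0, v0, t0
    while t < T:
        du = v                                   # first component of F
        dv = -(alpha / t) * v - u / 2.0          # -(alpha/t) v - A_1(u), with A_1(u) = u/2
        u, v, t = u + h * du, v + h * dv, t + h
    return u, v

x_T, v_T = simulate()
# Both the trajectory and its speed should be small at large times.
assert abs(x_T) < 0.1 and abs(v_T) < 0.1
```

The observed decay of both \(x(t)\) and \(\dot{x}(t)\) is consistent with the vanishing speed and acceleration estimates announced in the abstract.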
1.3 Opial’s Lemma
The following results are often referred to as Opial’s Lemma [28]. To our knowledge, it was first written in this form in Baillon’s thesis. See [30] for a proof.
Lemma A.2
Let S be a nonempty subset of \({\mathcal {H}}\) and let \(x: [0,+\infty [ \rightarrow {\mathcal {H}}\). Assume that
(i) for every \(z\in S\), \(\lim _{t\rightarrow \infty }\Vert x(t)-z\Vert \) exists;
(ii) every weak sequential limit point of x(t), as \(t\rightarrow \infty \), belongs to S.
Then x(t) converges weakly as \(t\rightarrow \infty \) to a point in S.
Its discrete version is
Lemma A.3
Let S be a nonempty subset of \({\mathcal {H}}\), and let \((x_k)\) be a sequence of elements of \({\mathcal {H}}\). Assume that
(i) for every \(z\in S\), \(\lim _{k\rightarrow +\infty }\Vert x_k-z\Vert \) exists;
(ii) every weak sequential limit point of \((x_k)\), as \(k\rightarrow \infty \), belongs to S.
Then \(x_k\) converges weakly as \(k\rightarrow \infty \) to a point in S.
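Although the lemma concerns weak convergence, weak and strong convergence coincide in finite dimensions, so its two conditions can be illustrated numerically. The sketch below (the operator is our illustrative choice) runs the proximal point iteration \(x_{k+1} = J_{\lambda A} x_k\) for the maximally monotone rotation operator \(A(x,y) = (-y, x)\) on \({\mathbb {R}}^2\), whose unique zero is the origin:

```python
# Proximal point iteration x_{k+1} = J_{lam A}(x_k) for the skew rotation
# operator A(x, y) = (-y, x) on R^2 (our illustrative choice); A^{-1}(0) = {0}.

def resolvent(lam, p):
    """J_{lam A}(p) = (I + lam A)^{-1}(p), computed from the 2x2 linear system."""
    x, y = p
    d = 1.0 + lam * lam              # determinant of I + lam A
    return ((x + lam * y) / d, (y - lam * x) / d)

def norm(p):
    return (p[0] ** 2 + p[1] ** 2) ** 0.5

p = (1.0, 0.0)
norms = [norm(p)]
for _ in range(200):
    p = resolvent(0.5, p)
    norms.append(norm(p))

# (i): ||x_k - z|| is nonincreasing for the zero z = 0;
# (ii): the iterates accumulate only at the zero of A.
assert all(b <= a for a, b in zip(norms, norms[1:]))
assert norm(p) < 1e-6
```

Note that this example converges strongly because the resolvent of the rotation operator is a strict contraction; Lemma A.3 covers the general nonexpansive case where only weak convergence is available.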
1.4 Variation of the function \(\gamma \mapsto \gamma A_{\gamma }x\)
Lemma A.4
Let \(\gamma , \delta >0\), and \(x, y\in {\mathcal {H}}\). Then, for each \(z\in S= A^{-1} (0)\), we have
Proof
We use successively the definition of the Yosida approximation, the resolvent identity [14, Proposition 23.28 (i)], and the nonexpansive property of the resolvent, to obtain
Since \(J_{\gamma A}z =z\) for \(z\in S\), and using again the nonexpansive property of the resolvent, we deduce that
which gives the claim. \(\square \)
1.5 On integration and decay
Lemma A.5
Let \(w,\eta :[t_0,+\infty [\rightarrow [0,+\infty [\) be absolutely continuous functions such that \(\eta \notin L^1 (t_0, +\infty )\),
\[ \int _{t_0}^{+\infty } \eta (t)\, w(t)\, dt < +\infty , \]
and \(|\dot{w}(t)| \le \eta (t)\) for almost every \(t>t_0\). Then, \(\lim _{t\rightarrow +\infty } w(t) =0\).
Proof
First, for almost every \(t>t_0\), we have
\[ \left| \frac{d}{dt}\, w(t)^2 \right| = 2\, w(t)\, |\dot{w}(t)| \le 2\, \eta (t)\, w(t). \]
Therefore, \(|\frac{d}{dt} w^2|\) belongs to \(L^1\). This implies that \(\lim _{t\rightarrow +\infty } w^2(t) \) exists. Since w is nonnegative, it follows that \(\lim _{t\rightarrow +\infty } w(t) \) exists as well. But this limit is necessarily zero: a positive limit \(\ell \) would force \(\eta w \ge \frac{\ell }{2}\, \eta \) for large t, contradicting the integrability of \(\eta w\), since \(\eta \notin L^1\). \(\square \)
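A concrete instance of Lemma A.5 can be checked numerically. In the sketch below (an illustrative choice of ours), \(w(t) = 1/t\) and \(\eta (t) = 1/t\) on \([1, +\infty [\): then \(|\dot{w}(t)| = 1/t^2 \le \eta (t)\), \(\eta \notin L^1\) (its integral grows like \(\log t\)), while \(\eta w = 1/t^2\) is integrable, and indeed \(w(t) \rightarrow 0\):

```python
import math

# w(t) = 1/t, eta(t) = 1/t on [1, T]: midpoint-rule integrals of eta and eta*w.
def integrals(T, n=100000):
    h = (T - 1.0) / n
    int_eta = int_eta_w = 0.0
    for i in range(n):
        t = 1.0 + (i + 0.5) * h
        int_eta += h / t          # integral of eta: grows like log T
        int_eta_w += h / t ** 2   # integral of eta * w: stays bounded
    return int_eta, int_eta_w

T = 1.0e4
i_eta, i_eta_w = integrals(T)
assert i_eta > math.log(T) - 0.1    # eta is not integrable
assert i_eta_w < 1.0                # eta * w is integrable (exact value: 1 - 1/T)
assert 1.0 / T < 1e-3               # and w(T) -> 0
```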
1.6 On boundedness and anchoring
Lemma A.6
Let \(t_0>0\), and let \(w: [t_0, +\infty [ \rightarrow \mathbb {R}\) be a continuously differentiable function which is bounded from below. Given a nonnegative function \(\theta \), let us assume that
for some \(\alpha > 1\), almost every \(t>t_0\), and some nonnegative function \(k\in L^1 (t_0, +\infty )\). Then, the positive part \([\dot{w}]_+\) of \(\dot{w}\) belongs to \(L^1(t_0,+\infty )\), and \(\lim _{t\rightarrow +\infty }w(t)\) exists. Moreover, we have \(\int _{t_0}^{+\infty } \theta (t) dt < + \infty \).
Proof
Multiply (88) by \(t^{\alpha -1}\) to obtain
By integration, we obtain
Hence,
and so,
Applying Fubini’s Theorem, we deduce that
As a consequence,
This implies \(\lim _{t\rightarrow +\infty }w(t)\) exists. Back to (89), integrating from \(t_0\) to t, using Fubini’s Theorem again, and then letting t tend to \(+\infty \), we obtain
Hence \(\int _{t_0}^{\infty }\theta (s) ds < + \infty \). \(\square \)
1.7 A summability result for real sequences
Lemma A.7
Let \(\alpha >1\), and let \((h_k)\) be a sequence of real numbers which is bounded from below, and such that
\[ h_{k+1} - h_{k} \le \left( 1 - \frac{\alpha }{k} \right) \left( h_{k} - h_{k-1} \right) - \omega _k + \theta _k \qquad (90) \]
for all \(k\ge 1\). Suppose that \((\omega _k)\) and \((\theta _k)\) are two sequences of nonnegative numbers, such that \(\sum _k k\theta _{k} <+\infty \). Then, \(\lim _{k\rightarrow +\infty } h_k\) exists, and \(\sum _{k} k\, \omega _k < +\infty \).
Proof
Since \((\omega _k)\) is nonnegative, we have
Setting \(b_k := [h_k - h_{k-1} ]_{+}\) the positive part of \(h_k - h_{k-1}\), we immediately infer that
for all \(k\ge 1\). Multiplying by k and rearranging the terms, we obtain
Summing for \(k=1,\dots , K\), and using the telescopic property, along with the fact that \(Kb_{K+1}\ge 0\), we deduce that
which gives
Let us now prove that \(\sum _{k \in \mathbb {N}} k\omega _k < +\infty \), which is the most delicate part of the proof. To this end, write \(\delta _k= h_{k} - h_{k-1}\), and \(\alpha _k =\left( 1- \frac{\alpha }{k}\right) \), so that (90) becomes
A straightforward induction shows that
with the convention \(\prod _{j=k+1}^k \alpha _j=1\). To simplify the notation, write \(A_{i}^k=\prod _{j=i}^k \alpha _j\). Sum the above inequality for \(k=1,\dots , K\) to deduce that
Now, using Fubini’s Theorem, we obtain
Simple computations (estimating the sums by integrals) show that
and
(see also [4] for further details). Letting \(K\rightarrow +\infty \) in (92), we deduce that
for appropriate constants C and D. \(\square \)
1.8 A discrete Gronwall lemma
Lemma A.8
Let \(c\ge 0\) and let \((a_k)\) and \((\beta _j )\) be nonnegative sequences such that \((\beta _j )\) is summable and
\[ a_k^2 \le c^2 + \sum _{j=1}^{k} \beta _j\, a_j \]
for all \(k\in \mathbb {N}\). Then, \(\displaystyle a_k \le c + \sum _{j=1}^{\infty } \beta _j\) for all \(k\in \mathbb {N}\).
Proof
For \(k\in \mathbb {N}\), set \(A_k := \max _{1\le m \le k} a_m \). Then, for \(1\le m\le k\), we have
\[ a_m^2 \le c^2 + \sum _{j=1}^{m} \beta _j\, a_j \le c^2 + A_k \sum _{j=1}^{\infty } \beta _j. \]
Taking the maximum over \(1\le m\le k\), we obtain
\[ A_k^2 \le c^2 + A_k \sum _{j=1}^{\infty } \beta _j. \]
Bounding \(A_k\) by the largest root of the corresponding quadratic equation \(x^2 - \big( \sum _j \beta _j \big)\, x - c^2 = 0\), we obtain \(A_k \le c + \sum _{j=1}^{\infty } \beta _j\), which gives the result. \(\square \)
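Lemma A.8 is easy to stress-test numerically, assuming the hypothesis in its classical form \(a_k^2 \le c^2 + \sum _{j=1}^{k} \beta _j a_j\). The sketch below (names and data are our illustrative choices) builds the extremal sequence that satisfies this inequality with equality and verifies the bound \(a_k \le c + \sum _j \beta _j\):

```python
import random

# Extremal sequence for the discrete Gronwall lemma: a_k solves
# a_k^2 = c^2 + sum_{j<=k} beta_j a_j with equality, i.e. a_k is the positive
# root of a_k^2 - beta_k a_k - (c^2 + S) = 0, where S = sum_{j<k} beta_j a_j.
def extremal_sequence(c, betas):
    a, S = [], 0.0
    for b in betas:
        a_k = (b + (b * b + 4.0 * (c * c + S)) ** 0.5) / 2.0
        a.append(a_k)
        S += b * a_k
    return a

random.seed(0)
c = 2.0
betas = [random.random() / (j + 1) ** 2 for j in range(200)]   # summable weights
bound = c + sum(betas)
# The lemma's conclusion: every a_k is bounded by c + sum_j beta_j.
assert all(a_k <= bound + 1e-9 for a_k in extremal_sequence(c, betas))
```

Since the extremal sequence saturates the hypothesis, the check exercises the worst case covered by the lemma.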
Attouch, H., Peypouquet, J. Convergence of inertial dynamics and proximal algorithms governed by maximally monotone operators. Math. Program. 174, 391–432 (2019). https://doi.org/10.1007/s10107-018-1252-x
Keywords
- Asymptotic stabilization
- Large step proximal method
- Damped inertial dynamics
- Lyapunov analysis
- Maximally monotone operators
- Time-dependent viscosity
- Vanishing viscosity
- Yosida regularization