
Smoothing Alternating Direction Methods for Fully Nonsmooth Constrained Convex Optimization


Abstract

We propose two new alternating direction methods to solve “fully” nonsmooth constrained convex problems. Our algorithms have the best known worst-case iteration-complexity guarantee under mild assumptions for both the objective residual and feasibility gap. Through theoretical analysis, we show how to update all the algorithmic parameters automatically with clear impact on the convergence performance. We also provide a representative numerical example showing the advantages of our methods over the classical alternating direction methods using a well-known feasibility problem.


References

1. A. Alotaibi, P.L. Combettes, N. Shahzad, Best approximation from the Kuhn-Tucker set of composite monotone inclusions. Numer. Funct. Anal. Optim. 36(12), 1513–1532 (2015)
2. H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, Berlin, 2011)
3. A. Beck, M. Teboulle, A fast dual proximal gradient algorithm for convex minimization and applications. Oper. Res. Lett. 42(1), 1–6 (2014)
4. J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for non-convex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
5. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
6. R.S. Burachik, V. Martín-Márquez, An approach for the convex feasibility problem via monotropic programming. J. Math. Anal. Appl. 453(2), 746–760 (2017)
7. X. Cai, D. Han, X. Yuan, On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)
8. E. Candès, B. Recht, Exact matrix completion via convex optimization. Commun. ACM 55(6), 111–119 (2012)
9. V. Cevher, S. Becker, M. Schmidt, Convex optimization for big data: scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. 31(5), 32–43 (2014)
10. A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
11. D. Davis, W. Yin, Convergence rate analysis of several splitting schemes, in Splitting Methods in Communication, Imaging, Science, and Engineering (Springer, Cham, 2016), pp. 115–163
12. D. Davis, W. Yin, Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42(3), 783–805 (2017)
13. D. Davis, W. Yin, A three-operator splitting scheme and its optimization applications. Tech. Report (2015)
14. W. Deng, W. Yin, On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
15. J. Eckstein, D. Bertsekas, On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
16. E. Ghadimi, A. Teixeira, I. Shames, M. Johansson, Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
17. T. Goldstein, B. O'Donoghue, S. Setzer, Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2012)
18. B. He, X. Yuan, On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2012)
19. B. He, X. Yuan, On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
20. T. Lin, S. Ma, S. Zhang, On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
21. T. Lin, S. Ma, S. Zhang, Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. J. Sci. Comput. 69(1), 52–81 (2016)
22. T. Lin, S. Ma, S. Zhang, An extragradient-based alternating direction method for convex minimization. Found. Comput. Math. 17(1), 35–59 (2017)
23. I. Necoara, J. Suykens, Applications of a smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)
24. A. Nemirovskii, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
25. A. Nemirovskii, D. Yudin, Problem Complexity and Method Efficiency in Optimization (Wiley Interscience, New York, 1983)
26. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87 (Kluwer Academic Publishers, Norwell, 2004)
27. Y. Nesterov, Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
28. Y. Ouyang, Y. Chen, G. Lan, E.J. Pasiliao, An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
29. N. Parikh, S. Boyd, Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)
30. R.T. Rockafellar, Convex Analysis. Princeton Mathematics Series, vol. 28 (Princeton University Press, Princeton, 1970)
31. R. Shefi, M. Teboulle, Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)
32. R. Shefi, M. Teboulle, On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)
33. S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, New York, 2013)
34. M. Tao, X. Yuan, On the O(1/t)-convergence rate of alternating direction method with logarithmic-quadratic proximal regularization. SIAM J. Optim. 22(4), 1431–1448 (2012)
35. Q. Tran-Dinh, V. Cevher, Constrained convex minimization via model-based excessive gap, in Proceedings of the Neural Information Processing Systems (NIPS), Montreal, vol. 27, Dec. 2014, pp. 721–729
36. Q. Tran-Dinh, O. Fercoq, V. Cevher, A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28, 96–134 (2018)
37. P. Tseng, D. Bertsekas, Relaxation methods for problems with strictly convex cost and linear constraints. Math. Oper. Res. 16(3), 462–481 (1991)
38. W. Wang, A. Banerjee, Bregman alternating direction method of multipliers, in Advances in Neural Information Processing Systems 27 (NIPS 2014), pp. 1–9
39. E. Wei, A. Ozdaglar, On the O(1/k)-convergence of asynchronous distributed alternating direction method of multipliers, in Global Conference on Signal and Information Processing (GlobalSIP) (IEEE, Piscataway, 2013), pp. 551–554


Acknowledgements

QTD’s work was supported in part by NSF grant No. DMS-1619884, USA. VC’s work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 725594, time-data). The authors would like to thank Dr. C.B. Vu and Dr. V.Q. Nguyen for their help in verifying the technical proofs and the numerical experiment. The authors also thank Mr. Ahmet Alacaoglu, Mr. Nhan Pham, and Ms. Yuzixuan Zhu for their careful proofreading.


Appendix: Proofs of Technical Results


This appendix provides full proofs of technical results presented in the main text.

4.1.1 Proof of Lemma 2: The Primal-Dual Bounds

First, using the fact that \(-d(\lambda ) \leq -d^{\star } = f^{\star } \leq \mathcal {L}(x, \lambda ^{\star }) = f(x) + \langle \lambda ^{\star }, Au + Bv - c\rangle \leq f(x) + \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \), we get

$$\displaystyle \begin{aligned} -\Vert \lambda^{\star}\Vert\Vert Au + Bv - c\Vert \leq f(x) - f^{\star} \leq f(x) + d(\lambda), \end{aligned} $$
(4.47)

which is exactly the lower bound (4.14).

Next, since \(A^{\top}\lambda^{\star} \in \partial{g}(u^{\star})\) due to (4.8), by the Fenchel-Young inequality, we have \(g(u^{\star}) + g^{\ast}(A^{\top}\lambda^{\star}) = \langle A^{\top}\lambda^{\star}, u^{\star}\rangle\), which implies \(g^{\ast}(A^{\top}\lambda^{\star}) = \langle A^{\top}\lambda^{\star}, u^{\star}\rangle - g(u^{\star})\). Using this relation and the definition of \(\varphi_{\gamma}\), we have

$$\displaystyle \begin{aligned} \varphi_{\gamma}(\lambda) &:= \max_{u}\left\{\langle A^{\top}\lambda,u\rangle - g(u) - \gamma b_{\mathcal{U}}(u,\bar{u}^c)\right\} \geq \langle A^{\top}\lambda,u^{\star}\rangle - g(u^{\star}) - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c)\\ &=\langle A^{\top}\lambda^{\star},u^{\star} \rangle - g(u^{\star}) + \langle A^{\top}(\lambda - \lambda^{\star}),u^{\star}\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c)\\ &= g^{\ast}(A^{\top}\lambda^{\star}) + \langle A^{\top}(\lambda - \lambda^{\star}), u^{\star}\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c)\\ &= \varphi(\lambda^{\star}) + \langle \lambda - \lambda^{\star},Au^{\star}\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c). \end{aligned} $$

Alternatively, we have \(\psi(\lambda) \geq \psi(\lambda^{\star}) + \langle \nabla{\psi}(\lambda^{\star}), \lambda - \lambda^{\star}\rangle\), where \(\nabla{\psi}(\lambda^{\star}) = B\nabla{h^{\ast}}(B^{\top}\lambda^{\star}) - c = Bv^{\star} - c\) due to the last relation in (4.8), and \(\nabla{h^{\ast}}(B^{\top}\lambda^{\star}) \in \partial{h^{\ast}}(B^{\top}\lambda^{\star})\) is one subgradient of \(h^{\ast}\). Hence, \(\psi(\lambda) \geq \psi(\lambda^{\star}) + \langle \lambda - \lambda^{\star}, Bv^{\star} - c\rangle\). Adding this inequality to the last estimate and using the fact that \(d_{\gamma} = \varphi_{\gamma} + \psi\) and \(d = \varphi + \psi\), we obtain

$$\displaystyle \begin{aligned} \hspace{-6pt}d_{\gamma}(\lambda) \geq d(\lambda^{\star}) + \langle \lambda - \lambda^{\star}, Au^{\star} + Bv^{\star} - c\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) \overset{(4.8)}{=} d^{\star} - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) \end{aligned} $$
(4.48)

Using this inequality with \(d^{\star} = -f^{\star}\) and the definition (4.13) of \(f_{\beta}\), we have

$$\displaystyle \begin{aligned} &f(x) - f^{\star} \overset{(4.13)+(4.48)}{\leq} f_{\beta}(x) + d_{\gamma}(\lambda) + \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) - \frac{1}{2\beta}\Vert Au + Bv - c\Vert^2 \\ &\quad = G_{\gamma\beta}(w) + \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) - \frac{1}{2\beta}\Vert Au + Bv - c\Vert^2. \end{aligned} $$
(4.49)

Let \(S := G_{\gamma \beta }(w) + \gamma b_{\mathcal {U}}(u^{\star },\bar {u}^c)\). Then, by dropping the last term \(- \frac {1}{2\beta }\Vert Au + Bv - c\Vert ^2\) in (4.49), we obtain the first inequality of (4.15).

Let \(t := \Vert Au + Bv - c\Vert\). Using again (4.47) and (4.49), we can see that \(\frac {1}{2\beta }t^2 - \Vert \lambda ^{\star }\Vert t - S \leq 0\). Solving this quadratic inequality w.r.t. \(t\) and noting that \(t \geq 0\), we obtain the second bound of (4.15). The last estimate of (4.15) is a direct consequence of (4.49) and the first inequality of (4.15). Finally, from (4.47), we have \(f(x) \geq f^{\star} - \Vert\lambda^{\star}\Vert\Vert Au + Bv - c\Vert\). Substituting this into (4.49) we get \(d(\lambda ) - d^{\star } - \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \leq S - \frac {1}{2\beta }\Vert Au + Bv - c\Vert ^2\), which implies

$$\displaystyle \begin{aligned} d(\lambda) - d^{\star} \leq S - (1/(2\beta))\Vert Au + Bv - c\Vert^2 + \Vert \lambda^{\star}\Vert\Vert Au + Bv - c\Vert. \end{aligned}$$

By discarding \(-(1/(2\beta))\Vert Au + Bv - c\Vert^2\) and using the second estimate of (4.15) in the last estimate, we obtain the last inequality of (4.15). \(\square \)
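For completeness, the following worked step (added here; it is not displayed in the original text) records the explicit bound obtained from the quadratic inequality in \(t\): since \(t \geq 0\), the inequality \(\frac{1}{2\beta}t^2 - \Vert\lambda^{\star}\Vert t - S \leq 0\) forces \(t\) to lie below the positive root, i.e.,

$$\displaystyle \begin{aligned} \Vert Au + Bv - c\Vert = t \;\leq\; \beta\Vert\lambda^{\star}\Vert + \sqrt{\beta^2\Vert\lambda^{\star}\Vert^2 + 2\beta S}. \end{aligned}$$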

4.1.2 Convergence Analysis of Algorithm 1

We provide full proofs of the lemmas and theorems related to the convergence of Algorithm 1. First, we prove the following key lemma, which will be used to prove Lemma 3.

Lemma 8

Let \(\bar {\lambda }^{k+1}\) be generated by (SAMA). Then

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq (1-\tau_k)d_{\gamma_{k+1}}(\bar{\lambda}^k) + \tau_k\hat{\ell}_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, (1-\tau_k)\bar{\lambda}^k + \tau_k\lambda - \hat{\lambda}^k\rangle \vspace{1ex}\\ &\quad - \left(\tfrac{1}{\eta_k} - \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2 - (1-\tau_k)\tfrac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k) - \hat{u}^{k+1}\Vert^2, \end{array} \end{aligned} $$
(4.50)

where

$$\displaystyle \begin{aligned} \begin{array}{ll} \hat{\ell}_{\gamma_{k+1}}(\lambda) &:= \varphi_{\gamma_{k+1}}(\hat{\lambda}^k) + \langle \nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k), \lambda - \hat{\lambda}^k\rangle + \psi(\lambda) \vspace{1ex}\\ & \leq d_{\gamma_{k+1}}(\lambda) - \frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\lambda) - \hat{u}^{k+1} \Vert^2. \end{array} \end{aligned} $$
(4.51)

In addition, for any z, γ k, γ k+1 > 0, the function \(g_{\gamma }^{\ast }\) defined by (4.11) satisfies

$$\displaystyle \begin{aligned} g^{\ast}_{\gamma_{k+1}}(z) \leq g^{\ast}_{\gamma_k}(z) + (\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(u^{\ast}_{\gamma_{k+1}}(z), \bar{u}^c). \end{aligned} $$
(4.52)

Proof

First, it is well-known that SAMA is equivalent to the proximal-gradient step applied to the smoothed dual problem

$$\displaystyle \begin{aligned} \min_{\lambda}\left\{ \varphi_{\gamma_{k+1}}(\lambda) + \psi(\lambda) : \lambda\in\mathbb{R}^n\right\}. \end{aligned}$$

This proximal-gradient step can be presented as

$$\displaystyle \begin{aligned} \bar{\lambda}^{k+1} := \mathrm{prox}_{\eta_k\psi}\left(\hat{\lambda}^k - \eta_k\nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k)\right). \end{aligned}$$
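As an illustration only (this snippet is not part of the chapter), the following Python sketch runs a generic proximal-gradient update of the same form, \(\lambda^{k+1} := \mathrm{prox}_{\eta\psi}(\lambda^k - \eta\nabla\varphi(\lambda^k))\), with an \(\ell_1\)-norm playing the role of the nonsmooth term \(\psi\) and a convex quadratic standing in for the smooth part \(\varphi_{\gamma_{k+1}}\); all problem data are synthetic.

```python
# Illustrative sketch of a proximal-gradient step (not the chapter's SAMA itself):
# lambda <- prox_{eta*psi}(lambda - eta * grad_phi(lambda)),
# with psi = ||.||_1 and phi a convex quadratic as stand-ins.
import numpy as np

def prox_l1(x, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def grad_phi(lam, Q, q):
    """Gradient of the smooth part phi(lam) = 0.5*lam^T Q lam + q^T lam."""
    return Q @ lam + q

def prox_grad_step(lam, eta, Q, q):
    """One proximal-gradient step on phi + psi with step size eta."""
    return prox_l1(lam - eta * grad_phi(lam, Q, q), eta)

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)           # positive definite => phi is smooth and convex
q = rng.standard_normal(n)
eta = 1.0 / np.linalg.norm(Q, 2)  # step size 1/L, mirroring the (1/gamma)-smoothness bound

lam = np.zeros(n)
for _ in range(200):
    lam = prox_grad_step(lam, eta, Q, q)
print("approximate minimizer of phi + psi:", lam)
```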

The optimality condition of the minimization problem defining this proximal step can be written as

$$\displaystyle \begin{aligned} 0 \in \partial{\psi}(\bar{\lambda}^{k+1}) + \nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k) + \eta_k^{-1}(\bar{\lambda}^{k+1} - \hat{\lambda}^k). \end{aligned}$$

Using this condition and the convexity of ψ, for any \(\nabla {\psi }(\bar {\lambda }^{k+1})\in \partial {\psi }(\bar {\lambda }^{k+1})\), we have

$$\displaystyle \begin{aligned} \psi(\bar{\lambda}^{k+1}) &\leq \psi(\lambda) + \langle \nabla{\psi}(\bar{\lambda}^{k+1}),\bar{\lambda}^{k+1} - \lambda\rangle \\ &= \psi(\lambda) + \langle \nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k),\lambda - \bar{\lambda}^{k+1}\rangle + \eta_k^{-1}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, \lambda - \bar{\lambda}^{k+1}\rangle. \end{aligned} $$
(4.53)

Next, by the definition \(\varphi _{\gamma }(\lambda ) := g^{\ast }_{\gamma }(A^{\top }\lambda )\), we can show from (4.11) that \(\hat {u}^{k+1} = u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k)\). Since \(g^{\ast }_{\gamma }\) has a \((1/\gamma)\)-Lipschitz continuous gradient, we have

$$\displaystyle \begin{aligned} \frac{\gamma}{2}\Vert \nabla{g}^{\ast}_{\gamma}(z) - \nabla{g}^{\ast}_{\gamma}(\hat{z})\Vert^2 \leq g^{\ast}_{\gamma}(z) - g^{\ast}_{\gamma}(\hat{z}) - \langle \nabla{g}^{\ast}_{\gamma}(\hat{z}), z - \hat{z}\rangle \leq \frac{1}{2\gamma}\Vert z - \hat{z}\Vert^2. \end{aligned}$$

Using this inequality with \(\gamma := \gamma_{k+1}\), \(\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\lambda ) = u^{\ast }_{\gamma _{k+1}}(A^{\top }\lambda )\), \(\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\hat {\lambda }^k) = u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k) = \hat {u}^{k+1}\), and \(\nabla {\varphi _{\gamma _{k+1}}}(\lambda ) = A\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\lambda )\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} \frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\lambda) - \hat{u}^{k+1}\Vert^2 &\leq \varphi_{\gamma_{k+1}}(\lambda) - \varphi_{\gamma_{k+1}}(\hat{\lambda}^k)- \langle \nabla{\varphi}_{\gamma_{k+1}}(\hat{\lambda}^k), \lambda - \hat{\lambda}^{k}\rangle \vspace{1ex}\\ & \leq \frac{1}{2\gamma_{k+1}}\Vert A^{\top}(\lambda - \hat{\lambda}^k)\Vert^2 \leq \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \lambda - \hat{\lambda}^k\Vert^2. \end{array}\end{aligned} $$
(4.54)

Using (4.54) with \(\lambda = \bar {\lambda }^{k+1}\), we have

$$\displaystyle \begin{aligned} \varphi_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \leq \varphi_{\gamma_{k+1}}(\hat{\lambda}^k) + \langle \nabla{\varphi}_{\gamma_{k+1}}(\hat{\lambda}^k),\bar{\lambda}^{k+1} - \hat{\lambda}^k\rangle + \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2. \end{aligned} $$

Summing up this inequality and (4.53), then using the definition of \(\hat {\ell }_{\gamma _{k+1}}(\lambda )\) in (4.51), we obtain

$$\displaystyle \begin{aligned} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \leq \hat{\ell}_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, \lambda - \hat{\lambda}^k\rangle - \left(\tfrac{1}{\eta_k} - \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2. \end{aligned} $$
(4.55)

Here, the second inequality in (4.51) follows from the first inequality in (4.54).

Now, using (4.55) with \(\lambda := \bar {\lambda }^k\), then combining with (4.51), we get

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq d_{\gamma_{k+1}}(\bar{\lambda}^k) + \frac{1}{\eta_k}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, \bar{\lambda}^k - \hat{\lambda}^k\rangle - \left(\frac{1}{\eta_k} - \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2 \vspace{1ex}\\ &\quad - \frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k) - \hat{u}^{k+1} \Vert^2. \end{array} \end{aligned} $$

Multiplying the last inequality by 1 − τ k ∈ [0, 1] and (4.55) by τ k ∈ [0, 1], then summing up the results, we obtain (4.50).

Finally, from (4.11), since \(g^{\ast }_{\gamma }(z) := \max _{u}\{P(u, \gamma ; z) := \langle z, u\rangle - g(u) - \gamma b_{\mathcal {U}}(u,\bar {u}^c)\}\) is the pointwise maximum over \(u\) of functions that are affine in \(\gamma\), the function \(g^{\ast }_{\gamma }(z)\) is convex w.r.t. \(\gamma > 0\). Moreover, \(\frac {d g^{\ast }_{\gamma }(z)}{d\gamma } = -b_{\mathcal {U}}(u^{\ast }_{\gamma }(z), \bar {u}^c)\). Hence, using the convexity of \(g^{\ast }_{\gamma }\) w.r.t. \(\gamma > 0\), we have \(g^{\ast }_{\gamma _k}(z) \geq g^{\ast }_{\gamma _{k+1}}(z) - (\gamma _k - \gamma _{k+1})b_{\mathcal {U}}(u^{\ast }_{\gamma_{k+1}}(z), \bar {u}^c)\), which is indeed (4.52). □
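As a supporting step for the last argument (added here, not spelled out in the text), the derivative formula is an instance of Danskin's theorem: the maximizer \(u^{\ast}_{\gamma}(z)\) in (4.11) is unique by the strong concavity of the objective in \(u\), so differentiating the parametric maximum w.r.t. \(\gamma\) only involves the explicit \(\gamma\)-term,

$$\displaystyle \begin{aligned} \frac{d g^{\ast}_{\gamma}(z)}{d\gamma} = \frac{\partial}{\partial\gamma}\Big[\langle z, u\rangle - g(u) - \gamma b_{\mathcal{U}}(u,\bar{u}^c)\Big]\Big|_{u = u^{\ast}_{\gamma}(z)} = -\,b_{\mathcal{U}}(u^{\ast}_{\gamma}(z), \bar{u}^c). \end{aligned}$$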

4.1.2.1 Proof of Lemma 4: Bound on \(G_{\gamma\beta}\) for the First Iteration

Since \(\bar {w}^1 := (\bar {u}^1, \bar {v}^1, \bar {\lambda }^1)\) is updated by (4.19), similar to (SAMA), we can use (4.55) with k = 0, \(\lambda := \hat {\lambda }^0\) and \(\hat {\ell }_{\gamma _1}(\hat {\lambda }^0) \leq d_{\gamma _1}(\hat {\lambda }^0)\) to obtain

$$\displaystyle \begin{aligned} d_{\gamma_1}(\bar{\lambda}^1) \leq d_{\gamma_1}(\hat{\lambda}^0) - \left(\frac{1}{\eta_0} - \frac{\Vert A\Vert^2}{2\gamma_1}\right)\Vert \bar{\lambda}^1 - \hat{\lambda}^0\Vert^2. \end{aligned} $$
(4.56)

Since \(\bar {v}^1\) solves the second problem in (4.19) and \(v^{\ast }(\hat {\lambda }^0) \in \mathrm {dom}\left (h\right )\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} &h(v^{\ast}(\hat{\lambda}^0)) - \langle \hat{\lambda}^0,Bv^{\ast}(\hat{\lambda}^0)\rangle + \frac{\eta_0}{2}\Vert A\bar{u}^1 + Bv^{\ast}(\hat{\lambda}^0) - c\Vert^2 \geq h(\bar{v}^1) \vspace{1ex}\\ &\quad - \langle \hat{\lambda}^0,B\bar{v}^1\rangle + \frac{\eta_0}{2}\Vert A\bar{u}^1 + B\bar{v}^1 - c\Vert^2 + \frac{\eta_0}{2}\Vert B(v^{\ast}(\hat{\lambda}^0) - \bar{v}^1)\Vert^2. \end{array} \end{aligned} $$

Using \(D_f\) in (4.9), this inequality implies

(4.57)

Using the definition of \(d_{\gamma}\), we further estimate (4.56) using (4.57) as follows:

Since \(G_{\gamma _1\beta _1}(\bar {w}^1) = f_{\beta _1}(\bar {x}^1) + d_{\gamma _1}(\bar {\lambda }^1)\), we obtain (4.20) from the last inequality. If \(\beta _1 \geq \frac {2\gamma _1}{\eta _0(5\gamma _1 - 2\Vert A\Vert ^2\eta _0)}\), then (4.20) leads to \(G_{\gamma _1\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 + \frac {1}{\eta _0}\langle \hat {\lambda }^0, \bar {\lambda }^1 - \hat {\lambda }^0\rangle \). \(\square \)

4.1.2.2 Proof of Lemma 3: Gap Reduction Condition

For notational simplicity, we first define the following abbreviations

$$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \bar{z}^k &:= A\bar{u}^k + B\bar{v}^k - c \vspace{0.5ex}\\ \hat{z}^{k+1} &:= A\hat{u}^{k+1} + B\hat{v}^{k+1} - c \vspace{0.5ex}\\ \bar{u}_{k+1}^{*} &:= u^{*}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k)~~\text{the solution of (4.11) at }\bar{\lambda}^k, \vspace{0.5ex}\\ \hat{v}^{*}_k &:= v^{*}(\hat{\lambda}^k) \in\partial{h^{\ast}}(B^{\top}\hat{\lambda}^k) ~~\text{a subgradient of {$h^{\ast}$} defined by (4.5) at }B^{\top}\hat{\lambda}^k,\text{ and}\vspace{0.5ex}\\ D_k &:= \Vert A\hat{u}^{k+1} + B(2\hat{v}^{*}_k - \hat{v}^{k+1}) - c\Vert. \end{array}\right. \end{aligned}$$

From SAMA, we have \(\bar {\lambda }^{k+1} - \hat {\lambda }^k = \eta _k(c - A\hat {u}^{k+1} - B\hat {v}^{k+1}) = -\eta _k\hat {z}^{k+1}\). In addition, by (4.16), we have \(\hat {\lambda }^k = (1-\tau _k)\bar {\lambda }^k + \tau _k\lambda _k^{*}\), which leads to \((1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k - \hat {\lambda }^k = \tau _k(\hat {\lambda }^k - \lambda _k^{*})\). Using these expressions in (4.50) with \(\lambda := \hat {\lambda }^k\), and then using (4.51) with \(\hat {\ell }_{\gamma _{k+1}}(\hat {\lambda }^k) \leq d_{\gamma _{k+1}}(\hat {\lambda }^k)\), we obtain

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq (1-\tau_k)d_{\gamma_{k+1}}(\bar{\lambda}^k) + \tau_kd_{\gamma_{k+1}}(\hat{\lambda}^k) + \tau_k\langle \hat{z}^{k+1}, \lambda_k^{*} - \hat{\lambda}^k\rangle \vspace{1ex}\\ &\quad - \eta_k\left(1 - \frac{\eta_k\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert\hat{z}^{k+1}\Vert^2 - (1-\tau_k)\frac{\gamma_{k+1}}{2}\Vert \bar{u}^{\ast}_{k+1} - \hat{u}^{k+1}\Vert^2. \end{array} \end{aligned} $$
(4.58)

By (4.52) with the fact that \(\varphi _{\gamma }(\lambda ) := g^{\ast }_{\gamma }(A^{\top }\lambda )\), for any \(\gamma_{k+1} > 0\) and \(\gamma_k > 0\), we have

$$\displaystyle \begin{aligned} \varphi_{\gamma_{k+1}}(\bar{\lambda}^k) \leq \varphi_{\gamma_k}(\bar{\lambda}^k) + (\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}_{k+1}^{\ast}, \bar{u}_c). \end{aligned}$$

Using this inequality and the fact that \(d_{\gamma} := \varphi_{\gamma} + \psi\), we have

$$\displaystyle \begin{aligned} d_{\gamma_{k+1}}(\bar{\lambda}^k) \leq d_{\gamma_k}(\bar{\lambda}^k) + (\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}_{k+1}^{\ast}, \bar{u}_c). \end{aligned} $$
(4.59)

Next, using \(\hat {v}^{k+1}\) from SAMA and its optimality condition, we can show that

$$\displaystyle \begin{aligned}\begin{array}{ll} &h^{\ast}(B^{\top}\hat{\lambda}^k) - \frac{\eta_k}{2}\Vert A\hat{u}^{k+1} + B\hat{v}^{*}_k - c\Vert^2 = \langle B^{\top}\hat{\lambda}^k, \hat{v}^{*}_k\rangle - h(\hat{v}^{*}_k) - \frac{\eta_k}{2}\Vert A\hat{u}^{k+1} + B\hat{v}^{*}_k - c\Vert^2 \vspace{1ex}\\ &\quad \leq \langle B^{\top}\hat{\lambda}^k, \hat{v}^{k+1}\rangle - h(\hat{v}^{k+1}) - \frac{\eta_k}{2}\Vert A\hat{u}^{k+1} + B\hat{v}^{k+1} - c\Vert^2 - \frac{\eta_k}{2}\Vert B(\hat{v}^{*}_k - \hat{v}^{k+1})\Vert^2. \end{array}\end{aligned} $$

Since \(\psi(\lambda) := h^{\ast}(B^{\top}\lambda) - c^{\top}\lambda\), this inequality leads to

Now, by this estimate, \(d_{\gamma _{k+1}} = \varphi _{\gamma _{k+1}} + \psi \) and SAMA, we can derive

Combining this inequality, (4.58) and (4.59), we obtain

(4.60)

Now, using the definition of \(G_k\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} G_k(\bar{w}^k) &:= f_{\beta_k}(\bar{x}^k) + d_{\gamma_k}(\bar{\lambda}^k) = f(\bar{x}^k) + d_{\gamma_k}(\bar{\lambda}^k) + \frac{1}{2\beta_k}\Vert A\bar{u}^k + B\bar{v}^k - c\Vert^2 \vspace{0.5ex}\\ & = f(\bar{x}^k) + d_{\gamma_k}(\bar{\lambda}^k) + \frac{1}{2\beta_k}\Vert\bar{z}^k\Vert^2. \end{array} \end{aligned}$$

Let us define \(\varDelta {G}_k := (1-\tau _k)G_k(\bar {w}^k) - G_{k+1}(\bar {w}^{k+1})\). Then, we can show that

$$\displaystyle \begin{aligned} \begin{array}{ll} \varDelta{G}_k &= (1-\tau_k)f(\bar{x}^k) + (1-\tau_k)d_{\gamma_k}(\bar{\lambda}^k) - f(\bar{x}^{k+1}) - d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \vspace{1ex}\\ &\quad + \frac{(1-\tau_k)}{2\beta_k}\Vert \bar{z}^k\Vert^2 - \frac{1}{2\beta_{k+1}}\Vert\bar{z}^{k+1}\Vert^2. \end{array} \end{aligned} $$
(4.61)

By (4.16), we have \(\bar {z}^{k+1} = (1-\tau _k)\bar {z}^k + \tau _k\hat {z}^{k+1}\). Using this expression and the condition \(\beta_{k+1} \geq (1-\tau_k)\beta_k\) in (4.17), we can easily show that

$$\displaystyle \begin{aligned} \frac{(1 - \tau_k)}{2\beta_k}\Vert\bar{z}^k\Vert^2 - \frac{1}{2\beta_{k+1}}\Vert\bar{z}^{k+1}\Vert^2 \geq - \frac{\tau_k}{\beta_{k}}\langle \hat{z}^{k+1}, \bar{z}^k\rangle - \frac{\tau_k^2}{2\beta_{k}(1-\tau_k)}\Vert\hat{z}^{k+1}\Vert^2. \end{aligned}$$
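For the reader's convenience, here is a short expansion (added; it uses only \(\bar{z}^{k+1} = (1-\tau_k)\bar{z}^k + \tau_k\hat{z}^{k+1}\) and \(\beta_{k+1} \geq (1-\tau_k)\beta_k\)) behind the last inequality:

$$\displaystyle \begin{aligned} \frac{1}{2\beta_{k+1}}\Vert\bar{z}^{k+1}\Vert^2 \leq \frac{\Vert(1-\tau_k)\bar{z}^k + \tau_k\hat{z}^{k+1}\Vert^2}{2(1-\tau_k)\beta_k} = \frac{(1-\tau_k)}{2\beta_k}\Vert\bar{z}^k\Vert^2 + \frac{\tau_k}{\beta_k}\langle \bar{z}^k, \hat{z}^{k+1}\rangle + \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\Vert\hat{z}^{k+1}\Vert^2, \end{aligned}$$

and rearranging this gives the displayed bound, which is the inequality substituted into (4.61) below.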

Substituting this inequality into (4.61), and using the convexity of f, we further get

(4.62)

Substituting (4.60) into (4.62) and using \(\lambda ^{*}_k := \frac {1}{\beta _k}(c - A\bar {u}^k - B\bar {v}^k) = -\frac {1}{\beta _k}\bar {z}^k\), we obtain

$$\displaystyle \begin{aligned} \varDelta{G}_k \geq \Big[ \eta_k\Big(1 + \frac{\tau_k}{2} - \frac{\Vert A\Vert^2\eta_k}{2\gamma_{k+1}}\Big) - \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\Big]\Vert\hat{z}^{k+1}\Vert^2 + R_k - \frac{\tau_k\eta_k}{2}\Vert \hat{z}^{k+1}\Vert D_k. \end{aligned} $$
(4.63)

where

$$\displaystyle \begin{aligned} R_k := \tfrac{1-\tau_k}{2}\gamma_{k+1}\Vert \bar{u}^{\ast}_{k+1}-\hat{u}^{k+1}\Vert^2 + \tau_k\gamma_{k+1}b_{\mathcal{U}}(\hat{u}^{k+1}, \bar{u}^c) - (1-\tau_k)(\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}^{\ast}_{k+1}, \bar{u}^c). \end{aligned}$$

Furthermore, we have

$$\displaystyle \begin{aligned} \frac{\eta_k}{4}\Vert \hat{z}^{k+1}\Vert^2 - \frac{\tau_k\eta_k}{2}\Vert \hat{z}^{k+1}\Vert D_k = \frac{\eta_k}{4}\big[\Vert \hat{z}^{k+1}\Vert - \tau_kD_k\big]^2 - \frac{\eta_k\tau_k^2D_k^2}{4} \geq - \frac{\eta_k\tau_k^2D_k^2}{4}. \end{aligned}$$

Using this estimate in (4.63), we finally get

$$\displaystyle \begin{aligned} \varDelta{G}_k \geq \Big[ \eta_k\Big(\frac{3}{4} + \frac{\tau_k}{2} - \frac{\Vert A\Vert^2\eta_k}{2\gamma_{k+1}}\Big) - \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\Big]\Vert\hat{z}^{k+1}\Vert^2 + R_k - \frac{\eta_k\tau_k^2D_k^2}{4}. \end{aligned} $$
(4.64)

In the next step, we estimate \(R_k\). Let \(\bar {a}_k := \bar {u}^{*}_{k+1} - \bar {u}_c\) and \(\hat {a}_k := \hat {u}^{k+1} - \bar {u}_c\). Using the smoothness of \(b_{\mathcal {U}}\), we can estimate \(R_k\) explicitly as

$$\displaystyle \begin{aligned} \begin{array}{ll} 2\gamma_{k+1}^{-1}R_k & \geq (1-\tau_k)\Vert \bar{a}_k - \hat{a}_k\Vert^2 - (1-\tau_k)(\gamma_{k+1}^{-1}\gamma_{k} - 1)L_b\Vert \bar{a}_k\Vert^2 + \tau_k\Vert \hat{a}_k\Vert^2\vspace{1ex}\\ & = \Vert\hat{a}^k - (1-\tau_k)\bar{a}_k\Vert^2 + (1-\tau_k)\left(\tau_k - (\gamma_{k+1}^{-1}\gamma_{k} - 1)L_b\right)\Vert\bar{a}_k\Vert^2. \end{array} \end{aligned} $$
(4.65)

By the condition \((1+L_b^{-1}\tau _k)\gamma _{k+1} \geq \gamma _k\) in (4.17), we have \(\tau _k - (\gamma _{k+1}^{-1}\gamma _{k} - 1)L_b\geq 0\). Using this condition in (4.65), we obtain \(R_k \geq 0\). Finally, by (4.9) we can show that \(D_k \leq D_f\). Using this inequality, \(R_k \geq 0\), and the second condition of (4.17), we can show from (4.64) that \(\varDelta {G}_k \geq -\frac {\eta _k\tau _k^2}{4}D_f^2\), which implies (4.18). \(\square \)

4.1.2.3 Proof of Lemma 5: Parameter Updates

The tightest update for \(\gamma_k\) and \(\beta_k\) is \(\gamma _{k+1} := \frac {\gamma _k}{\tau _k+1}\) and \(\beta_{k+1} := (1 - \tau_k)\beta_k\) due to (4.17). Using these updates in the third condition in (4.17) leads to \(\frac {(1-\tau _{k+1})^2}{(1+\tau _{k+1})\tau _{k+1}^2} \geq \frac {1-\tau _k}{\tau _k^2}\). By directly checking this condition, we can see that \(\tau _k = \mathcal {O}(1/k)\), which is the optimal choice.

Clearly, if we choose \(\tau_k := \frac{3}{k+4}\), then \(0 < \tau_k < 1\) for \(k \geq 0\) and \(\tau_0 = 3/4\). Next, we choose \(\gamma _{k+1} := \frac {\gamma _k}{1+\tau _k/3} \geq \frac {\gamma _k}{1+\tau _k}\). Substituting \(\tau _k = \frac {3}{k+4}\) into this formula we have \(\gamma _{k+1} = \left (\frac {k+4}{k+5}\right )\gamma _k\). By induction, we obtain \(\gamma _{k+1} = \frac {5\gamma _1}{k+5}\). This implies \(\eta _k = \frac {5\gamma _1}{2\Vert A\Vert ^2(k+5)}\). With \(\tau _k = \frac {3}{k+4}\) and \(\gamma _{k+1} = \frac {5\gamma _1}{k+5}\), we choose \(\beta_k\) from the third condition of (4.17) as \(\beta _k = \frac {2\Vert A\Vert ^2\tau _k^2}{(1-\tau _k^2)\gamma _{k+1}} = \frac {18\Vert A\Vert ^2(k+5)}{5\gamma _1(k+1)(k+7)}\) for \(k \geq 1\). Using the values of \(\tau_k\) and \(\beta_k\), we need to check the second condition \(\beta_{k+1} \geq (1 - \tau_k)\beta_k\) of (4.17). Indeed, this condition is equivalent to \(2k^2 + 28k + 88 \geq 0\), which is true for all \(k \geq 0\). From the update rule of \(\beta_k\), it is obvious that \(\beta _k \leq \frac {18\Vert A\Vert ^2}{5\gamma _1(k+1)}\). \(\square \)
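To make the parameter choices above easy to sanity-check, here is a small numerical script (added for illustration only; it is not part of the chapter, and \(\Vert A\Vert = \gamma_1 = 1\) is an arbitrary normalization). It verifies, over a range of \(k\), the closed forms and the two inequalities used in this proof.

```python
# Numerical sanity check (illustration only) of the Lemma 5 schedule with ||A|| = gamma_1 = 1.
normA = 1.0
gamma1 = 1.0

def tau(k):   return 3.0 / (k + 4)
def gamma(k): return 5.0 * gamma1 / (k + 4)                   # gamma_k = 5*gamma_1/(k+4), k >= 1
def eta(k):   return 5.0 * gamma1 / (2 * normA**2 * (k + 5))  # eta_k = gamma_{k+1}/(2*||A||^2)
def beta(k):  return 18 * normA**2 * (k + 5) / (5 * gamma1 * (k + 1) * (k + 7))

for k in range(1, 1000):
    # closed form for gamma agrees with the recursion gamma_{k+1} = gamma_k / (1 + tau_k/3)
    assert abs(gamma(k + 1) - gamma(k) / (1 + tau(k) / 3)) < 1e-12
    # beta_k matches 2*||A||^2*tau_k^2 / ((1 - tau_k^2) * gamma_{k+1})
    assert abs(beta(k) - 2 * normA**2 * tau(k)**2 / ((1 - tau(k)**2) * gamma(k + 1))) < 1e-12
    # the two conditions checked in the proof: monotonicity and the upper bound on beta_k
    assert beta(k + 1) >= (1 - tau(k)) * beta(k) - 1e-15
    assert beta(k) <= 18 * normA**2 / (5 * gamma1 * (k + 1)) + 1e-15
print("Lemma 5 parameter schedule: all checks passed")
```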

4.1.2.4 Proof of Theorem 1: Convergence of Algorithm 1

We estimate the term \(\tau _k^2\eta _k\) in (4.18) as

$$\displaystyle \begin{aligned} \tau_k^2\eta_k = \frac{45\gamma_1}{2\Vert A\Vert^2(k+4)^2(k+5)} < \frac{45\gamma_1}{2\Vert A\Vert^2(k+4)(k+5)} - \left(1 - \tau_k\right)\frac{45\gamma_1}{2\Vert A\Vert^2(k+3)(k+4)}. \end{aligned} $$

Combining this estimate and (4.18), we get

$$\displaystyle \begin{aligned} G_{k+1}(\bar{w}^{k+1}) - \frac{45\gamma_1D_f^2}{8\Vert A\Vert^2(k+4)(k+5)} \leq (1-\tau_k)\left[G_k(\bar{w}^k) - \frac{45\gamma_1D_f^2}{8\Vert A\Vert^2(k+3)(k+4)}\right]. \end{aligned}$$

By induction, we have \(G_k(\bar {w}^k) - \frac {45\gamma _1D_f^2}{8\Vert A\Vert ^2(k+3)(k+4)} \leq \omega _k\big[G_1(\bar {w}^1) - \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\big] \leq 0\) whenever \(G_1(\bar {w}^1) \leq \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\), where \(\omega _k := \prod _{i=1}^{k-1}(1-\tau _i)\). Hence, we finally get

$$\displaystyle \begin{aligned} G_{k}(\bar{w}^{k}) \leq \frac{45\gamma_1D_f^2}{8\Vert A\Vert^2(k+3)(k+4)}. \end{aligned} $$
(4.66)
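As a side computation (added here), with \(\tau_i = \frac{3}{i+4}\) the product \(\omega_k\) used in the induction above telescopes to a closed form:

$$\displaystyle \begin{aligned} \omega_k := \prod_{i=1}^{k-1}(1-\tau_i) = \prod_{i=1}^{k-1}\frac{i+1}{i+4} = \frac{24}{(k+1)(k+2)(k+3)}; \end{aligned}$$

in particular \(\omega_k > 0\), so multiplying the bracketed term by \(\omega_k\) preserves its sign in the induction.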

Since \(\eta _0 = \frac {\gamma _1}{2\Vert A\Vert ^2}\), it satisfies the condition \(5\gamma_1 > 2\eta_0\Vert A\Vert^2\) in Lemma 4. In addition, from Lemma 5, we have \(\beta _1 = \frac {27\Vert A\Vert ^2}{20\gamma _1} > \frac {\Vert A\Vert ^2}{\gamma _1}\), which satisfies the second condition in Lemma 4. We also note that \(\beta _k \leq \frac {18\Vert A\Vert ^2}{5\gamma _1(k+1)}\). If we take \(\hat {\lambda }^0 = \boldsymbol {0}^m\), then Lemma 4 shows that \(G_{\gamma _1\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 = \frac {\gamma _1}{8\Vert A\Vert ^2}D_f^2 < \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\). Using this estimate and (4.66) in Lemma 2, we obtain (4.23). Finally, if we choose \(\gamma_1 := \Vert A\Vert\), then the worst-case iteration-complexity of Algorithm 1 is \(\mathcal {O}(\varepsilon ^{-1})\). \(\square \)

4.1.3 Proof of Corollary 1: Strong Convexity of g

First, we show that if condition (4.24) holds, then (4.25) holds. Since \(\nabla{\varphi}\) given by (4.5) is Lipschitz continuous with the constant \(L_{d^g_0} := \mu _g^{-1}\Vert A\Vert ^2\), similar to the proof of Lemma 3, we have

$$\displaystyle \begin{aligned} \varDelta{G_{\beta_k}} \geq \left[ \eta_k\left(\frac{3}{4} + \frac{\tau_k}{2} - \frac{\eta_k\Vert A\Vert^2}{2\mu_g}\right) - \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\right]\Vert\hat{z}^{k+1}\Vert^2 - \frac{\tau_k^2\eta_k}{4}D_f^2, \end{aligned} $$
(4.67)

where \(\varDelta {G_{\beta _k}} := (1-\tau _k)G_{\beta _k}(\bar {w}^k) - G_{\beta _{k+1}}(\bar {w}^{k+1})\). Under the condition (4.24), (4.67) implies (4.25).

The update rule (4.27) is in fact derived from (4.24). We finally prove the bounds (4.28). First, we consider the product \(\tau ^2_k\eta _k\). By (4.27) we have

$$\displaystyle \begin{aligned} \tau_k^2\eta_k &= \frac{9\mu_g}{2\Vert A\Vert^2(k+4)^2} < \frac{9\mu_g}{2\Vert A\Vert^2(k+3)(k+4)} \\&= \frac{9\mu_g}{4\Vert A\Vert^2(k+4)} - (1-\tau_k)\frac{9\mu_g}{4\Vert A\Vert^2(k+3)} \end{aligned} $$
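The last equality can be verified directly (a short check added here): since \(1-\tau_k = \frac{k+1}{k+4}\),

$$\displaystyle \begin{aligned} \frac{9\mu_g}{4\Vert A\Vert^2(k+4)} - (1-\tau_k)\frac{9\mu_g}{4\Vert A\Vert^2(k+3)} = \frac{9\mu_g}{4\Vert A\Vert^2(k+4)}\Big(1 - \frac{k+1}{k+3}\Big) = \frac{9\mu_g}{2\Vert A\Vert^2(k+3)(k+4)}, \end{aligned}$$

so the split used in the induction below is exact.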

By induction, it follows from (4.25) and this last expression that:

$$\displaystyle \begin{aligned} G_{\beta_k}(\bar{w}^k) - \frac{9\mu_gD_f^2}{16\Vert A\Vert^2(k+3)} \leq \omega_k\Big(G_{\beta_1}(\bar{w}^1) - \frac{9\mu_gD_f^2}{64\Vert A\Vert^2}\Big) \leq 0, \end{aligned} $$
(4.68)

whenever \(G_{\beta _1}(\bar {w}^1) \leq \frac {9\mu _gD_f^2}{64\Vert A\Vert ^2}\). Since \(\bar {u}^1\) is given by (4.26), with the same argument as the proof of Lemma 4, we can show that if \(\frac {1}{\beta _1} \leq \frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g}\), then \(G_{\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2\). However, from the update rule (4.27), we can see that \(\eta _0 = \frac {\mu _g}{2\Vert A\Vert ^2}\) and \(\beta _1 = \frac {18\Vert A\Vert ^2}{16\mu _g}\). Using these quantities, we can clearly show that \(\frac {1}{\beta _1} \leq \frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g} = \frac {\mu _g}{\Vert A\Vert ^2}\). Moreover, \(G_{\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 < \frac {9\mu _g}{64\Vert A\Vert ^2}D_f^2\). Hence, (4.68) holds. Finally, it remains to use Lemma 2 to obtain (4.28). The second part in (4.30) is proved similarly. The estimate (4.31) is a direct consequence of (4.68). \(\square \)

4.1.4 Convergence Analysis of Algorithm 2

This appendix provides full proofs of the lemmas and theorems related to the convergence of Algorithm 2.

4.1.4.1 Proof of Lemma 6: Gap Reduction Condition

We first require the following key lemma to analyze the convergence of our SADMM scheme; its proof is similar to that of (4.55), so we omit the details here.

Lemma 9

Let \(\bar {\lambda }^{k+1}\) be generated by SADMM. Then, for \(\lambda \in \mathbb {R}^n\) , one has

$$\displaystyle \begin{aligned} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \leq \tilde{\ell}_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} -\hat{\lambda}^k, \lambda - \hat{\lambda}^k\rangle - \tfrac{1}{\eta_k}\Vert \hat{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2 + \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \tilde{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2, \end{aligned} $$

where \(\tilde {\lambda }^k := \hat {\lambda }^k - \rho _k(A\hat {u}^{k+1} + B\hat {v}^k - c)\) and \(\tilde {\ell }_{\gamma }(\lambda ) := \varphi _{\gamma }(\tilde {\lambda }^k) + \langle \nabla {\varphi _{\gamma }}(\tilde {\lambda }^k), \lambda - \tilde {\lambda }^k\rangle + \psi (\lambda )\).

Now, we can prove Lemma 6. We still use the same notation as in the proof of Lemma 3. In addition, let us define \(\hat {u}^{*}_{k+1} := u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k)\) and \(\bar {u}^{\ast }_{k+1} := u^{\ast }_{\gamma _{k+1}}(A^{\top }\bar {\lambda }^k)\) given in (4.12), \(\tilde {z}^k := A\hat {u}^{k+1} + B\hat {v}^k - c\), and \(\breve {D}_k := \Vert A\hat {u}^{*}_{k+1} + B\hat {v}^k - c\Vert \).

First, since \(\varphi _{\gamma }(\tilde {\lambda }^k) + \langle \nabla {\varphi _{\gamma }}(\tilde {\lambda }^k), \lambda - \tilde {\lambda }^k\rangle \leq \varphi _{\gamma }(\lambda )\), it follows from Lemma 9 that

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq d_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} -\hat{\lambda}^k, \lambda - \hat{\lambda}^k\rangle - \tfrac{1}{\eta_k}\Vert \hat{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2 \vspace{1ex}\\ & \quad + \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \tilde{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2. \end{array} \end{aligned} $$
(4.69)

Next, using [26, Theorem 2.1.5 (2.1.10)] with \(g^{\ast }_{\gamma }\) defined in (4.11) and \(\lambda := (1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k\) for any τ k ∈ [0, 1], we have

$$\displaystyle \begin{aligned} \varphi_{\gamma_{k+1}}(\lambda) \leq (1-\tau_k)\varphi_{\gamma_{k+1}}(\bar{\lambda}^k) + \tau_k\varphi_{\gamma_{k+1}}(\hat{\lambda}^k) - \frac{\tau_k(1-\tau_k)\gamma_{k+1}}{2}\Vert \hat{u}^{*}_{k+1} - \bar{u}^{*}_{k+1}\Vert^2. \end{aligned} $$
(4.70)

Since ψ is convex, we also have \(\psi (\lambda ) \leq (1-\tau _k)\psi (\bar {\lambda }^k) + \tau _k\psi (\hat {\lambda }^k)\) and \(\lambda - \hat {\lambda }^k = (1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k - \hat {\lambda }^k = \tau _k(\hat {\lambda }^k - \lambda ^{\ast }_k)\) due to (4.33). Combining these expressions, the definition d γ := φ γ + ψ, (4.69), and (4.70), we can derive

(4.71)

On the one hand, since \(\hat {u}^{k+1}\) is the solution of the first convex subproblem in SADMM, using its optimality condition, we can show that

$$\displaystyle \begin{aligned} \begin{array}{ll} \varphi_{\gamma_{k+1}}(\hat{\lambda}^k) - \frac{\rho_k}{2}\breve{D}_k^2 &= \langle \hat{\lambda}^k, A\hat{u}^{*}_{k+1}\rangle - g(\hat{u}^{*}_{k+1}) - \gamma_{k+1}b_{\mathcal{U}}(\hat{u}^{*}_{k+1},\bar{u}^c) - \frac{\rho_k}{2}\breve{D}_k^2\vspace{1ex}\\ &\leq \langle \hat{\lambda}^k,A\hat{u}^{k+1}\rangle - g(\hat{u}^{k+1}) - \frac{\rho_k}{2}\Vert\tilde{z}^k\Vert^2 - \gamma_{k+1}b_{\mathcal{U}}(\hat{u}^{k+1}, \bar{u}_c)\vspace{1ex}\\ &\quad - \frac{\rho_k}{2}\Vert A(\hat{u}^{*}_{k+1} - \hat{u}^{k+1})\Vert^2 - \frac{\gamma_{k+1}}{2}\Vert \hat{u}^{*}_{k+1} - \hat{u}^{k+1} \Vert^2. \end{array} \end{aligned} $$
(4.72)

On the other hand, similar to the proof of Lemma 3, we can show that

(4.73)

Combining (4.72) and (4.73) and noting that \(d_{\gamma} := \varphi_{\gamma} + \psi\), we have

(4.74)

Next, using the strong convexity of \(b_{\mathcal {U}}\) with \(\mu _{b_{\mathcal {U}}} = 1\), we can show that

(4.75)

Combining (4.71), (4.59), (4.74) and (4.75), we can derive

(4.76)
where
$$\displaystyle \begin{aligned} \begin{array}{ll} \hat{R}_k &:= \frac{\gamma_{k+1}}{2}(1-\tau_k)\tau_k\Vert \hat{u}^{\ast}_{k+1} - \bar{u}^{\ast}_{k+1}\Vert^2 + \frac{\gamma_{k+1}}{4}\tau_k\Vert \hat{u}^{\ast}_{k+1} - \bar{u}_c \Vert^2 \vspace{1ex}\\ &\quad - (1 - \tau_k)(\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}^{\ast}_{k+1}, \bar{u}^c). \end{array} \end{aligned} $$
(4.77)

From SADMM, we have \(\bar {\lambda }^{k+1} - \hat {\lambda }^k = -\eta _k\hat {z}^{k+1}\) and \(\tilde {\lambda }^k - \hat {\lambda }^k = -\rho _k\tilde {z}^k\). Plugging these expressions and (4.77) into (4.76) we can simplify this estimate as

(4.78)

Using again the elementary inequality \(\nu \Vert a\Vert ^2 + \kappa \Vert b\Vert ^2 \geq \frac {\nu \kappa }{\nu +\kappa }\Vert a - b\Vert ^2\), under the condition \(\gamma _{k+1} \geq \Vert A\Vert ^2\left (\eta _k + \frac {\rho _k}{\tau _k}\right )\) in (4.34), we can show that

$$\displaystyle \begin{aligned} \frac{1}{2\eta_k}\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2 + \frac{\tau_k}{2\rho_k}\Vert \tilde{\lambda}^k - \hat{\lambda}^k\Vert^2 - \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \bar{\lambda}^{k+1} - \tilde{\lambda}^k\Vert^2 \geq 0. \end{aligned} $$
(4.79)
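The elementary inequality invoked above can be checked directly (a short derivation added for completeness): for any vectors \(a, b\) and any \(\nu, \kappa > 0\),

$$\displaystyle \begin{aligned} \nu\Vert a\Vert^2 + \kappa\Vert b\Vert^2 - \frac{\nu\kappa}{\nu+\kappa}\Vert a - b\Vert^2 = \frac{1}{\nu+\kappa}\Vert \nu a + \kappa b\Vert^2 \geq 0. \end{aligned}$$

Applying it with \(a := \bar{\lambda}^{k+1} - \hat{\lambda}^k\), \(b := \tilde{\lambda}^k - \hat{\lambda}^k\), \(\nu := \frac{1}{2\eta_k}\), and \(\kappa := \frac{\tau_k}{2\rho_k}\) gives \(\frac{\nu\kappa}{\nu+\kappa} = \frac{\tau_k}{2(\rho_k + \tau_k\eta_k)}\), which is at least \(\frac{\Vert A\Vert^2}{2\gamma_{k+1}}\) exactly when \(\gamma_{k+1} \geq \Vert A\Vert^2\big(\eta_k + \frac{\rho_k}{\tau_k}\big)\); this yields (4.79).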

On the other hand, similar to the proof of Lemma 3, we can show that \(\frac {\eta _k}{4}\Vert \hat {z}^{k+1}\Vert ^2 - \frac {\tau _k\eta _k}{2}\Vert \hat {z}^{k+1}\Vert D_k \geq - \frac {\eta _k\tau _k^2}{4}D_k^2\). Using this inequality, (4.79), and \(\lambda ^{*}_k = -\frac {1}{\beta _k}\bar {z}^k\), we can simplify (4.78) as

(4.80)

Since \(\beta_{k+1} \geq (1 - \tau_k)\beta_k\) due to (4.34), similar to the proof of (4.62) we have

(4.81)

Combining (4.80) and (4.81), we get

$$\displaystyle \begin{aligned} \varDelta{G}_k \geq \frac{1}{2}\Big[ \Big(\frac{1}{2} + \tau_k\Big)\eta_k - \frac{\tau_k^2}{(1 - \tau_k)\beta_k}\Big]\Vert\hat{z}^{k+1}\Vert^2 + \hat{R}_k - \left(\frac{\eta_k\tau_k^2}{4}D_k^2 + \frac{\tau_k\rho_k}{2}\breve{D}_k^2\right). \end{aligned} $$
(4.82)

Next, we estimate \(\hat {R}_k\) defined by (4.77) as follows. We define \(\bar {a}_k := \bar {u}^{*}_{k+1} - \bar {u}_c\) and \(\hat {a}_k := \hat {u}^{*}_{k+1} - \bar {u}_c\). Using \(b_{\mathcal {U}}(\bar {u}^{\ast }_{k+1}, \bar {u}^c) \leq \frac {L_b}{2}\Vert \bar {u}^{\ast }_{k+1} - \bar {u}^c\Vert ^2\), we can bound \(\hat{R}_k\) from below as

$$\displaystyle \begin{aligned} \begin{array}{ll} \frac{2\hat{R}_k}{\gamma_{k+1}} &\geq (1 - \tau_k)\tau_k\Vert \bar{a}_k - \hat{a}_k\Vert^2 + \frac{\tau_k}{2}\Vert \hat{a}_k\Vert^2 - (1 - \tau_k)\big(\frac{\gamma_{k}}{\gamma_{k+ 1}} - 1\big)L_b\Vert \bar{a}_k\Vert^2\vspace{1ex}\\ &= \tau_k\left(\frac{3}{2} -\tau_k\right)\left\Vert \hat{a}_k - \frac{(1-\tau_k)}{(3/2-\tau_k)}\bar{a}_k\right\Vert^2 + (1-\tau_k)\left[\frac{\tau_k}{3-2\tau_k} + \left(1- \frac{\gamma_k}{\gamma_{k+1}}\right)L_b\right]\Vert\bar{a}_k\Vert^2. \end{array} \end{aligned}$$

Since \(\gamma _{k+1} \geq \left (\frac {3-2\tau _k}{3 - (2-L_b^{-1})\tau _k}\right )\gamma _k\) due to (4.34), it is easy to show that \(\hat {R}_k \geq 0\). In addition, by (4.34), we also have \((1 + 2\tau _k)\eta _k - \frac {2\tau _k^2}{(1 - \tau _k)\beta _k} \geq 0\). Using these conditions, we can show from (4.82) that \(\varDelta {G}_k \geq - \frac {\eta _k\tau _k^2}{4}D_k^2 - \frac {\tau _k\rho _k}{2}\breve {D}_k^2 \geq -\left (\frac {\tau _k^2\eta _k}{4} + \frac {\tau _k\rho _k}{2}\right )D_f^2\), which is indeed the gap reduction condition (4.35). \(\square \)

4.1.4.2 Proof of Lemma 7: Parameter Updates

Similar to the proof of Lemma 5, we can show that the optimal rate of \(\left \{\tau _k\right \}\) is \(\mathcal {O}(1/k)\). From the conditions (4.34), it is clear that if we choose \(\tau _k := \frac {3}{k+4}\) then \(0 < \tau _k \leq \frac {3}{4} < 1\) for k ≥ 0. Next, we choose \(\gamma _{k+1} := \left (\frac {3-2\tau _k}{3-\tau _k}\right )\gamma _k\). Then γ k satisfies (4.34). Substituting \(\tau _k = \frac {3}{k+4}\) into this formula we have \(\gamma _{k+1} = \left (\frac {k+2}{k+3}\right )\gamma _k\). By induction, we obtain \(\gamma _{k+1} = \frac {3\gamma _1}{k+3}\). Now, we choose \(\eta _k := \frac {\gamma _{k+1}}{2\Vert A\Vert ^2} = \frac {3\gamma _1}{2\Vert A\Vert ^2(k+3)}\). Then, from the last condition of (4.34), we choose \(\rho _k := \frac {\tau _k\gamma _{k+1}}{2\Vert A\Vert ^2} = \frac {9\gamma _1}{2\Vert A\Vert ^2(k+3)(k+4)}\).

To derive an update for \(\beta_k\), from the third condition of (4.34) with equality, we can derive \(\beta _k = \frac {2\tau _k^2}{(1-\tau _k)(1+2\tau _k)\eta _k} = \frac {6\Vert A\Vert ^2(k+3)}{\gamma _1(k+1)(k+10)} < \frac {9\Vert A\Vert ^2}{5\gamma _1(k+1)}\). We need to check the second condition \(\beta_{k+1} \geq (1 - \tau_k)\beta_k\) in (4.34). Indeed, we have \(\beta _{k+1} = \frac {6\Vert A\Vert ^2(k+4)}{\gamma _1(k+2)(k+11)} \geq (1 - \tau _k)\beta _k = \frac {6\Vert A\Vert ^2(k+3)}{\gamma _1(k+4)(k+10)}\), which is true for all \(k \geq 0\). Hence, the second condition of (4.34) holds. \(\square \)
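Analogously to the check after Lemma 5, the following small script (added for illustration only; \(\Vert A\Vert = \gamma_1 = 1\) is an arbitrary normalization) verifies the recursion for \(\gamma_k\), the condition \(\gamma_{k+1} \geq \Vert A\Vert^2(\eta_k + \rho_k/\tau_k)\) from (4.34) (here it holds with equality), and the monotonicity condition \(\beta_{k+1} \geq (1-\tau_k)\beta_k\) for the Algorithm 2 schedule.

```python
# Numerical sanity check (illustration only) of the Lemma 7 schedule with ||A|| = gamma_1 = 1.
normA = 1.0
gamma1 = 1.0

def tau(k):   return 3.0 / (k + 4)
def gamma(k): return 3.0 * gamma1 / (k + 2)                              # gamma_k = 3*gamma_1/(k+2), k >= 1
def eta(k):   return 3.0 * gamma1 / (2 * normA**2 * (k + 3))
def rho(k):   return 9.0 * gamma1 / (2 * normA**2 * (k + 3) * (k + 4))
def beta(k):  return 6.0 * normA**2 * (k + 3) / (gamma1 * (k + 1) * (k + 10))

for k in range(1, 1000):
    # gamma_{k+1} = ((3 - 2*tau_k)/(3 - tau_k)) * gamma_k
    assert abs(gamma(k + 1) - (3 - 2 * tau(k)) / (3 - tau(k)) * gamma(k)) < 1e-12
    # gamma_{k+1} = ||A||^2 * (eta_k + rho_k/tau_k), i.e. the step-size condition holds with equality
    assert abs(gamma(k + 1) - normA**2 * (eta(k) + rho(k) / tau(k))) < 1e-12
    # beta_{k+1} >= (1 - tau_k) * beta_k
    assert beta(k + 1) >= (1 - tau(k)) * beta(k) - 1e-15
print("Lemma 7 parameter schedule: all checks passed")
```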

4.1.4.3 Proof of Theorem 2: Convergence of Algorithm 2

First, we check the conditions of Lemma 4. From the update rule (4.36), we have \(\eta _0 = \frac {\gamma _1}{2\Vert A\Vert ^2}\) and \(\beta _1 = \frac {12\Vert A\Vert ^2}{11\gamma _1}\). Hence, 5γ 1 = 10∥A2η 0 > 2∥A2η 0, which satisfies the first condition of Lemma 4. Now, \(\frac {2\gamma _1}{(5\gamma _1-2\eta _0\Vert A\Vert ^2)\eta _0} = \frac {\Vert A\Vert ^2}{\gamma _1} < \frac {12\Vert A\Vert ^2}{11\gamma _1} = \beta _1\). Hence, the second condition of Lemma 4 holds.

Next, since \(\tau _k = \frac {3}{k+4}\), \(\rho _k = \frac {9\gamma _1}{2\Vert A\Vert ^2(k+3)(k+4)}\) and \(\eta _k = \frac {3\gamma _1}{2\Vert A\Vert ^2(k+3)}\), we can derive

$$\displaystyle \begin{aligned} \begin{array}{ll} \frac{\tau_k^2\eta_k}{4} + \frac{\tau_k\rho_k}{2} &= \frac{81\gamma_1}{8\Vert A\Vert^2(k + 3)(k + 4)^2} \vspace{1ex}\\ & < \frac{81\gamma_1}{8\Vert A\Vert^2(k+3)(k+4)} - \left(1 - \tau_k\right)\frac{81\gamma_1}{8\Vert A\Vert^2(k+2)(k + 3)}. \end{array} \end{aligned} $$

Substituting this inequality into (4.35) and rearranging the result, we obtain

$$\displaystyle \begin{aligned} G_{k+1}(\bar{w}^{k+1}) - \frac{81\gamma_1D_f^2}{8\Vert A\Vert^2(k+3)(k+4)} \leq (1 - \tau_k)\Big[G_k(\bar{w}^k) - \frac{81\gamma_1D_f^2}{8\Vert A\Vert^2(k+2)(k+3)}\Big]. \end{aligned} $$

By induction, we obtain \(G_k(\bar {w}^k) - \frac {81\gamma _1D_f^2}{8\Vert A\Vert ^2(k+2)(k+3)} \leq \omega _k\Big [G_0(\bar {w}^0) -\frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2}\Big ] \leq 0\) as long as \(G_0(\bar {w}^0) \leq \frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2}\). Now using Lemma 4, we have \(G_0(\bar {w}^0) \leq \frac {\eta _0}{4}D_f^2 = \frac {\gamma _1}{8\Vert A\Vert ^2}D_f^2 < \frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2}\). Hence, \(G_k(\bar {w}^k) \leq \frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2(k+2)(k+3)}\).

Finally, by using Lemma 2 with \(\beta _k := \frac {6\Vert A\Vert ^2(k+3)}{\gamma _1(k+1)(k+10)}\) and \(\beta _k \leq \frac {9\Vert A\Vert ^2}{5\gamma _1(k+1)}\), and simplifying the results, we obtain the bounds in (4.37). If we choose \(\gamma_1 := \Vert A\Vert\), then the worst-case iteration-complexity of Algorithm 2 is \(\mathcal {O}(\varepsilon ^{-1})\). \(\square \)

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this chapter

Tran-Dinh, Q., Cevher, V. (2018). Smoothing Alternating Direction Methods for Fully Nonsmooth Constrained Convex Optimization. In: Giselsson, P., Rantzer, A. (eds) Large-Scale and Distributed Optimization. Lecture Notes in Mathematics, vol 2227. Springer, Cham. https://doi.org/10.1007/978-3-319-97478-1_4
