
Mirror Prox algorithm for multi-term composite minimization and semi-separable problems

Computational Optimization and Applications

Abstract

In this paper, we develop a composite version of the Mirror Prox algorithm for solving convex–concave saddle point problems and monotone variational inequalities of special structure. The algorithm covers saddle point/variational analogies of what is usually called "composite minimization" (minimizing the sum of an easy-to-handle nonsmooth convex function and a general-type smooth convex function "as if" there were no nonsmooth component at all). We demonstrate that the composite Mirror Prox inherits the favourable efficiency estimate of its prototype, an estimate which is already unimprovable in the large-scale bilinear saddle point case. We also demonstrate that the proposed approach applies to Lasso-type problems with several penalizing terms (e.g. \(\ell _1\) and nuclear norm regularization acting together) and to problems with the semi-separable structure considered in alternating direction methods, yielding in both cases methods with explicit complexity bounds.
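
As a purely illustrative instance of the multi-term composite problems just mentioned (the observation operator \({{\mathcal {A}}}\), the data \(b\) and the weights \(\lambda ,\mu >0\) below are hypothetical placeholders), one can think of a Lasso-type matrix recovery with two penalties acting together:

$$\begin{aligned} \min _{y}\ {1\over 2}\Vert {{\mathcal {A}}}(y)-b\Vert _2^2+\lambda \Vert y\Vert _1+\mu \Vert y\Vert _{\mathrm{nuc}}, \end{aligned}$$

where \(y\) is a matrix variable, \(\Vert \cdot \Vert _1\) is the entrywise \(\ell _1\) norm and \(\Vert \cdot \Vert _{\mathrm{nuc}}\) is the nuclear norm.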


Notes

  1. The precise meaning of simplicity and fitting will be specified later. For now, a couple of examples suffice: when \(\varPsi _k\) is the \(\ell _1\) norm, \(Y_k\) can be the entire space, or the \(\ell _p\)-ball, \(1\le p\le 2\), centered at the origin; when \(\varPsi _k\) is the nuclear norm, \(Y_k\) can be the entire space, or the Frobenius/nuclear norm ball centered at the origin (for a sketch of why such \(\varPsi _k\) are "easy," see the code example after these notes).

  2. Our exposition follows.

  3. In principle, these parameters should be chosen to optimize the resulting efficiency estimates; this indeed is doable, provided that we have at our disposal upper bounds on the Lipschitz constants of the components of \(F_u\) and that \(U\) is bounded, see [17, Section 5] or [14, Section 6.3.3].

  4. With our implementation, we run this test for both search points and approximate solutions generated by the algorithm.

  5. Note that the latter relation implies that what was denoted by \(\widetilde{\varPhi }\) in Proposition 2 is nothing but \(\overline{\varPhi }\).

  6. If the goal of solving (56) were to recover \(y_{\#}\), our \({\lambda }\) and \(\mu \) would, perhaps, be too large. Our goal, however, was to solve (56) as an "optimization beast," and we were interested in a "meaningful" contribution of \(\varPsi _0\) and \(\varPsi _1\) to the objective of the problem, and thus in not too small \({\lambda }\) and \(\mu \).

  7. Recall that we do not expect linear convergence, just an \(O(1/t)\) rate.

  8. Note that in a more complicated matrix recovery problem, where noisy linear combinations of the matrix entries rather than just some of these entries are observed, applying ADMM becomes somewhat problematic, while the proposed algorithm is still applicable "as is."

  9. In what follows, we call a collection \(a_{s,t}\) of reals nonincreasing in time if \(a_{s',t'}\le a_{s,t}\) whenever \(s'>s\), as well as whenever \(s'=s\) and \(t'\ge t\). "Nondecreasing in time" is defined similarly.

  10. We assume w.l.o.g. that \(|\underline{{\hbox {Opt}}}_{s,t}|\le L\).
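
To illustrate the "simplicity" invoked in Note 1, here is a minimal sketch of the standard closed-form proximal (shrinkage) operators associated with the \(\ell _1\) and nuclear norms, in the Euclidean setup and with unconstrained \(Y_k\); the function names and the step size gamma are not part of the paper, whose actual building block is a more general prox-mapping.

```python
import numpy as np

def prox_l1(y, gamma):
    """Soft-thresholding: argmin_x 0.5*||x - y||_2^2 + gamma*||x||_1 (entrywise)."""
    return np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)

def prox_nuclear(Y, gamma):
    """Singular value thresholding: argmin_X 0.5*||X - Y||_F^2 + gamma*||X||_nuc."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - gamma, 0.0)) @ Vt
```

Both operators are available in closed form (entrywise shrinkage, respectively shrinkage of singular values), which is what makes these penalties easy to handle inside a proximal scheme.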

References

  1. Andersen, E.D., Andersen, K.D.: The MOSEK optimization tools manual. http://www.mosek.com/fileadmin/products/6_0/tools/doc/pdf/tools.pdf

  2. Aujol, J.-F., Chambolle, A.: Dual norms and image decomposition models. Int. J. Comput. Vis. 63(1), 85–104 (2005)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Becker, S., Bobin, J., Candès, E.J.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)

  5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2010)

  6. Buades, A., Coll, B., Morel, J.-M.: A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005)

  7. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 11 (2011)

  8. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  9. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)

  10. Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with O(1/k) convergence. http://www.optimization-online.org/DB_HTML/2014/03/4282.html (2013)

  11. Goldfarb, D., Ma, S.: Fast multiple-splitting algorithms for convex optimization. SIAM J. Optim. 22(2), 533–556 (2012)

  12. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program. 141(1–2), 349–382 (2013)

  13. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx (2013)

  14. Juditsky, A., Nemirovski, A.: First-order methods for nonsmooth large-scale convex minimization: I. General purpose methods; II. Utilizing problem's structure. In: Sra, S., Nowozin, S., Wright, S. (eds.) Optimization for Machine Learning, pp. 121–183. The MIT Press (2011)

  15. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69(1–3), 111–147 (1995)

  16. Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)

  17. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  18. Nemirovski, A., Onn, S., Rothblum, U.G.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35(1), 52–78 (2010)

  19. Nemirovski, A., Rubinstein, R.: An efficient stochastic approximation algorithm for stochastic saddle point problems. In: Dror, M., L'Ecuyer, P., Szidarovszky, F. (eds.) Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications, pp. 155–184. Kluwer Academic Publishers, Boston (2002)

  20. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  21. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  22. Orabona, F., Argyriou, A., Srebro, N.: PRISMA: proximal iterative smoothing algorithm. arXiv preprint arXiv:1206.2372 (2012)

  23. Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E. Jr.: An accelerated linearized alternating direction method of multipliers. arXiv preprint arXiv:1401.6607 (2014)

  24. Qin, Z., Goldfarb, D.: Structured sparsity via alternating direction methods. J. Mach. Learn. Res. 13, 1373–1406 (2012)

  25. Scheinberg, K., Goldfarb, D., Bai, X.: Fast first-order methods for composite convex optimization with backtracking. http://www.optimization-online.org/DB_FILE/2011/04/3004.pdf (2011)

  26. Tseng, P.: Alternating projection-proximal methods for convex programming and variational inequalities. SIAM J. Optim. 7(4), 951–965 (1997)

  27. Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. SIAM J. Optim. (2008, submitted)

  28. Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput. 2(3–4), 203–230 (2010)

Acknowledgments

Research of the first and the third authors was supported by the NSF Grant CMMI-1232623. Research of the second author was supported by the CNRS-Mastodons Project GARGANTUA, and the LabEx PERSYVAL-Lab (ANR-11-LABX-0025).

Corresponding author

Correspondence to Niao He.

Appendices

Appendix 1: Proof of Theorem 1

\(0^{\circ }\). Let us verify that the prox-mapping (28) is indeed well defined whenever \(\zeta =\gamma F_v\) with \(\gamma >0\). All we need is to show that whenever \(u\in U\), \(\eta \in E_u\), \(\gamma >0\) and \([w_t;s_t]\in X\), \(t=1,2, \ldots \), are such that \(\Vert w_t\Vert _2+\Vert s_t\Vert _2\rightarrow \infty \) as \(t\rightarrow \infty \), we have

$$\begin{aligned} r_t:=\underbrace{\langle \eta -\omega '(u),w_t\rangle +\omega (w_t)}_{a_t}+\underbrace{\gamma \langle F_v,s_t\rangle }_{b_t} \rightarrow \infty ,\,t\rightarrow \infty . \end{aligned}$$

Indeed, assume the opposite; passing to a subsequence, we may assume that the sequence \(\{r_t\}\) is bounded. Since \(\omega (\cdot )\) is strongly convex, modulus 1, w.r.t. \(\Vert \cdot \Vert \), and the linear function \(\langle F_v,s\rangle \) of \([w;s]\) is bounded from below on \(X\) by A4, boundedness of the sequence \(\{r_t\}\) implies boundedness of the sequence \(\{w_t\}\); since \(\Vert [w_t;s_t]\Vert _2\rightarrow \infty \) as \(t\rightarrow \infty \), we get \(\Vert s_t\Vert _2\rightarrow \infty \) as \(t\rightarrow \infty \). Since \(\langle F_v,s\rangle \) is coercive in \(s\) on \(X\) by A4, and \(\gamma >0\), we conclude that \(b_t\rightarrow \infty \) as \(t\rightarrow \infty \), while the sequence \(\{a_t\}\) is bounded, since the sequence \(\{w_t\}\subset U\) is bounded and \(\omega \) is continuously differentiable. Thus \(\{a_t\}\) is bounded and \(b_t\rightarrow \infty \), implying that \(r_t\rightarrow \infty \) as \(t\rightarrow \infty \), which is the desired contradiction.
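
To make the step "boundedness of \(\{r_t\}\) implies boundedness of \(\{w_t\}\)" fully explicit (a routine consequence of strong convexity of \(\omega \), spelled out here for convenience):

$$\begin{aligned} a_t=\langle \eta -\omega '(u),w_t\rangle +\omega (w_t)&\ge \langle \eta -\omega '(u),w_t\rangle +\omega (u)+\langle \omega '(u),w_t-u\rangle +{1\over 2}\Vert w_t-u\Vert ^2\\&=\omega (u)-\langle \omega '(u),u\rangle +\langle \eta ,w_t\rangle +{1\over 2}\Vert w_t-u\Vert ^2, \end{aligned}$$

so that \(a_t\) grows at least quadratically in \(\Vert w_t\Vert \), while \(b_t\) is bounded from below; hence a bounded (sub)sequence \(\{r_t=a_t+b_t\}\) indeed forces \(\{w_t\}\) to be bounded.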

\(1^{\circ }\). Recall the well-known identity [9]: for all \(u,u',w\in U\) one has

$$\begin{aligned} \langle V'_{u}(u'),w-u'\rangle =V_{u}(w)-V_{u'}(w)-V_{u}(u'). \end{aligned}$$
(78)

Indeed, the right hand side is

$$\begin{aligned}&[\omega (w)-\omega (u)-\langle \omega '(u),w-u\rangle ] -\,[\omega (w)-\omega (u')-\langle \omega '(u'),w-u'\rangle ]\\&\qquad -[\omega (u') -\omega (u)-\langle \omega '(u),u'-u\rangle ]\\&\quad =\langle \omega '(u),u-w\rangle +\langle \omega '(u),u'-u\rangle +\langle \omega '(u'),w-u'\rangle \\&\quad =\langle \omega '(u') - \omega '(u),w-u'\rangle =\langle V'_{u}(u'),w-u'\rangle . \end{aligned}$$

For \(x=[u;v]\in X,\;\xi =[\eta ;\zeta ]\), let \(P_x(\xi )=[u';v']\in X\). By the optimality condition for the problem (28), for all \([s;w]\in X\),

$$\begin{aligned} \langle \eta +V'_u(u'),u'-s\rangle +\langle \zeta ,v'-w\rangle \le 0, \end{aligned}$$

which by (78) implies that

$$\begin{aligned} \langle \eta ,u'-s\rangle +\langle \zeta ,v'-w\rangle \le \langle V'_u(u'),s-u'\rangle = V_{u}(s)-V_{u'}(s)-V_{u}(u'). \end{aligned}$$
(79)

\(2^{\circ }\). Applying (79) with \([u;v]=[u_\tau ;v_\tau ]=x_\tau \), \(\xi =\gamma _\tau F(x_\tau )=[\gamma _\tau F_u(u_\tau );\gamma _\tau F_v]\), \([u';v']=[u'_\tau ;v'_\tau ]=y_\tau \), and \([s;w]=[u_{\tau +1};v_{\tau +1}]=x_{\tau +1}\), we obtain:

$$\begin{aligned} \gamma _\tau [\langle F_u(u_\tau ),u'_\tau -u_{\tau +1}\rangle +\langle F_v,v'_\tau -v_{\tau +1}\rangle ]\le V_{u_\tau }(u_{\tau +1})-V_{u'_\tau }(u_{\tau +1})-V_{u_\tau }(u'_{\tau });\nonumber \\ \end{aligned}$$
(80)

and applying (79) with \([u;v]=x_\tau \), \(\xi =\gamma _\tau F(y_\tau )\), \([u';v']=x_{\tau +1}\), and \([s;w]=z\in X\) we get:

$$\begin{aligned} \gamma _\tau [\langle F_u(u'_\tau ),u_{\tau +1}-s\rangle +\langle F_v,v_{\tau +1}-w\rangle ]\le V_{u_\tau }(s)-V_{u_{\tau +1}}(s)-V_{u_\tau }(u_{\tau +1}).\nonumber \\ \end{aligned}$$
(81)

Adding (81) to (80) we obtain for every \(z=[s;w]\in X\)

$$\begin{aligned}&{\gamma _\tau \langle F(y_\tau ),y_\tau -z\rangle =\gamma _\tau [\langle F_u(u'_\tau ),u'_\tau -s\rangle +\langle F_v,v'_\tau -w\rangle ]}\le V_{u_\tau }(s)\nonumber \\&-V_{u_{\tau +1}}(s)+\underbrace{\gamma _\tau \langle F_u(u'_\tau )-F_u(u_\tau ), u'_\tau -u_{\tau +1}\rangle -V_{u'_\tau }(u_{\tau +1})-V_{u_\tau }(u'_{\tau }) }_{\delta _\tau }. \end{aligned}$$
(82)
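
The two applications of (79) above encode the extragradient ("predict, then correct") pattern of Mirror Prox. A minimal Euclidean sketch of one iteration, with a hypothetical monotone operator F, a projector proj_X onto the feasible set and a step size gamma (the paper's algorithm uses general prox-mappings and exploits the composite structure of \(F\), which this toy version ignores):

```python
import numpy as np

def mirror_prox_step(x, F, proj_X, gamma):
    """One Euclidean Mirror Prox (extragradient) iteration:
    y = P_x(gamma * F(x)) and x_next = P_x(gamma * F(y)),
    the prox-mapping here reducing to a projected gradient-type step."""
    y = proj_X(x - gamma * F(x))        # predictor: the search point y_tau
    x_next = proj_X(x - gamma * F(y))   # corrector: the next iterate x_{tau+1}
    return x_next, y
```

Approximate solutions are then the weighted averages of the points \(y_\tau \), with weights \(\lambda ^t_\tau =\gamma _\tau /\sum _{i=1}^t\gamma _i\) as at the end of this proof.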

Due to the strong convexity, modulus 1, of \(V_u(\cdot )\) w.r.t. \(\Vert \cdot \Vert \), \(V_u(u')\ge {1\over 2}\Vert u-u'\Vert ^2\) for all \(u,u'\). Therefore,

$$\begin{aligned} \delta _\tau&\le \gamma _\tau \Vert F_u(u'_\tau )-F_u(u_\tau )\Vert _*\Vert u'_\tau -u_{\tau +1}\Vert - {\frac{1}{2}}\Vert u'_\tau -u_{\tau +1}\Vert ^2- {\frac{1}{2}}\Vert u_\tau -u'_\tau \Vert ^2\\&\le {\frac{1}{2}}\left[ \gamma _\tau ^2\Vert F_u(u'_\tau )-F_u(u_\tau )\Vert _*^2-\Vert u_\tau -u'_\tau \Vert ^2\right] \\&\le {\frac{1}{2}}\left[ \gamma _\tau ^2[M+L\Vert u'_\tau -u_\tau \Vert ]^2-\Vert u_\tau -u'_\tau \Vert ^2\right] , \end{aligned}$$

where the last inequality is due to (23). Note that \(\gamma _\tau L<1\) implies that

$$\begin{aligned} \gamma _\tau ^2[M+L\Vert u'_\tau -u_\tau \Vert ]^2-\Vert u'_\tau -u_\tau \Vert ^2\le \max _r \left[ \gamma _\tau ^2[M+Lr]^2-r^2\right] ={\gamma _\tau ^2M^2\over 1-\gamma _\tau ^2L^2}. \end{aligned}$$
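
For completeness, the maximum over \(r\) in the last display can be computed explicitly (a routine calculus step, spelled out here for convenience): since \(\gamma _\tau L<1\), the quadratic \(\phi (r)=\gamma _\tau ^2[M+Lr]^2-r^2\) is strictly concave with stationary point \(r_*={\gamma _\tau ^2LM\over 1-\gamma _\tau ^2L^2}\ge 0\), whence

$$\begin{aligned} \max _r\phi (r)=\phi (r_*)={\gamma _\tau ^2M^2\over (1-\gamma _\tau ^2L^2)^2}-{\gamma _\tau ^4L^2M^2\over (1-\gamma _\tau ^2L^2)^2}={\gamma _\tau ^2M^2\over 1-\gamma _\tau ^2L^2}. \end{aligned}$$

In particular, when \(0<\gamma _\tau \le {1\over \sqrt{2}L}\) we have \(1-\gamma _\tau ^2L^2\ge {1\over 2}\), so that \({1\over 2}\max _r\phi (r)\le \gamma _\tau ^2M^2\), which is exactly the bound on \(\delta _\tau \) invoked below.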

Let us assume that the stepsizes \(\gamma _\tau >0\) ensure that (30) holds, meaning that \(\delta _\tau \le \gamma _\tau ^2M^2\) (which, by the above analysis, is definitely the case when \(0<\gamma _\tau \le {1\over \sqrt{2}L}\); when \(M=0\), we can also take any \(\gamma _\tau \le {1\over L}\)). Summing up inequalities (82) over \(\tau =1,2, \ldots ,t\), dividing by \(\sum _{\tau =1}^t\gamma _\tau \), and taking into account that \(V_{u_{t+1}}(s)\ge 0\), we conclude that for all \(z=[s;w]\in X\),

$$\begin{aligned} \sum _{\tau =1}^t\lambda ^t_\tau \langle F(y_\tau ),y_\tau -z\rangle&\le {V_{u_1}(s) +\sum _{\tau =1}^t\delta _\tau \over \sum _{\tau =1}^t\gamma _\tau }\le {V_{u_1}(s) +M^2\sum _{\tau =1}^t\gamma _\tau ^2\over \sum _{\tau =1}^t\gamma _\tau },\\ \lambda ^t_\tau&= \gamma _\tau /\sum _{i=1}^t\gamma _i. \end{aligned}$$

\(\square \)

Appendix 2: Proof of Lemma 1

Proof

All we need to verify is the second inequality in (38). To this end note that when \(t=1\), the inequality in (38) holds true by definition of \(\widehat{\varTheta }(\cdot )\). Now let \(1<t\le N+1\). Summing up the inequalities (82) over \(\tau =1, \ldots ,t-1\), we get for every \(x=[u;v]\in X\):

$$\begin{aligned} \sum _{\tau =1}^{t-1}\gamma _\tau \langle F(y_\tau ),y_\tau -[u;v]\rangle&\le V_{u_1}(u)-V_{u_t}(u)+\sum _{\tau =1}^{t-1}\delta _\tau \\&\le V_{u_1}(u)-V_{u_t}(u)+M^2\sum _{\tau =1}^{t-1}\gamma _\tau ^2 \end{aligned}$$

(we have used (30)). When \([u;v]=z_*=[u_*;v_*]\), the left hand side of the resulting inequality is \(\ge 0\), and we arrive at

$$\begin{aligned} V_{u_t}(u_*)\le V_{u_1}(u_*)+M^2\sum _{\tau =1}^{t-1}\gamma _\tau ^2, \end{aligned}$$

hence

$$\begin{aligned} {1\over 2}\Vert u_t-u_*\Vert ^2\le V_{u_1}(u_*)+M^2\sum _{\tau =1}^{t-1}\gamma _\tau ^2 \end{aligned}$$

hence also

$$\begin{aligned} \Vert u_t-u_1\Vert ^2\le 2\Vert u_t-u_*\Vert ^2+2\Vert u_*-u_1 \Vert ^2\le 4\left[ V_{u_1}(u_*)+M^2\sum _{\tau =1}^{t-1}\gamma _\tau ^2\right] +4V_{u_1}(u_*) \end{aligned}$$

and therefore

$$\begin{aligned} \Vert u_t-u_1\Vert \le 2\sqrt{2V_{u_1}(u_*)+M^2\sum _{\tau =1}^{t-1}\gamma _\tau ^2}= R_N, \end{aligned}$$
(83)

and (38) follows. \(\square \)

Appendix 3: Proof of Proposition 3

Proof

From (82) and (30) it follows that

$$\begin{aligned} \forall (x&= [u;v]\in X,\tau \le N): \lambda _\tau \langle F(y_\tau ),y_\tau -x\rangle \\&\le {\lambda _\tau \over \gamma _\tau }[V_{u_\tau }(u)-V_{u_{\tau +1}}(u)]+M^2\lambda _\tau \gamma _\tau . \end{aligned}$$

Summing up these inequalities over \(\tau =1, \ldots ,N\), we get \(\forall (x=[u;v]\in X)\):

$$\begin{aligned}&\sum \limits _{\tau =1}^N\lambda _\tau \langle F(y_\tau ),y_\tau -x\rangle \\&\quad \le {\lambda _1\over \gamma _1}[V_{u_1}(u)-V_{u_2}(u)] +\,{\lambda _2\over \gamma _2}[V_{u_2}(u)-V_{u_3}(u)]+ \cdots \\&\quad \quad + {\lambda _N\over \gamma _N}[V_{u_N}(u)-V_{u_{N+1}}(u)] +M^2\sum \limits _{\tau =1}^N\lambda _\tau \gamma _\tau \\&\quad =\underbrace{{\lambda _1\over \gamma _1}}_{\ge 0}V_{u_1}(u) +\underbrace{\left[ {\lambda _2\over \gamma _2} -\,{\lambda _1\over \gamma _1}\right] }_{\ge 0}V_{u_2}(u)+\cdots + \underbrace{\left[ {\lambda _N\over \gamma _N} -{\lambda _{N-1}\over \gamma _{N-1}}\right] }_{\ge 0} V_{u_N}(u)\\&\quad \quad -{\lambda _N\over \gamma _N}\underbrace{V_{u_{N+1}}(u)}_{\ge 0} +M^2\sum \limits _{\tau =1}^N\lambda _\tau \gamma _\tau \\&\qquad \le {\lambda _1\over \gamma _1}\widehat{\varTheta } (\max [R_N,\Vert u-u_1\Vert ])+\left[ {\lambda _2\over \gamma _2} -{\lambda _1\over \gamma _1}\right] \widehat{\varTheta }(\max [R_N,\Vert u-u_1\Vert ])+\cdots \\&\qquad \quad +\left[ {\lambda _N\over \gamma _N}-{\lambda _{N-1}\over \gamma _{N-1}}\right] \widehat{\varTheta }(\max [R_N,\Vert u-u_1\Vert ])+M^2\sum \limits _{\tau =1}^N \lambda _\tau \gamma _\tau ,\\&\quad ={\lambda _N\over \gamma _N}\widehat{\varTheta }(\max [R_N,\Vert u-u_1\Vert ]) +M^2\sum \limits _{\tau =1}^N\lambda _\tau \gamma _\tau , \end{aligned}$$

where the concluding inequality is due to (38), and (40) follows.\(\square \)
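
The rearrangement in the middle of the above chain is summation by parts: with \(c_\tau =\lambda _\tau /\gamma _\tau \) (nondecreasing by assumption) and \(a_\tau =V_{u_\tau }(u)\ge 0\), it is the elementary identity

$$\begin{aligned} \sum _{\tau =1}^N c_\tau [a_\tau -a_{\tau +1}]=c_1a_1+\sum _{\tau =2}^N[c_\tau -c_{\tau -1}]a_\tau -c_Na_{N+1}, \end{aligned}$$

after which every coefficient of \(a_\tau \), \(\tau \le N\), is nonnegative and can be bounded via (38).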

Appendix 4: Proof of Proposition 5

\(1^{\circ }\). The functions \(h_{s,t}(\alpha )\) are concave and piecewise linear on \([0,1]\), and they clearly are pointwise nonincreasing in time. As a result, \({\hbox {Gap}}(s,t)\) is nonincreasing in time. Further, we have

$$\begin{aligned} {\hbox {Gap}}(s,t)&= \max \limits _{\alpha \in [0,1]}\left\{ \min \limits _{\lambda } \sum \limits _{(p,q)\in Q_{s,t}}\lambda _{pq} [\alpha (p-\underline{{\hbox {Opt}}}_{s,t})+(1-\alpha )q]:\; \lambda _{pq}\ge 0,\right. \\&\qquad \quad \left. \sum _{(p,q)\in Q_{s,t}}\lambda _{pq}=1\right\} \\&= \max _{\alpha \in [0,1]}\sum \limits _{(p,q)\in Q_{s,t}}\lambda ^*_{pq}[\alpha (p-\underline{{\hbox {Opt}}}_{s,t})+(1-\alpha )q]\\&= \max \left[ \sum _{(p,q)\in Q_{s,t}}\lambda ^*_{pq}(p -\underline{{\hbox {Opt}}}_{s,t}),\sum _{(p,q)\in Q_{s,t}}\lambda ^*_{pq}q\right] , \end{aligned}$$

where the weights \(\lambda ^*_{pq}\ge 0\) sum up to 1. Recalling that for every \((p,q)\in Q_{s,t}\) we have at our disposal \(y_{pq}\in Y\) such that \(p\ge f(y_{pq})\) and \(q\ge g(y_{pq})\), setting \(\widehat{y}^{s,t}=\sum \limits _{(p,q)\in Q_{s,t}}\lambda ^*_{pq}y_{pq}\) and invoking convexity of \(f,g\), we get

$$\begin{aligned} f(\widehat{y}^{s,t})&\le \sum \limits _{(p,q)\in Q_{s,t}}\lambda ^*_{pq}p\le \underline{{\hbox {Opt}}}_{s,t}+{\hbox {Gap}}(s,t), \,\,g(\widehat{y}^{s,t})\\&\le \sum \limits _{(p,q)\in Q_{s,t}}\lambda ^*_{pq}q\le {\hbox {Gap}}(s,t); \end{aligned}$$

and (69) follows, due to \(\underline{{\hbox {Opt}}}_{s,t}\le {\hbox {Opt}}\).
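
A minimal computational sketch of the quantities just used, assuming (as the display above suggests) that \(h_{s,t}(\alpha )=\min _{(p,q)\in Q_{s,t}}[\alpha (p-\underline{{\hbox {Opt}}}_{s,t})+(1-\alpha )q]\); the exact maximum of this piecewise linear concave function over \(\alpha \in [0,1]\) is approximated here by a crude grid search, which only serves to show the structure:

```python
import numpy as np

def gap(Q, opt_lower, n_grid=1001):
    """Approximate Gap(s,t) = max_{alpha in [0,1]} min_{(p,q) in Q}
    [alpha*(p - opt_lower) + (1 - alpha)*q] on a uniform grid of alpha's."""
    alphas = np.linspace(0.0, 1.0, n_grid)
    p = np.array([pq[0] for pq in Q]) - opt_lower   # p minus the lower bound on Opt
    q = np.array([pq[1] for pq in Q])
    # h[i] = value of the piecewise linear concave function h_{s,t} at alphas[i]
    h = np.min(np.outer(alphas, p) + np.outer(1.0 - alphas, q), axis=1)
    i = int(np.argmax(h))
    return h[i], alphas[i]
```

The aggregated point \(\widehat{y}^{s,t}\) is built from the optimal weights \(\lambda ^*_{pq}\) of the inner minimum, which this grid approximation does not return.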

\(2^{\circ }\). We have \(\overline{f}_s^t=\alpha _sf(y^{s,t}) +(1-\alpha _s)g(y^{s,t})\) for some \(y^{s,t}\in Y\) which we have at our disposal at step \(t\); this implies that \((\widehat{p}=f(y^{s,t}),\widehat{q}=g(y^{s,t}))\in Q_{s,t}\). Hence, by the definition of \(h_{s,t}(\cdot )\),

$$\begin{aligned} h_{s,t}(\alpha _s)\le \alpha _s (\widehat{p}-\underline{{\hbox {Opt}}}_{s,t})+(1-\alpha _s)\widehat{q}=\overline{f}_s^t-\alpha _s\underline{{\hbox {Opt}}}_{s,t}\le \overline{f}_s^t-\underline{f}_{s,t}, \end{aligned}$$

where the concluding inequality is given by (67). Thus, \(h_{s,t}(\alpha _s)\le \overline{f}_s^t-\underline{f}_{s,t} \le \epsilon _t\). On the other hand, if stage \(s\) does not terminate in the course of the first \(t\) steps, \(\alpha _s\) is well-centered in the segment \(\varDelta _{s,t}\) where the concave function \(h_{s,t}(\alpha )\) is nonnegative. We conclude that \(0\le {\hbox {Gap}}(s,t)=\max _{0\le \alpha \le 1}h_{s,t}(\alpha )=\max _{\alpha \in \varDelta _{s,t}}h_{s,t}(\alpha )\le 3h_{s,t}(\alpha _s)\). Thus, if stage \(s\) does not terminate in the course of the first \(t\) steps, we have \({\hbox {Gap}}(s,t)\le 3\epsilon _t\), which implies (71). Further, \(\alpha _s\) is the midpoint of the segment \(\varDelta ^{s-1}=\varDelta _{s-1,t_{s-1}}\), where \(t_r\) is the last step of stage \(r\) (when \(s=1\), we define \(\varDelta ^0=[0,1]\)), and \(\alpha _s\) is not well-centered in the segment \(\varDelta ^s=\varDelta _{s,t_s}\subset \varDelta _{s-1,t_{s-1}}\), which clearly implies that \(|\varDelta ^s|\le {3\over 4}|\varDelta ^{s-1}|\). Thus, \(|\varDelta ^s|\le \left({3\over 4}\right)^s\) for all \(s\). On the other hand, when \(|\varDelta _{s,t}|<1\), we have \({\hbox {Gap}}(s,t)=\max _{\alpha \in \varDelta _{s,t}}h_{s,t}(\alpha )\le 3L|\varDelta _{s,t}|\) (since \(h_{s,t}(\cdot )\) is Lipschitz continuous with constant \(3L\) (see Note 10) and \(h_{s,t}(\cdot )\) vanishes at (at least) one endpoint of \(\varDelta _{s,t}\)). Thus, the number of stages before \({\hbox {Gap}}(s,t)\le \epsilon \) is reached indeed obeys the bound (70).\(\square \)
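
To see how the bound (70) on the number of stages emerges from the two facts just established (we do not reproduce the exact constants of (70) here), assume, as the construction suggests, that every segment \(\varDelta _{s,t}\) arising at stage \(s\) is contained in \(\varDelta ^{s-1}\). Then

$$\begin{aligned} {\hbox {Gap}}(s,t)\le 3L|\varDelta _{s,t}|\le 3L|\varDelta ^{s-1}|\le 3L\left({3\over 4}\right)^{s-1}, \end{aligned}$$

so that \({\hbox {Gap}}(s,t)\le \epsilon \) is guaranteed once \(s\ge 1+\ln (3L/\epsilon )/\ln (4/3)\), i.e., after \(O(1)\ln (3L/\epsilon )\) stages.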

Cite this article

He, N., Juditsky, A. & Nemirovski, A. Mirror Prox algorithm for multi-term composite minimization and semi-separable problems. Comput Optim Appl 61, 275–319 (2015). https://doi.org/10.1007/s10589-014-9723-3
