Accelerated Randomized Mirror Descent Algorithms for Composite Non-strongly Convex Optimization

Journal of Optimization Theory and Applications

Abstract

We consider the problem of minimizing the sum of the average of a large number of smooth convex component functions and a general, possibly non-differentiable, convex function. Although many methods have been proposed to solve this problem under the assumption that the sum is strongly convex, few support the non-strongly convex case. Adding a small quadratic regularization is a common device used to tackle non-strongly convex problems; however, it may cause a loss of sparsity in the solutions or weaken the performance of the algorithms. Avoiding this device, we propose an accelerated randomized mirror descent method for solving this problem without the strong convexity assumption. Our method extends the deterministic accelerated proximal gradient methods of Paul Tseng and can be applied even when proximal points are computed inexactly. We also propose a scheme for solving the problem when the component functions are non-smooth.
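In the notation used later in the appendix, the problem described above is \(\min _x F^P(x):=\frac{1}{n}\sum _{i=1}^n f_i(x)+P(x)\), where each \(f_i\) is smooth and convex and \(P\) is convex but possibly non-differentiable. The short Python sketch below merely instantiates this composite finite-sum objective with a least-squares data term and an \(\ell _1\) regularizer so that the structure is concrete; the specific choices of \(f_i\), \(P\), and all variable names are illustrative assumptions, not the setting of the paper's experiments.

import numpy as np

# Illustrative instance of the composite finite-sum problem
#   min_x  F^P(x) = (1/n) * sum_i f_i(x) + P(x),
# with f_i(x) = 0.5 * (a_i @ x - b_i)^2 (smooth, convex) and P(x) = lam * ||x||_1
# (convex, non-differentiable).  All data and parameters below are assumptions.

rng = np.random.default_rng(0)
n, d, lam = 200, 50, 0.1
A = rng.standard_normal((n, d))     # rows a_i of the data matrix
b = rng.standard_normal(n)

def f_i(x, i):
    """Smooth convex component; its gradient is L_i-Lipschitz with L_i = ||a_i||^2."""
    return 0.5 * (A[i] @ x - b[i]) ** 2

def P(x):
    """Non-smooth convex regularizer."""
    return lam * np.sum(np.abs(x))

def F_P(x):
    """Composite objective F^P."""
    return np.mean([f_i(x, i) for i in range(n)]) + P(x)

print(F_P(np.zeros(d)))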

References

  1. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(\text{ O }(1/k^2)\). Sov. Math. Dokl. 27(2), 543–547 (1983)

  2. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonom. i. Mat. Metody 24, 509–517 (1998)

  3. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  4. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  5. Becker, S., Bobin, J., Candès, E.J.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)

  6. d’Aspremont, A., Banerjee, O., Ghaoui, L.E.: First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30(1), 56–66 (2008)

  7. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)

  8. Tseng, P.: On Accelerated Proximal Gradient Methods for Convex–Concave Optimization. Technical report (2008)

  9. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  10. Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663–2671 (2012)

  11. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)

  12. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24, 2057–2075 (2014)

  13. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017)

  14. Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

  15. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)

  16. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)

  17. Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  18. Fadili, J.M., Peyre, G.: Total variation projection with first order schemes. IEEE Trans. Image Process. 20(3), 657–669 (2011)

  19. Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128(1), 321–353 (2011)

  20. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  21. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)

  22. Schmidt, M., Roux, N.L., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)

  23. Solodov, M., Svaiter, B.: Error bounds for proximal point subproblems and associated inexact proximal point algorithms. Math. Program. 88(2), 371–389 (2000)

  24. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  25. Allen-Zhu, Z.K.: The first direct acceleration of stochastic gradient methods. In: ACM SIGACT Symposium on Theory of Computing (2017)

  26. Bregman, L.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  27. Teboulle, M.: Convergence of proximal-like algorithms. SIAM J. Optim. 7(4), 1069–1083 (1997)

  28. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht (2004)

  29. Auslender, A.: Numerical Methods for Nondifferentiable Convex Optimization, pp. 102–126. Springer, Berlin (1987)

  30. Lee, Y.J., Mangasarian, O.: SSVM: a smooth support vector machine for classification. Comput. Optim. Appl. 20(1), 5–22 (2001)

  31. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  32. Defazio, A., Bach, F., Lacoste-julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)

  33. Fan, R.E., Lin, C.J.: LIBSVM Data: Classification, Regression and Multi-Label. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets (2011). Accessed 01 April 2018

  34. Jacob, L., Obozinski, G., Vert, J.P.: Group Lasso with overlap and graph Lasso. In: International Conference on Machine Learning, pp. 433–440 (2009)

  35. Mosci, S., Villa, S., Verri, A., Rosasco, L.: A primal–dual algorithm for group sparse regularization with overlapping groups. In: Advances in Neural Information Processing Systems, pp. 2604–2612 (2010)

Acknowledgements

We are grateful to the anonymous reviewers and the Editor-in-Chief for their meticulous comments and insightful suggestions. Le Thi Khanh Hien would like to give special thanks to Prof. W. B. Haskell for his support. Le Thi Khanh Hien was supported by Grant A*STAR 1421200078.

Corresponding author

Correspondence to Le Thi Khanh Hien.

Additional information

Communicated by Gabriel Peyré.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs of Lemmas, Propositions, and Theorems

Proof of Lemma 3.1

We have

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left||{\nabla F(y_{k,s}) - v_k} \right||_*^2 \\&\quad =\mathbb {E}\left||{1/(nq_{i_k})\left( {\nabla f_{i_k}(y_{k,s}) - \nabla f_{i_k}(\tilde{x}_{s-1})} \right) -(\nabla F(y_{k,s})-\nabla F(\tilde{x}_{s-1}))} \right||_*^2 \\&\quad \le \mathbb {E}\left( {\left||{1/(nq_{i_k})\left( {\nabla f_{i_k}(y_{k,s}) - \nabla f_{i_k}(\tilde{x}_{s-1})} \right) } \right||_* + \left||{\nabla F(y_{k,s})-\nabla F(\tilde{x}_{s-1})} \right||_*} \right) ^2 \\&\quad \le 2\mathbb {E}\frac{1}{(nq_{i_k})^2} \left||{\nabla f_{i_k}(y_{k,s})-\nabla f_{i_k}(\tilde{x}_{s-1})} \right||_*^2 +2\left||{\nabla F(y_{k,s})-\nabla F(\tilde{x}_{s-1}) } \right||_*^2. \end{aligned} \end{aligned}$$

\(\square \)
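As read off from the first equality in the display above, Lemma 3.1 works with the variance-reduced estimator \(v_k=\frac{1}{nq_{i_k}}\left( {\nabla f_{i_k}(y_{k,s})-\nabla f_{i_k}(\tilde{x}_{s-1})} \right) +\nabla F(\tilde{x}_{s-1})\), where the index \(i_k\) is drawn with probability \(q_{i_k}\). The Python sketch below forms this estimator in a Euclidean least-squares setting and checks empirically that it is unbiased, \(\mathbb {E}_{i_k}[v_k]=\nabla F(y_{k,s})\), the property used in the proof of Proposition 3.1; the data, the choice \(q_i\) proportional to \(L_i\), and all names are illustrative assumptions.

import numpy as np

# Illustrative check of the variance-reduced estimator from Lemma 3.1:
#   v_k = (1/(n*q_i)) * (grad f_i(y) - grad f_i(x_tilde)) + grad F(x_tilde),
# with i drawn with probability q_i.  It is unbiased: E[v_k] = grad F(y).
# Euclidean least-squares data and q_i proportional to L_i are assumptions.

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_f(x, i):                       # gradient of f_i(x) = 0.5*(a_i @ x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def grad_F(x):                          # full gradient of F(x) = (1/n) * sum_i f_i(x)
    return A.T @ (A @ x - b) / n

L = np.sum(A ** 2, axis=1)              # component Lipschitz constants L_i = ||a_i||^2
q = L / L.sum()                         # non-uniform sampling probabilities (assumed choice)

def v_k(y, x_tilde, gF_tilde):
    i = rng.choice(n, p=q)
    return (grad_f(y, i) - grad_f(x_tilde, i)) / (n * q[i]) + gF_tilde

y, x_tilde = rng.standard_normal(d), rng.standard_normal(d)
gF_tilde = grad_F(x_tilde)
avg = np.mean([v_k(y, x_tilde, gF_tilde) for _ in range(20000)], axis=0)
print(np.linalg.norm(avg - grad_F(y)))  # close to 0 by unbiasedness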

Proof of Lemma 3.2

For notational succinctness, we omit the subscript s when no confusion is caused. Applying Lemma 2.4(1), we have:

$$\begin{aligned} \begin{aligned} F^P(x_k)&=\frac{1}{n}\sum \limits _{i=1}^n f_i(x_k) + P(x_k)\\&\le \frac{1}{n}\sum \limits _{i=1}^n\left( { f_i(y_k) + \left\langle {\nabla f_i(y_k)}, {x_k-y_k} \right\rangle +\frac{L_i}{2}\left||{x_k-y_k} \right||^2} \right) + P(x_k)\\&=F(y_k) + \left\langle {\nabla F(y_k)-v_k}, {x_k-y_k} \right\rangle \\&\quad +\,\frac{L_A}{2}\left||{x_k-y_k} \right||^2 + P(x_k)+\left\langle {v_k}, {x_k-y_k} \right\rangle \\&\le F(y_k)+\frac{2L_Q}{\alpha _3} \left||{x_k-y_k} \right||^2 +\,\frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\frac{L_A}{2}\left||{x_k-y_k} \right||^2 + P(x_k) +\left\langle {v_k}, {x_k-y_k} \right\rangle , \end{aligned} \end{aligned}$$

where the last inequality uses \(\left\langle {a}, {b} \right\rangle \le \frac{1}{2}\left||{a} \right||_*^2 + \frac{1}{2}\left||{b} \right||^2\). Together with the update rule (4), Lemma 2.1 with \(\sigma =1\), and noting that \(\hat{x}_k - y_k=\alpha _2(z_k-z_{k-1})\), we get:

$$\begin{aligned} F^P(x_k)&\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\left\langle {v_k}, {\hat{x}_k-y_k} \right\rangle +\frac{1}{2} \left( {L_A+\frac{4L_Q}{\alpha _3}} \right) \left||{\hat{x}_k-y_k} \right||^2+P(\hat{x}_k) \\&=F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2+ \alpha _2\left\langle {v_k}, {z_k-z_{k-1}} \right\rangle \\&\quad +\,\frac{1}{2} \overline{L}\alpha _2^2 \left||{z_k-z_{k-1}} \right||^2+P(\hat{x}_k) \\&\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\alpha _2\left( {\left\langle {v_k}, {z_k-z_{k-1}} \right\rangle +\theta _s D(z_k,z_{k-1})} \right) +P(\hat{x}_k). \end{aligned}$$

\(\square \)
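In the first display of the proof above, the passage from \(\left\langle {\nabla F(y_k)-v_k}, {x_k-y_k} \right\rangle \) to the pair of terms with coefficients \(\frac{2L_Q}{\alpha _3}\) and \(\frac{\alpha _3}{8L_Q}\) is exactly the stated bound \(\left\langle {a}, {b} \right\rangle \le \frac{1}{2}\left||{a} \right||_*^2 + \frac{1}{2}\left||{b} \right||^2\) applied after rescaling; spelled out,

$$\begin{aligned} \left\langle {\nabla F(y_k)-v_k}, {x_k-y_k} \right\rangle&=\left\langle {\sqrt{\tfrac{\alpha _3}{4L_Q}}\left( {\nabla F(y_k)-v_k} \right) }, {\sqrt{\tfrac{4L_Q}{\alpha _3}}\left( {x_k-y_k} \right) } \right\rangle \\&\le \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k)-v_k} \right||_*^2+\frac{2L_Q}{\alpha _3}\left||{x_k-y_k} \right||^2. \end{aligned}$$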

Proof of Lemma 3.3

We have \(\bar{z}_{k,s}=\arg \min \nolimits _{x\in X_s} \{\phi _k(x)+D(x,z_{k-1,s})\}\), where we let \(\phi _k(x)=\frac{1}{\theta _s }(\left\langle {v_k}, {x} \right\rangle + P(x))\). From Lemma 2.2, for all \(x\in X_s \cap \mathrm {dom}P \), we have:

$$\begin{aligned} \frac{1}{\theta _s}(\left\langle {v_k}, {x} \right\rangle + P(x)) + D(x,z_{k-1,s}) \ge \min \limits _{x\in X_s} \{\phi _k(x) + D(x,z_{k-1,s})\} + D(x,\bar{z}_{k,s}). \end{aligned}$$

Together with \(z_{k,s}\approx _{\varepsilon _{k,s}}\arg \min _{x\in X_s}\theta _s( \phi _k(x) + D(x,z_{k-1,s}))\), we get:

$$\begin{aligned}&\left\langle {v_k}, {x} \right\rangle + P(x)+ \theta _sD(x,z_{k-1,s})\ge \left\langle {v_k}, {z_{k,s}} \right\rangle +P(z_{k,s}) \nonumber \\&\quad +\,\theta _sD(z_{k,s},z_{k-1,s}) -\varepsilon _{k,s}+ \theta _s D(x,\bar{z}_{k,s}). \end{aligned}$$
(11)

From Lemma 2.3, we get

$$\begin{aligned} D(x,\bar{z}_{k,s})=D(x,z_{k,s})+ D(z_{k,s},\bar{z}_{k,s})-\left\langle {x-z_{k,s}}, {\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right\rangle . \end{aligned}$$

Thus, the result follows. \(\square \)
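Lemma 3.3 concerns the Bregman proximal (mirror) subproblem defining \(\bar{z}_{k,s}\) and its \(\varepsilon _{k,s}\)-inexact solution \(z_{k,s}\); one natural reading of the inexactness, consistent with how it is used to obtain (11), is that the subproblem value \(\left\langle {v_k}, {x} \right\rangle +P(x)+\theta _sD(x,z_{k-1,s})\) at \(z_{k,s}\) exceeds its minimum by at most \(\varepsilon _{k,s}\). Purely as an illustration, the sketch below specializes this step to the Euclidean Bregman distance \(D(x,z)=\frac{1}{2}\left||{x-z} \right||^2\), \(X_s\) the whole space, and \(P=\lambda \left||{\cdot } \right||_1\), in which case the exact minimizer is a soft-thresholding step; all concrete choices and names are assumptions.

import numpy as np

# Euclidean/l1 specialization (an illustrative assumption) of the mirror step
#   z_bar = argmin_x { <v, x> + P(x) + theta * D(x, z_prev) },
# with D(x, z) = 0.5 * ||x - z||^2 and P(x) = lam * ||x||_1.
# In this special case the exact minimizer is a soft-thresholding (proximal) step.

def soft_threshold(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def mirror_step(v, z_prev, theta, lam):
    # argmin_x  <v, x> + lam * ||x||_1 + (theta / 2) * ||x - z_prev||^2
    return soft_threshold(z_prev - v / theta, lam / theta)

def subproblem_value(x, v, z_prev, theta, lam):
    return v @ x + lam * np.sum(np.abs(x)) + 0.5 * theta * np.sum((x - z_prev) ** 2)

rng = np.random.default_rng(1)
d, theta, lam = 10, 5.0, 0.3
v, z_prev = rng.standard_normal(d), rng.standard_normal(d)

z_bar = mirror_step(v, z_prev, theta, lam)
# Any z whose subproblem value is within eps of the minimum plays the role of the
# eps-inexact point z_{k,s} (one reading of the inexactness used in Lemma 3.3).
z_inexact = z_bar + 1e-4 * rng.standard_normal(d)
gap = subproblem_value(z_inexact, v, z_prev, theta, lam) - subproblem_value(z_bar, v, z_prev, theta, lam)
print(gap)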

Proof of Proposition 3.1

For notational succinctness, we omit the subscript s when no confusion is caused. Applying Lemma 3.2, we have:

$$\begin{aligned}&F^P(x_k)\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \nonumber \\&\quad +\,\alpha _2\left( {\left\langle {v_k}, {z_k-z_{k-1}} \right\rangle + \theta _s D(z_k,z_{k-1})} \right) + P(\hat{x}_k). \end{aligned}$$
(12)

From Inequality (12) and Lemma 3.3, we deduce that:

$$\begin{aligned} \begin{aligned} F^P(x_k)&\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\alpha _2( \left\langle {v_k}, {x-z_{k-1}} \right\rangle +P(x)-P(z_k)) + P(\hat{x}_k) \\&\quad +\,\alpha _2 \theta _s (D(x,z_{k-1})-D(x,z_k)-D(z_k,\bar{z}_k) \\&\quad +\,\left\langle {x-z_{k}}, {\nabla h(\bar{z}_{k})-\nabla h(z_{k})} \right\rangle ) +\alpha _2\varepsilon _{k,s}. \end{aligned} \end{aligned}$$
(13)

Note that \(\mathbb {E}_{i_k} [v_k]=\nabla F(y_k)\) (we omit the subscript \(i_k\) of the conditional expectation when it is clear from the context) and \(P(\hat{x}_k) \le \alpha _1 P(x_{k-1})+\alpha _2 P(z_k) + \alpha _3 P(\tilde{x}_{s-1})\). Taking expectation with respect to \(i_k\) conditioned on \(i_{k-1}\), it follows from (13) that:

$$\begin{aligned} \mathbb {E}F^P(x_k)\le & {} F(y_k) + \alpha _3\left( {\frac{1}{8L_Q}\mathbb {E}\left||{\nabla F(y_k) - v_k} \right||_*^2+ \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle } \right) \nonumber \\&-\,\alpha _3 \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle + \alpha _2\left\langle {\nabla F(y_k)}, {x-z_{k-1}} \right\rangle +\alpha _2 P(x) \nonumber \\&+\,\alpha _1 P(x_{k-1})+\alpha _3 P(\tilde{x}_{s-1})+\alpha _2\theta _s ( D(x,z_{k-1}) - \mathbb {E}D(x,z_k) ) + r_k.\nonumber \\ \end{aligned}$$
(14)

On the other hand, applying Lemma 3.1, the second inequality of Lemma 2.4, and noting that \(\frac{1}{L_Q n q_i}\le \frac{1}{L_i}\) and \(\frac{1}{L_Q}\le \frac{1}{L_A}\), we have:

$$\begin{aligned} \begin{aligned}&\frac{1}{8L_Q}\mathbb {E}\left||{\nabla F(y_k) - v_k} \right||_*^2+ \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle \\&\quad \le \frac{1}{4L_Q}\mathbb {E}\frac{1}{(nq_{i_k})^2} \left||{\nabla f_{i_k}(y_k)-\nabla f_{i_k}(\tilde{x}_{s-1})} \right||_*^2 + \frac{1}{4L_Q} \left||{\nabla F(y_k)-\nabla F(\tilde{x}_{s-1})} \right||_*^2\\&\qquad +\,\left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle \\&\quad =\frac{1}{n}\sum \limits _{i=1}^n \frac{1}{4L_Q} \frac{1}{nq_i}\left||{\nabla f_i(y_k)-\nabla f_i(\tilde{x}_{s-1})} \right||_*^2 + \frac{1}{2n}\sum \limits _{i=1}^n \left\langle {\nabla f_i(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle \\&\qquad +\,\frac{1}{2 }\left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle +\frac{1}{4L_Q} \left||{\nabla F(y_k)-\nabla F(\tilde{x}_{s-1})} \right||_*^2 \\&\quad \le \frac{1}{2n}\sum \limits _{i=1}^n \left( {\frac{1}{2L_i} \left||{\nabla f_i(y_k)-\nabla f_i(\tilde{x}_{s-1})} \right||_*^2 +\left\langle {\nabla f_i(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle } \right) \\&\quad +\,\frac{1}{2}\left( {\frac{1}{2L_A}\left||{\nabla F(y_k)-\nabla F(\tilde{x}_{s-1})} \right||_*^2 + \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle } \right) \\&\quad \le \frac{1}{2n}\sum \limits _{i=1}^n (f_i(\tilde{x}_{s-1})-f_i(y_k))+\frac{1}{2}\left( {F(\tilde{x}_{s-1}) - F(y_k)} \right) = F(\tilde{x}_{s-1}) - F(y_k). \end{aligned} \end{aligned}$$
(15)

Therefore, (14) and (15) imply that:

$$\begin{aligned} \mathbb {E}F^P(x_k)&\le (1-\alpha _3)F(y_k) + \alpha _3 F^P(\tilde{x}_{s-1}) + \alpha _2\left\langle {\nabla F(y_k)}, {x-y_k} \right\rangle +\alpha _2 P(x) \\&\qquad +\,\alpha _2\left\langle {\nabla F(y_k)}, {y_k-z_{k-1}} \right\rangle - \alpha _3 \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} -y_k} \right\rangle +\alpha _1 P(x_{k-1}) \\&\qquad +\,\alpha _2 \theta _s (D(x,z_{k-1}) - \mathbb {E}D(x,z_k))+r_k\\&{\mathop {\le }\limits ^\mathrm{(a)}}(1-\alpha _3)F(y_k) + \alpha _3 F^P(\tilde{x}_{s-1})+\alpha _2(F(x)-F(y_k))+\alpha _2 P(x)\\&\qquad +\,\alpha _1\left\langle {\nabla F(y_k)}, {x_{k-1}-y_k} \right\rangle + \alpha _1 P(x_{k-1})+ \alpha _2 \theta _s (D(x,z_{k-1}) \\&\, \qquad -\,\mathbb {E}D(x,z_k))+r_k\\&{\mathop {\le }\limits ^\mathrm{(b)}}(1-\alpha _3-\alpha _2) F(y_k) + \alpha _3 F^P(\tilde{x}_{s-1})+ \alpha _2 F^P(x) \\&\qquad +\,\alpha _1(F(x_{k-1}) - F(y_k)) + \alpha _1 P(x_{k-1})+ \alpha _2 \theta _s (D(x,z_{k-1}) \\&\, \qquad -\,\mathbb {E}D(x,z_k))+r_k\\&= \alpha _1 F^P(x_{k-1}) + \alpha _2 F^P(x) + \alpha _3 F^P(\tilde{x}_{s-1})+ \alpha _2 \theta _s (D(x,z_{k-1}) \\&\, \qquad -\,\mathbb {E}D(x,z_k))+r_k. \end{aligned}$$

Here, in (a) we use

$$\begin{aligned}&\left\langle {\nabla F(y_k)}, {x-y_k} \right\rangle \le F(x)-F(y_k)\, \text { and} \,\alpha _2(y_k-z_{k-1})-\alpha _3(\tilde{x}_{s-1}-y_k) \\&\quad =\alpha _1(x_{k-1}-y_k), \end{aligned}$$

in (b) we use \(\left\langle {\nabla F(y_k)}, {x_{k-1}-y_k} \right\rangle \le F(x_{k-1})-F(y_k)\). Finally, we take expectation with respect to \(i_{k-1}\) to get the result. \(\square \)
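The identity invoked in step (a) is equivalent to \(y_k\) being a convex combination of \(x_{k-1}\), \(z_{k-1}\), and \(\tilde{x}_{s-1}\). Indeed, rearranging terms,

$$\begin{aligned} \alpha _2(y_k-z_{k-1})-\alpha _3(\tilde{x}_{s-1}-y_k)=\alpha _1(x_{k-1}-y_k)\;\Longleftrightarrow \;(\alpha _1+\alpha _2+\alpha _3)\,y_k=\alpha _1 x_{k-1}+\alpha _2 z_{k-1}+\alpha _3 \tilde{x}_{s-1}, \end{aligned}$$

so that, since \(\alpha _1+\alpha _2+\alpha _3=1\) (cf. the identity \(\alpha _{1,s}+\alpha _{3}=1-\alpha _{2,s}\) used in the proof of Proposition 3.2), the coupling behind step (a) can be read as \(y_k=\alpha _1 x_{k-1}+\alpha _2 z_{k-1}+\alpha _3 \tilde{x}_{s-1}\).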

Proof of Proposition 3.2

Applying Proposition 3.1 with \(x=x^*\), we have:

$$\begin{aligned} \begin{aligned} \mathbb {E}(F^P(x_{k,s}) -F^P(x^*))&\le \alpha _{1,s} \mathbb {E}(F^P(x_{k-1,s})-F^P(x^*))+\alpha _{3}(F^P(\tilde{x}_{s-1})-F^P(x^*)) \\&\qquad +\,\alpha _{2,s}^2\overline{L} (\mathbb {E}D(x^*,z_{k-1,s}) - \mathbb {E}D(x^*,z_{k,s}))+ r^*_{k,s}. \end{aligned} \end{aligned}$$

Denote \(d_{k,s}=\mathbb {E}(F^P(x_{k,s}) -F^P(x^*))\), then

$$\begin{aligned} d_{k,s} \le \alpha _{1,s} d_{k-1,s} + \alpha _{3} \tilde{d}_{s-1} + \alpha _{2,s}^2\overline{L} (\mathbb {E}D(x^*,z_{k-1,s}) - \mathbb {E}D(x^*,z_{k,s}))+ r^*_{k,s}, \end{aligned}$$

which implies \( \frac{1}{\alpha _{2,s}^2}d_{k,s} \le \frac{\alpha _{1,s}}{\alpha _{2,s}^2} d_{k-1,s} + \frac{\alpha _{3}}{\alpha _{2,s}^2}\tilde{d}_{s-1} + \overline{L} (\mathbb {E}D(x^*,z_{k-1,s}) - \mathbb {E}D(x^*,z_{k,s}))+\frac{ r^*_{k,s}}{\alpha _{2,s}^2}. \) Summing up this inequality from \(k=1\) to \(k=m\), we get:

$$\begin{aligned} \frac{1}{\alpha _{2,s}^2}d_{m,s}+\frac{1-\alpha _{1,s}}{\alpha _{2,s}^2}\sum \limits _{k=1}^{m-1} d_{k,s}&\le \frac{\alpha _{1,s}}{\alpha _{2,s}^2}d_{0,s}+ \frac{\alpha _{3}}{\alpha _{2,s}^2}m \tilde{d}_{s-1} \\&\, \qquad +\,\overline{L} \left( {\mathbb {E}D(x^*,z_{0,s}) - \mathbb {E}D(x^*,z_{m,s}) } \right) \\&\,\qquad +\,\frac{\sum _{k=1}^m r^*_{k,s}}{\alpha _{2,s}^2}. \end{aligned}$$

Using the update rule (5), \(\alpha _{1,s}+\alpha _{3}=1-\alpha _{2,s}\), \(z_{m,s-1}=z_{0,s}\), and \(d_{m,s-1}=d_{0,s}\), we get:

$$\begin{aligned} \begin{aligned} \frac{1}{\alpha _{2,s}^2}d_{m,s}+\frac{1-\alpha _{1,s}}{\alpha _{2,s}^2}\sum \limits _{k=1}^{m-1} d_{k,s}&\le \frac{1-\alpha _{2,s}}{\alpha _{2,s}^2}d_{m,s-1} + \frac{\alpha _{3}}{\alpha _{2,s}^2} \sum \limits _{k=1}^{m-1} d_{k,s-1} \\&\quad + \overline{L} \left( {\mathbb {E}D(x^*,z_{m,s-1}) - \mathbb {E}D(x^*,z_{m,s})} \right) +\frac{\sum _{k=1}^m r^*_{k,s}}{\alpha _{2,s}^2}. \end{aligned} \end{aligned}$$

Combining with the update rule (2), we obtain:

$$\begin{aligned} \begin{aligned}&\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}d_{m,s}+\frac{\alpha _{3}}{\alpha _{2,s+1}^2} \sum \limits _{k=1}^{m-1} d_{k,s} \le \frac{1-\alpha _{2,s}}{\alpha _{2,s}^2} d_{m,s-1} + \frac{\alpha _{3}}{\alpha _{2,s}^2} \sum \limits _{k=1}^{m-1} d_{k,s-1} \\&\qquad +\,\overline{L} \left( {\mathbb {E}D(x^*,z_{m,s-1}) - \mathbb {E}D(x^*,z_{m,s})} \right) +\frac{\sum _{k=1}^m r^*_{k,s}}{\alpha _{2,s}^2}. \end{aligned} \end{aligned}$$
(16)

Therefore,

$$\begin{aligned} \begin{aligned} \frac{\alpha _{3}}{\alpha _{2,s+1}^2} m\tilde{d}_s&{\mathop {\le }\limits ^\mathrm{(a)}}\frac{\alpha _{3}}{\alpha _{2,s+1}^2}\sum \limits _{k=1}^{m} d_{k,s} {\mathop {\le }\limits ^\mathrm{(b)}}\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}d_{m,s}+\frac{\alpha _{3}}{\alpha _{2,s+1}^2} \sum \limits _{k=1}^{m-1} d_{k,s}\\&{\mathop {\le }\limits ^\mathrm{(c)}}\frac{1-\alpha _{2,1}}{\alpha _{2,1}^2} d_{m,0}+\frac{\alpha _{3}}{\alpha _{2,1}^2}\sum \limits _{k=1}^{m-1}d_{k,0}+\overline{L} \left( { \mathbb {E}D(x^*,z_{m,0}) - \mathbb {E}D(x^*,z_{m,s})} \right) \\ {}&\qquad +\sum \limits _{i=1}^s\frac{\sum _{k=1}^m r^*_{k,i}}{\alpha _{2,i}^2}, \end{aligned} \end{aligned}$$

where in (a) we use the update rule (5), in (b) we use the property \(\alpha _3 \le 1-\alpha _{2,s+1}\), and in (c) we use the recursive inequality (16). The result then follows. \(\square \)

Proof of Theorem 3.1

Without loss of generality, we can assume that:

$$\begin{aligned} \frac{1}{m}\left( {\frac{4(1-\alpha _{2,1})}{\alpha _{2,1}^2\alpha _3} d_{m,0}+\frac{4}{\alpha _{2,1}^2}\sum \limits _{i=1}^{m-1}d_{i,0}} \right) =O(F^P(\tilde{x}_0)-F^P(x^*)). \end{aligned}$$

When \(\varepsilon _{k,s}=0\), we have \(z_{k,s}=\bar{z}_{k,s}\) and \(r_{k,s}=0\). The convergence rate of exact ASMD follows from Proposition 3.2 by taking \(\alpha _{2,s}=\frac{2}{s+2}\) and noting that \(D(x^*,z_{m,s})\ge 0\). \(\square \)
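As a brief check on the choice \(\alpha _{2,s}=\frac{2}{s+2}\): passing from the display before (16) to (16) itself replaces the coefficient \(\frac{1}{\alpha _{2,s}^2}\) of \(d_{m,s}\) by \(\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}\), which in particular requires \(\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}\le \frac{1}{\alpha _{2,s}^2}\) (we read this as part of update rule (2), which is not reproduced on this page; whatever its precise statement, the inequality below is the property needed here). The stated choice satisfies it:

$$\begin{aligned} \alpha _{2,s}=\frac{2}{s+2}\;\Longrightarrow \;\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}=\frac{(s+1)(s+3)}{4}\le \frac{(s+2)^2}{4}=\frac{1}{\alpha _{2,s}^2}. \end{aligned}$$

The final bound in the proof of Proposition 3.2 then gives \(\tilde{d}_s\le \frac{\alpha _{2,s+1}^2}{\alpha _3 m}\times \left( \text {a quantity independent of }s\right) =O\left( \frac{1}{(s+3)^2}\right) \) in the exact case, consistent with the \((s+3)^2\) denominator appearing in (19).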

Proof of Theorem 3.2

Recall that Inequality (11) holds for all \(x\). Taking \(x=z_{k,s}\), (11) yields \(D(z_{k,s},\bar{z}_{k,s})\le \frac{\varepsilon _{k,s}}{\theta _s}\). On the other hand, if \( h(\cdot )\) is \(L_h\)-Lipschitz smooth, then:

$$\begin{aligned} \left||{\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right||\le L_h \left||{\bar{z}_{k,s}-z_{k,s}} \right||\le L_h\sqrt{2D(z_{k,s},\bar{z}_{k,s})}\le L_h\sqrt{\frac{2\varepsilon _{k,s}}{\theta _s}}. \end{aligned}$$

If \(\left||{z_{k,s}} \right||\le C\), then we let \(C_1=\left||{x^*} \right||+C\). Noting that \(D(z_{k,s},\bar{z}_{k,s})\ge 0\), we have

$$\begin{aligned} \begin{aligned} {r}^*_{k,s}&\le \alpha _{2,s}\theta _s\left||{x^*-z_{k,s}} \right||\left||{\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right||+ \alpha _{2,s}\varepsilon _{k,s} \\&\le \alpha _{2,s} C_1 L_h\sqrt{2\epsilon _s\theta _s}+ \alpha _{2,s}\epsilon _s. \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} \alpha _{2,s+1}^2\sum \limits _{i=1}^s\sum \limits _{k=1}^m \frac{{r}^*_{k,i}}{m\alpha _3\alpha _{2,i}^2} \le \alpha _{2,s+1}^2 \sum \limits _{i=1}^s\left( { \frac{C_1L_h\sqrt{2\epsilon _i\bar{L}}}{\alpha _3\sqrt{\alpha _{2,i}}}+\frac{\epsilon _i}{\alpha _3\alpha _{2,i}}} \right) . \end{aligned}$$
(17)

If the adaptive inexact rule \(\max \left\{ \left||{\bar{z}_{k,s}} \right||^2\varepsilon _{k,s},C\varepsilon _{k,s}\right\} \le C\epsilon _s\) is chosen, we have

$$\begin{aligned} \begin{aligned} {r}^*_{k,s}&\le \alpha _{2,s}\theta _s\left( {\left||{x^*} \right||+\left||{\bar{z}_{k,s}} \right||+\left||{\bar{z}_{k,s}-z_{k,s}} \right||} \right) \left||{\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right||+ \alpha _{2,s}\varepsilon _{k,s}\\&\le \alpha _{2,s} \left||{x^*} \right|| L_h\sqrt{2\epsilon _s\theta _s}+ \alpha _{2,s} L_h \sqrt{2C\epsilon _s\theta _s} + \alpha _{2,s}L_h 2 \epsilon _s+ \alpha _{2,s}\epsilon _{s}. \end{aligned} \end{aligned}$$

In this case, we let \(C_1=\left||{x^*} \right||+\sqrt{C}\). We then have

$$\begin{aligned} \alpha _{2,s+1}^2\sum \limits _{i=1}^s\sum \limits _{k=1}^m \frac{{r}^*_{k,i}}{m\alpha _3\alpha _{2,i}^2} \le \alpha _{2,s+1}^2 \sum \limits _{i=1}^s\left( { \frac{C_1L_h\sqrt{2\epsilon _i\bar{L}}}{\alpha _3\sqrt{\alpha _{2,i}}}+\frac{(2L_h+1)\epsilon _i}{\alpha _3\alpha _{2,i}}} \right) . \end{aligned}$$
(18)

The result then follows easily from (17), (18), and Proposition 3.2. \(\square \)
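To see how the error sums in (17) and (18) preserve the \(O\left( \frac{1}{(s+3)^2}\right) \) rate, consider, purely as an illustrative tolerance schedule (the main text may prescribe a different one), \(\epsilon _s=O\left( (s+2)^{-4}\right) \). With \(\alpha _{2,i}=\frac{2}{i+2}\),

$$\begin{aligned} \sum \limits _{i\ge 1}\frac{\sqrt{\epsilon _i}}{\sqrt{\alpha _{2,i}}}=O\left( \sum \limits _{i\ge 1}(i+2)^{-3/2}\right)<\infty ,\qquad \sum \limits _{i\ge 1}\frac{\epsilon _i}{\alpha _{2,i}}=O\left( \sum \limits _{i\ge 1}(i+2)^{-3}\right) <\infty , \end{aligned}$$

so the right-hand sides of (17) and (18) are bounded by a constant multiple of \(\alpha _{2,s+1}^2=\frac{4}{(s+3)^2}\), i.e., the inexactness contributes at the same rate as the exact terms.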

Proof of Theorem 3.3

Let \(x^*_\mu \) be the optimal solution of Problem (9). We have:

$$\begin{aligned} \mathbb {E}F^P_\mu (\tilde{x}_{\mu ,s})-F^P_\mu (x^*_\mu )=O\left( {\frac{1+\frac{4\overline{L}_\mu }{m}+\bar{C}}{(s+3)^2}} \right) , \end{aligned}$$
(19)

where \(\bar{C}=O\left( {\sqrt{\bar{L}_\mu }} \right) \), by applying Theorem 3.2. By Assumption 3.1, we have:

$$\begin{aligned} \begin{aligned} \mathbb {E}F^P(\tilde{x}_{\mu ,s})-F^P(x^*)&= \mathbb {E}F(\tilde{x}_{\mu ,s}) + \mathbb {E}P (\tilde{x}_{\mu ,s}) - F(x^*) - P(x^*)\\&\le \mathbb {E}F_\mu (\tilde{x}_{\mu ,s}) + \overline{K} \mu + \mathbb {E}P (\tilde{x}_{\mu ,s}) - F(x^*)-P(x^*) \\&\le \mathbb {E}F_\mu (\tilde{x}_{\mu ,s})+ \overline{K} \mu + \mathbb {E}P (\tilde{x}_{\mu ,s}) -F_\mu (x^*) + \underline{K}\mu -P(x^*) \\&\le \mathbb {E}F^P_\mu (\tilde{x}_{\mu ,s})-F^P_\mu (x^*)+ \left( {\overline{K}+\underline{K}} \right) \mu . \end{aligned} \end{aligned}$$

Together with (19) and noting that \(F^P_\mu (x^*)\ge F^P_\mu (x^*_\mu )\), we get:

$$\begin{aligned} \mathbb {E}F^P(\tilde{x}_{\mu ,s})-F^P(x^*)&\le \mathbb {E}F^P_\mu (\tilde{x}_{\mu ,s})-F^P_\mu (x_\mu ^*)+ \left( {\overline{K}+\underline{K}} \right) \mu \\&=O\left( {\frac{1+\frac{4\overline{L}_\mu }{m}+\bar{C}}{(s+3)^2}} \right) +\left( {\overline{K}+\underline{K}} \right) \mu . \end{aligned}$$

\(\square \)

Cite this article

Hien, L.T.K., Nguyen, C.V., Xu, H. et al. Accelerated Randomized Mirror Descent Algorithms for Composite Non-strongly Convex Optimization. J Optim Theory Appl 181, 541–566 (2019). https://doi.org/10.1007/s10957-018-01469-5
