Accelerated Randomized Mirror Descent Algorithms for Composite Non-strongly Convex Optimization

Journal of Optimization Theory and Applications

Abstract

We consider the problem of minimizing the sum of the average of a large number of smooth convex component functions and a general, possibly non-differentiable, convex function. Although many methods have been proposed to solve this problem under the assumption that the sum is strongly convex, few support the non-strongly convex case. Adding a small quadratic regularization is a common device used to tackle non-strongly convex problems; however, it may cause a loss of sparsity in the solutions or weaken the performance of the algorithms. Avoiding this device, we propose an accelerated randomized mirror descent method for solving this problem without the strong convexity assumption. Our method extends the deterministic accelerated proximal gradient methods of Paul Tseng and can be applied even when proximal points are computed inexactly. We also propose a scheme for solving the problem when the component functions are non-smooth.
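In the notation used later in the appendix, the problem described above is \(\min _x F^P(x):=\frac{1}{n}\sum _{i=1}^n f_i(x)+P(x)\), where each \(f_i\) is smooth and convex and \(P\) is convex but possibly non-differentiable. The short Python sketch below merely instantiates this composite finite-sum objective with a least-squares data term and an \(\ell _1\) regularizer so that the structure is concrete; the specific choices of \(f_i\), \(P\), and all variable names are illustrative assumptions, not the setting of the paper's experiments.

import numpy as np

# Illustrative instance of the composite finite-sum problem
#   min_x  F^P(x) = (1/n) * sum_i f_i(x) + P(x),
# with f_i(x) = 0.5 * (a_i @ x - b_i)^2 (smooth, convex) and P(x) = lam * ||x||_1
# (convex, non-differentiable).  All data and parameters below are assumptions.

rng = np.random.default_rng(0)
n, d, lam = 200, 50, 0.1
A = rng.standard_normal((n, d))     # rows a_i of the data matrix
b = rng.standard_normal(n)

def f_i(x, i):
    """Smooth convex component; its gradient is L_i-Lipschitz with L_i = ||a_i||^2."""
    return 0.5 * (A[i] @ x - b[i]) ** 2

def P(x):
    """Non-smooth convex regularizer."""
    return lam * np.sum(np.abs(x))

def F_P(x):
    """Composite objective F^P."""
    return np.mean([f_i(x, i) for i in range(n)]) + P(x)

print(F_P(np.zeros(d)))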

References

  1. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(\text{ O }(1/k^2)\). Sov. Math. Dokl. 27(2), 543–547 (1983)

  2. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonom. i. Mat. Metody 24, 509–517 (1998)

  3. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  4. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  5. Becker, S., Bobin, J., Candès, E.J.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)

  6. d’Aspremont, A., Banerjee, O., Ghaoui, L.E.: First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30(1), 56–66 (2008)

  7. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)

  8. Tseng, P.: On Accelerated Proximal Gradient Methods for Convex–Concave Optimization. Technical report (2008)

  9. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  10. Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663–2671 (2012)

  11. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)

  12. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24, 2057–2075 (2014)

  13. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621 (2017)

  14. Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

  15. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)

  16. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)

  17. Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  18. Fadili, J.M., Peyre, G.: Total variation projection with first order schemes. IEEE Trans. Image Process. 20(3), 657–669 (2011)

  19. Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128(1), 321–353 (2011)

  20. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  21. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)

  22. Schmidt, M., Roux, N.L., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)

  23. Solodov, M., Svaiter, B.: Error bounds for proximal point subproblems and associated inexact proximal point algorithms. Math. Program. 88(2), 371–389 (2000)

  24. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  25. Allen-Zhu, Z.K.: The first direct acceleration of stochastic gradient methods. In: ACM SIGACT Symposium on Theory of Computing (2017)

  26. Bregman, L.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  27. Teboulle, M.: Convergence of proximal-like algorithms. SIAM J. Optim. 7(4), 1069–1083 (1997)

  28. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht (2004)

  29. Auslender, A.: Numerical Methods for Nondifferentiable Convex Optimization, pp. 102–126. Springer, Berlin (1987)

  30. Lee, Y.J., Mangasarian, O.: SSVM: a smooth support vector machine for classification. Comput. Optim. Appl. 20(1), 5–22 (2001)

  31. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  32. Defazio, A., Bach, F., Lacoste-julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)

  33. Fan, R.E., Lin, C.J.: LIBSVM Data: Classification, Regression and Multi-Label. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets (2011). Accessed 01 April 2018

  34. Jacob, L., Obozinski, G., Vert, J.P.: Group Lasso with overlap and graph Lasso. In: International Conference on Machine Learning, pp. 433–440 (2009)

  35. Mosci, S., Villa, S., Verri, A., Rosasco, L.: A primal–dual algorithm for group sparse regularization with overlapping groups. In: Advances in Neural Information Processing Systems, pp. 2604–2612 (2010)

Acknowledgements

We are grateful to the anonymous reviewers and the Editor-in-Chief for their meticulous comments and insightful suggestions. Le Thi Khanh Hien would like to give special thanks to Prof. W. B. Haskell for his support. Le Thi Khanh Hien was supported by Grant A*STAR 1421200078.

Corresponding author

Correspondence to Le Thi Khanh Hien.

Additional information

Communicated by Gabriel Peyré.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs of Lemmas, Propositions, and Theorems

Proof of Lemma 3.1

We have

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left||{\nabla F(y_{k,s}) - v_k} \right||_*^2 \\&\quad =\mathbb {E}\left||{1/(nq_{i_k})\left( {\nabla f_{i_k}(y_{k,s}) - \nabla f_{i_k}(\tilde{x}_{s-1})} \right) -(\nabla F(y_{k,s})-\nabla F(\tilde{x}_{s-1}))} \right||_*^2 \\&\quad \le \mathbb {E}\left( {\left||{1/(nq_{i_k})\left( {\nabla f_{i_k}(y_{k,s}) - \nabla f_{i_k}(\tilde{x}_{s-1})} \right) } \right||_* + \left||{\nabla F(y_{k,s})-\nabla F(\tilde{x}_{s-1})} \right||_*} \right) ^2 \\&\quad \le 2\mathbb {E}\frac{1}{(nq_{i_k})^2} \left||{\nabla f_{i_k}(y_{k,s})-\nabla f_{i_k}(\tilde{x}_{s-1})} \right||_*^2 +2\left||{\nabla F(y_{k,s})-\nabla F(\tilde{x}_{s-1}) } \right||_*^2. \end{aligned} \end{aligned}$$

\(\square \)
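As read off from the first equality in the display above, Lemma 3.1 works with the variance-reduced estimator \(v_k=\frac{1}{nq_{i_k}}\left( {\nabla f_{i_k}(y_{k,s})-\nabla f_{i_k}(\tilde{x}_{s-1})} \right) +\nabla F(\tilde{x}_{s-1})\), where the index \(i_k\) is drawn with probability \(q_{i_k}\). The Python sketch below forms this estimator in a Euclidean least-squares setting and checks empirically that it is unbiased, \(\mathbb {E}_{i_k}[v_k]=\nabla F(y_{k,s})\), the property used in the proof of Proposition 3.1; the data, the choice \(q_i\) proportional to \(L_i\), and all names are illustrative assumptions.

import numpy as np

# Illustrative check of the variance-reduced estimator from Lemma 3.1:
#   v_k = (1/(n*q_i)) * (grad f_i(y) - grad f_i(x_tilde)) + grad F(x_tilde),
# with i drawn with probability q_i.  It is unbiased: E[v_k] = grad F(y).
# Euclidean least-squares data and q_i proportional to L_i are assumptions.

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_f(x, i):                       # gradient of f_i(x) = 0.5*(a_i @ x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def grad_F(x):                          # full gradient of F(x) = (1/n) * sum_i f_i(x)
    return A.T @ (A @ x - b) / n

L = np.sum(A ** 2, axis=1)              # component Lipschitz constants L_i = ||a_i||^2
q = L / L.sum()                         # non-uniform sampling probabilities (assumed choice)

def v_k(y, x_tilde, gF_tilde):
    i = rng.choice(n, p=q)
    return (grad_f(y, i) - grad_f(x_tilde, i)) / (n * q[i]) + gF_tilde

y, x_tilde = rng.standard_normal(d), rng.standard_normal(d)
gF_tilde = grad_F(x_tilde)
avg = np.mean([v_k(y, x_tilde, gF_tilde) for _ in range(20000)], axis=0)
print(np.linalg.norm(avg - grad_F(y)))  # close to 0 by unbiasedness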

Proof of Lemma 3.2

For notational succinctness, we omit the subscript s when no confusion is caused. Applying Lemma 2.4(1), we have:

$$\begin{aligned} \begin{aligned} F^P(x_k)&=\frac{1}{n}\sum \limits _{i=1}^n f_i(x_k) + P(x_k)\\&\le \frac{1}{n}\sum \limits _{i=1}^n\left( { f_i(y_k) + \left\langle {\nabla f_i(y_k)}, {x_k-y_k} \right\rangle +\frac{L_i}{2}\left||{x_k-y_k} \right||^2} \right) + P(x_k)\\&=F(y_k) + \left\langle {\nabla F(y_k)-v_k}, {x_k-y_k} \right\rangle \\&\quad +\,\frac{L_A}{2}\left||{x_k-y_k} \right||^2 + P(x_k)+\left\langle {v_k}, {x_k-y_k} \right\rangle \\&\le F(y_k)+\frac{2L_Q}{\alpha _3} \left||{x_k-y_k} \right||^2 +\,\frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\frac{L_A}{2}\left||{x_k-y_k} \right||^2 + P(x_k) +\left\langle {v_k}, {x_k-y_k} \right\rangle , \end{aligned} \end{aligned}$$

where the last inequality uses \(\left\langle {a}, {b} \right\rangle \le \frac{1}{2}\left||{a} \right||_*^2 + \frac{1}{2}\left||{b} \right||^2\). Together with the update rule (4), Lemma 2.1 with \(\sigma =1\), and noting that \(\hat{x}_k - y_k=\alpha _2(z_k-z_{k-1})\), we get:

$$\begin{aligned} F^P(x_k)&\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\left\langle {v_k}, {\hat{x}_k-y_k} \right\rangle +\frac{1}{2} \left( {L_A+\frac{4L_Q}{\alpha _3}} \right) \left||{\hat{x}_k-y_k} \right||^2+P(\hat{x}_k) \\&=F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2+ \alpha _2\left\langle {v_k}, {z_k-z_{k-1}} \right\rangle \\&\quad +\,\frac{1}{2} \overline{L}\alpha _2^2 \left||{z_k-z_{k-1}} \right||^2+P(\hat{x}_k) \\&\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\alpha _2\left( {\left\langle {v_k}, {z_k-z_{k-1}} \right\rangle +\theta _s D(z_k,z_{k-1})} \right) +P(\hat{x}_k). \end{aligned}$$

\(\square \)
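In the first display of the proof above, the passage from \(\left\langle {\nabla F(y_k)-v_k}, {x_k-y_k} \right\rangle \) to the pair of terms with coefficients \(\frac{2L_Q}{\alpha _3}\) and \(\frac{\alpha _3}{8L_Q}\) is exactly the stated bound \(\left\langle {a}, {b} \right\rangle \le \frac{1}{2}\left||{a} \right||_*^2 + \frac{1}{2}\left||{b} \right||^2\) applied after rescaling; spelled out,

$$\begin{aligned} \left\langle {\nabla F(y_k)-v_k}, {x_k-y_k} \right\rangle&=\left\langle {\sqrt{\tfrac{\alpha _3}{4L_Q}}\left( {\nabla F(y_k)-v_k} \right) }, {\sqrt{\tfrac{4L_Q}{\alpha _3}}\left( {x_k-y_k} \right) } \right\rangle \\&\le \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k)-v_k} \right||_*^2+\frac{2L_Q}{\alpha _3}\left||{x_k-y_k} \right||^2. \end{aligned}$$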

Proof of Lemma 3.3

We have \(\bar{z}_{k,s}=\arg \min \nolimits _{x\in X_s} \{\phi _k(x)+D(x,z_{k-1,s})\}\), where we let \(\phi _k(x)=\frac{1}{\theta _s }(\left\langle {v_k}, {x} \right\rangle + P(x))\). From Lemma 2.2, for all \(x\in X_s \cap \mathrm {dom}P \), we have:

$$\begin{aligned} \frac{1}{\theta _s}(\left\langle {v_k}, {x} \right\rangle + P(x)) + D(x,z_{k-1,s}) \ge \min \limits _{x\in X_s} \{\phi _k(x) + D(x,z_{k-1,s})\} + D(x,\bar{z}_{k,s}). \end{aligned}$$

Together with \(z_{k,s}\approx _{\varepsilon _{k,s}}\arg \min _{x\in X_s}\theta _s( \phi _k(x) + D(x,z_{k-1,s}))\), we get:

$$\begin{aligned}&\left\langle {v_k}, {x} \right\rangle + P(x)+ \theta _sD(x,z_{k-1,s})\ge \left\langle {v_k}, {z_{k,s}} \right\rangle +P(z_{k,s}) \nonumber \\&\quad +\,\theta _sD(z_{k,s},z_{k-1,s}) -\varepsilon _{k,s}+ \theta _s D(x,\bar{z}_{k,s}). \end{aligned}$$
(11)

From Lemma 2.3, we get

$$\begin{aligned} D(x,\bar{z}_{k,s})=D(x,z_{k,s})+ D(z_{k,s},\bar{z}_{k,s})-\left\langle {x-z_{k,s}}, {\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right\rangle . \end{aligned}$$

Thus, the result follows. \(\square \)
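Lemma 3.3 concerns the Bregman proximal (mirror) subproblem defining \(\bar{z}_{k,s}\) and its \(\varepsilon _{k,s}\)-inexact solution \(z_{k,s}\); one natural reading of the inexactness, consistent with how it is used to obtain (11), is that the subproblem value \(\left\langle {v_k}, {x} \right\rangle +P(x)+\theta _sD(x,z_{k-1,s})\) at \(z_{k,s}\) exceeds its minimum by at most \(\varepsilon _{k,s}\). Purely as an illustration, the sketch below specializes this step to the Euclidean Bregman distance \(D(x,z)=\frac{1}{2}\left||{x-z} \right||^2\), \(X_s\) the whole space, and \(P=\lambda \left||{\cdot } \right||_1\), in which case the exact minimizer is a soft-thresholding step; all concrete choices and names are assumptions.

import numpy as np

# Euclidean/l1 specialization (an illustrative assumption) of the mirror step
#   z_bar = argmin_x { <v, x> + P(x) + theta * D(x, z_prev) },
# with D(x, z) = 0.5 * ||x - z||^2 and P(x) = lam * ||x||_1.
# In this special case the exact minimizer is a soft-thresholding (proximal) step.

def soft_threshold(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def mirror_step(v, z_prev, theta, lam):
    # argmin_x  <v, x> + lam * ||x||_1 + (theta / 2) * ||x - z_prev||^2
    return soft_threshold(z_prev - v / theta, lam / theta)

def subproblem_value(x, v, z_prev, theta, lam):
    return v @ x + lam * np.sum(np.abs(x)) + 0.5 * theta * np.sum((x - z_prev) ** 2)

rng = np.random.default_rng(1)
d, theta, lam = 10, 5.0, 0.3
v, z_prev = rng.standard_normal(d), rng.standard_normal(d)

z_bar = mirror_step(v, z_prev, theta, lam)
# Any z whose subproblem value is within eps of the minimum plays the role of the
# eps-inexact point z_{k,s} (one reading of the inexactness used in Lemma 3.3).
z_inexact = z_bar + 1e-4 * rng.standard_normal(d)
gap = subproblem_value(z_inexact, v, z_prev, theta, lam) - subproblem_value(z_bar, v, z_prev, theta, lam)
print(gap)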

Proof of Proposition 3.1

For notational succinctness, we omit the subscript s when no confusion is caused. Applying Lemma 3.2, we have:

$$\begin{aligned}&F^P(x_k)\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \nonumber \\&\quad +\,\alpha _2\left( {\left\langle {v_k}, {z_k-z_{k-1}} \right\rangle + \theta _s D(z_k,z_{k-1})} \right) + P(\hat{x}_k). \end{aligned}$$
(12)

From Inequality (12) and Lemma 3.3, we deduce that:

$$\begin{aligned} \begin{aligned} F^P(x_k)&\le F(y_k) + \frac{\alpha _3}{8 L_Q}\left||{\nabla F(y_k) -v_k} \right||_*^2 \\&\quad +\,\alpha _2( \left\langle {v_k}, {x-z_{k-1}} \right\rangle +P(x)-P(z_k)) + P(\hat{x}_k) \\&\quad +\,\alpha _2 \theta _s (D(x,z_{k-1})-D(x,z_k)-D(z_k,\bar{z}_k) \\&\quad +\,\left\langle {x-z_{k}}, {\nabla h(\bar{z}_{k})-\nabla h(z_{k})} \right\rangle ) +\alpha _2\varepsilon _{k,s}. \end{aligned} \end{aligned}$$
(13)

Note that \(\mathbb {E}_{i_k} [v_k]=\nabla F(y_k)\) (we omit the subscript \(i_k\) of the conditional expectation when it is clear from the context) and \(P(\hat{x}_k) \le \alpha _1 P(x_{k-1})+\alpha _2 P(z_k) + \alpha _3 P(\tilde{x}_{s-1})\). Taking expectation with respect to \(i_k\) conditioned on \(i_{k-1}\), it follows from (13) that:

$$\begin{aligned} \mathbb {E}F^P(x_k)\le & {} F(y_k) + \alpha _3\left( {\frac{1}{8L_Q}\mathbb {E}\left||{\nabla F(y_k) - v_k} \right||_*^2+ \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle } \right) \nonumber \\&-\,\alpha _3 \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle + \alpha _2\left\langle {\nabla F(y_k)}, {x-z_{k-1}} \right\rangle +\alpha _2 P(x) \nonumber \\&+\,\alpha _1 P(x_{k-1})+\alpha _3 P(\tilde{x}_{s-1})+\alpha _2\theta _s ( D(x,z_{k-1}) - \mathbb {E}D(x,z_k) ) + r_k.\nonumber \\ \end{aligned}$$
(14)

On the other hand, applying Lemma 3.1, the second inequality of Lemma 2.4, and noting that \(\frac{1}{L_Q n q_i}\le \frac{1}{L_i}\) and \(\frac{1}{L_Q}\le \frac{1}{L_A}\), we have:

$$\begin{aligned} \begin{aligned}&\frac{1}{8L_Q}\mathbb {E}\left||{\nabla F(y_k) - v_k} \right||_*^2+ \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle \\&\quad \le \frac{1}{4L_Q}\mathbb {E}\frac{1}{(nq_{i_k})^2} \left||{\nabla f_{i_k}(y_k)-\nabla f_{i_k}(\tilde{x}_{s-1})} \right||_*^2 + \frac{1}{4L_Q} \left||{\nabla F(y_k)-\nabla F(\tilde{x}_{s-1})} \right||_*^2\\&\qquad +\,\left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle \\&\quad =\frac{1}{n}\sum \limits _{i=1}^n \frac{1}{4L_Q} \frac{1}{nq_i}\left||{\nabla f_i(y_k)-\nabla f_i(\tilde{x}_{s-1})} \right||_*^2 + \frac{1}{2n}\sum \limits _{i=1}^n \left\langle {\nabla f_i(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle \\&\qquad +\,\frac{1}{2 }\left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle +\frac{1}{4L_Q} \left||{\nabla F(y_k)-\nabla F(\tilde{x}_{s-1})} \right||_*^2 \\&\quad \le \frac{1}{2n}\sum \limits _{i=1}^n \left( {\frac{1}{2L_i} \left||{\nabla f_i(y_k)-\nabla f_i(\tilde{x}_{s-1})} \right||_*^2 +\left\langle {\nabla f_i(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle } \right) \\&\quad +\,\frac{1}{2}\left( {\frac{1}{2L_A}\left||{\nabla F(y_k)-\nabla F(\tilde{x}_{s-1})} \right||_*^2 + \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} - y_k} \right\rangle } \right) \\&\quad \le \frac{1}{2n}\sum \limits _{i=1}^n (f_i(\tilde{x}_{s-1})-f_i(y_k))+\frac{1}{2}\left( {F(\tilde{x}_{s-1}) - F(y_k)} \right) = F(\tilde{x}_{s-1}) - F(y_k). \end{aligned} \end{aligned}$$
(15)

Therefore, (14) and (15) imply that:

$$\begin{aligned} \mathbb {E}F^P(x_k)&\le (1-\alpha _3)F(y_k) + \alpha _3 F^P(\tilde{x}_{s-1}) + \alpha _2\left\langle {\nabla F(y_k)}, {x-y_k} \right\rangle +\alpha _2 P(x) \\&\qquad +\,\alpha _2\left\langle {\nabla F(y_k)}, {y_k-z_{k-1}} \right\rangle - \alpha _3 \left\langle {\nabla F(y_k)}, {\tilde{x}_{s-1} -y_k} \right\rangle +\alpha _1 P(x_{k-1}) \\&\qquad +\,\alpha _2 \theta _s (D(x,z_{k-1}) - \mathbb {E}D(x,z_k))+r_k\\&{\mathop {\le }\limits ^\mathrm{(a)}}(1-\alpha _3)F(y_k) + \alpha _3 F^P(\tilde{x}_{s-1})+\alpha _2(F(x)-F(y_k))+\alpha _2 P(x)\\&\qquad +\,\alpha _1\left\langle {\nabla F(y_k)}, {x_{k-1}-y_k} \right\rangle + \alpha _1 P(x_{k-1})+ \alpha _2 \theta _s (D(x,z_{k-1}) \\&\, \qquad -\,\mathbb {E}D(x,z_k))+r_k\\&{\mathop {\le }\limits ^\mathrm{(b)}}(1-\alpha _3-\alpha _2) F(y_k) + \alpha _3 F^P(\tilde{x}_{s-1})+ \alpha _2 F^P(x) \\&\qquad +\,\alpha _1(F(x_{k-1}) - F(y_k)) + \alpha _1 P(x_{k-1})+ \alpha _2 \theta _s (D(x,z_{k-1}) \\&\, \qquad -\,\mathbb {E}D(x,z_k))+r_k\\&= \alpha _1 F^P(x_{k-1}) + \alpha _2 F^P(x) + \alpha _3 F^P(\tilde{x}_{s-1})+ \alpha _2 \theta _s (D(x,z_{k-1}) \\&\, \qquad -\,\mathbb {E}D(x,z_k))+r_k. \end{aligned}$$

Here, in (a) we use

$$\begin{aligned}&\left\langle {\nabla F(y_k)}, {x-y_k} \right\rangle \le F(x)-F(y_k)\, \text { and} \,\alpha _2(y_k-z_{k-1})-\alpha _3(\tilde{x}_{s-1}-y_k) \\&\quad =\alpha _1(x_{k-1}-y_k), \end{aligned}$$

in (b) we use \(\left\langle {\nabla F(y_k)}, {x_{k-1}-y_k} \right\rangle \le F(x_{k-1})-F(y_k)\). Finally, we take expectation with respect to \(i_{k-1}\) to get the result. \(\square \)
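The identity invoked in step (a) is equivalent to \(y_k\) being a convex combination of \(x_{k-1}\), \(z_{k-1}\), and \(\tilde{x}_{s-1}\). Indeed, rearranging terms,

$$\begin{aligned} \alpha _2(y_k-z_{k-1})-\alpha _3(\tilde{x}_{s-1}-y_k)=\alpha _1(x_{k-1}-y_k)\;\Longleftrightarrow \;(\alpha _1+\alpha _2+\alpha _3)\,y_k=\alpha _1 x_{k-1}+\alpha _2 z_{k-1}+\alpha _3 \tilde{x}_{s-1}, \end{aligned}$$

so that, since \(\alpha _1+\alpha _2+\alpha _3=1\) (cf. the identity \(\alpha _{1,s}+\alpha _{3}=1-\alpha _{2,s}\) used in the proof of Proposition 3.2), the coupling behind step (a) can be read as \(y_k=\alpha _1 x_{k-1}+\alpha _2 z_{k-1}+\alpha _3 \tilde{x}_{s-1}\).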

Proof of Proposition 3.2

Applying Proposition 3.1 with \(x=x^*\), we have:

$$\begin{aligned} \begin{aligned} \mathbb {E}(F^P(x_{k,s}) -F^P(x^*))&\le \alpha _{1,s} \mathbb {E}(F^P(x_{k-1,s})-F^P(x^*))+\alpha _{3}(F^P(\tilde{x}_{s-1})-F^P(x^*)) \\&\qquad +\,\alpha _{2,s}^2\overline{L} (\mathbb {E}D(x^*,z_{k-1,s}) - \mathbb {E}D(x^*,z_{k,s}))+ r^*_{k,s}. \end{aligned} \end{aligned}$$

Denote \(d_{k,s}=\mathbb {E}(F^P(x_{k,s}) -F^P(x^*))\), then

$$\begin{aligned} d_{k,s} \le \alpha _{1,s} d_{k-1,s} + \alpha _{3} \tilde{d}_{s-1} + \alpha _{2,s}^2\overline{L} (\mathbb {E}D(x^*,z_{k-1,s}) - \mathbb {E}D(x^*,z_{k,s}))+ r^*_{k,s}, \end{aligned}$$

which implies \( \frac{1}{\alpha _{2,s}^2}d_{k,s} \le \frac{\alpha _{1,s}}{\alpha _{2,s}^2} d_{k-1,s} + \frac{\alpha _{3}}{\alpha _{2,s}^2}\tilde{d}_{s-1} + \overline{L} (\mathbb {E}D(x^*,z_{k-1,s}) - \mathbb {E}D(x^*,z_{k,s}))+\frac{ r^*_{k,s}}{\alpha _{2,s}^2}. \) Summing up this inequality from \(k=1\) to \(k=m\), we get:

$$\begin{aligned} \frac{1}{\alpha _{2,s}^2}d_{m,s}+\frac{1-\alpha _{1,s}}{\alpha _{2,s}^2}\sum \limits _{k=1}^{m-1} d_{k,s}&\le \frac{\alpha _{1,s}}{\alpha _{2,s}^2}d_{0,s}+ \frac{\alpha _{3}}{\alpha _{2,s}^2}m \tilde{d}_{s-1} \\&\, \qquad +\,\overline{L} \left( {\mathbb {E}D(x^*,z_{0,s}) - \mathbb {E}D(x^*,z_{m,s}) } \right) \\&\,\qquad +\,\frac{\sum _{k=1}^m r^*_{k,s}}{\alpha _{2,s}^2}. \end{aligned}$$

Using the update rule (5), \(\alpha _{1,s}+\alpha _{3}=1-\alpha _{2,s}\), \(z_{m,s-1}=z_{0,s}\), and \(d_{m,s-1}=d_{0,s}\), we get:

$$\begin{aligned} \begin{aligned} \frac{1}{\alpha _{2,s}^2}d_{m,s}+\frac{1-\alpha _{1,s}}{\alpha _{2,s}^2}\sum \limits _{k=1}^{m-1} d_{k,s}&\le \frac{1-\alpha _{2,s}}{\alpha _{2,s}^2}d_{m,s-1} + \frac{\alpha _{3}}{\alpha _{2,s}^2} \sum \limits _{k=1}^{m-1} d_{k,s-1} \\&\quad + \overline{L} \left( {\mathbb {E}D(x^*,z_{m,s-1}) - \mathbb {E}D(x^*,z_{m,s})} \right) +\frac{\sum _{k=1}^m r^*_{k,s}}{\alpha _{2,s}^2}. \end{aligned} \end{aligned}$$

Combining with the update rule (2), we obtain:

$$\begin{aligned} \begin{aligned}&\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}d_{m,s}+\frac{\alpha _{3}}{\alpha _{2,s+1}^2} \sum \limits _{k=1}^{m-1} d_{k,s} \le \frac{1-\alpha _{2,s}}{\alpha _{2,s}^2} d_{m,s-1} + \frac{\alpha _{3}}{\alpha _{2,s}^2} \sum \limits _{k=1}^{m-1} d_{k,s-1} \\&\qquad +\,\overline{L} \left( {\mathbb {E}D(x^*,z_{m,s-1}) - \mathbb {E}D(x^*,z_{m,s})} \right) +\frac{\sum _{k=1}^m r^*_{k,s}}{\alpha _{2,s}^2}. \end{aligned} \end{aligned}$$
(16)

Therefore,

$$\begin{aligned} \begin{aligned} \frac{\alpha _{3}}{\alpha _{2,s+1}^2} m\tilde{d}_s&{\mathop {\le }\limits ^\mathrm{(a)}}\frac{\alpha _{3}}{\alpha _{2,s+1}^2}\sum \limits _{k=1}^{m} d_{k,s} {\mathop {\le }\limits ^\mathrm{(b)}}\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}d_{m,s}+\frac{\alpha _{3}}{\alpha _{2,s+1}^2} \sum \limits _{k=1}^{m-1} d_{k,s}\\&{\mathop {\le }\limits ^\mathrm{(c)}}\frac{1-\alpha _{2,1}}{\alpha _{2,1}^2} d_{m,0}+\frac{\alpha _{3}}{\alpha _{2,1}^2}\sum \limits _{k=1}^{m-1}d_{k,0}+\overline{L} \left( { \mathbb {E}D(x^*,z_{m,0}) - \mathbb {E}D(x^*,z_{m,s})} \right) \\ {}&\qquad +\sum \limits _{i=1}^s\frac{\sum _{k=1}^m r^*_{k,i}}{\alpha _{2,i}^2}, \end{aligned} \end{aligned}$$

where in (a) we use the update rule (5), in (b) we use the property \(\alpha _3 \le 1-\alpha _{2,s+1}\), and in (c) we use the recursive inequality (16). The result then follows. \(\square \)

Proof of Theorem 3.1

Without loss of generality, we can assume that:

$$\begin{aligned} \frac{1}{m}\left( {\frac{4(1-\alpha _{2,1})}{\alpha _{2,1}^2\alpha _3} d_{m,0}+\frac{4}{\alpha _{2,1}^2}\sum \limits _{i=1}^{m-1}d_{i,0}} \right) =O(F^P(\tilde{x}_0)-F^P(x^*)). \end{aligned}$$

When \(\varepsilon _{k,s}=0\), we have \(z_{k,s}=\bar{z}_{k,s}\) and \(r_{k,s}=0\). The convergence rate of exact ASMD follows from Proposition 3.2 by taking \(\alpha _{2,s}=\frac{2}{s+2}\) and noting that \(D(x^*,z_{m,s})\ge 0\). \(\square \)
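As a brief check on the choice \(\alpha _{2,s}=\frac{2}{s+2}\): passing from the display before (16) to (16) itself replaces the coefficient \(\frac{1}{\alpha _{2,s}^2}\) of \(d_{m,s}\) by \(\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}\), which in particular requires \(\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}\le \frac{1}{\alpha _{2,s}^2}\) (we read this as part of update rule (2), which is not reproduced on this page; whatever its precise statement, the inequality below is the property needed here). The stated choice satisfies it:

$$\begin{aligned} \alpha _{2,s}=\frac{2}{s+2}\;\Longrightarrow \;\frac{1-\alpha _{2,s+1}}{\alpha _{2,s+1}^2}=\frac{(s+1)(s+3)}{4}\le \frac{(s+2)^2}{4}=\frac{1}{\alpha _{2,s}^2}. \end{aligned}$$

The final bound in the proof of Proposition 3.2 then gives \(\tilde{d}_s\le \frac{\alpha _{2,s+1}^2}{\alpha _3 m}\times \left( \text {a quantity independent of }s\right) =O\left( \frac{1}{(s+3)^2}\right) \) in the exact case, consistent with the \((s+3)^2\) denominator appearing in (19).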

Proof of Theorem 3.2

Recall that Inequality (11) holds for all \(x\). Taking \(x=z_{k,s}\), (11) yields \(D(z_{k,s},\bar{z}_{k,s})\le \frac{\varepsilon _{k,s}}{\theta _s}\). On the other hand, if \( h(\cdot )\) is \(L_h\)-Lipschitz smooth, then:

$$\begin{aligned} \left||{\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right||\le L_h \left||{\bar{z}_{k,s}-z_{k,s}} \right||\le L_h\sqrt{2D(z_{k,s},\bar{z}_{k,s})}\le L_h\sqrt{\frac{2\varepsilon _{k,s}}{\theta _s}}. \end{aligned}$$

If \(\left||{z_{k,s}} \right||\le C\), then we let \(C_1=\left||{x^*} \right||+C\). Noting that \(D(z_{k,s},\bar{z}_{k,s})\ge 0\), we have

$$\begin{aligned} \begin{aligned} {r}^*_{k,s}&\le \alpha _{2,s}\theta _s\left||{x^*-z_{k,s}} \right||\left||{\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right||+ \alpha _{2,s}\varepsilon _{k,s} \\&\le \alpha _{2,s} C_1 L_h\sqrt{2\epsilon _s\theta _s}+ \alpha _{2,s}\epsilon _s. \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} \alpha _{2,s+1}^2\sum \limits _{i=1}^s\sum \limits _{k=1}^m \frac{{r}^*_{k,i}}{m\alpha _3\alpha _{2,i}^2} \le \alpha _{2,s+1}^2 \sum \limits _{i=1}^s\left( { \frac{C_1L_h\sqrt{2\epsilon _i\bar{L}}}{\alpha _3\sqrt{\alpha _{2,i}}}+\frac{\epsilon _i}{\alpha _3\alpha _{2,i}}} \right) . \end{aligned}$$
(17)

If the adaptive inexact rule \(\max \left\{ \left||{\bar{z}_{k,s}} \right||^2\varepsilon _{k,s},C\varepsilon _{k,s}\right\} \le C\epsilon _s\) is chosen, we have

$$\begin{aligned} \begin{aligned} {r}^*_{k,s}&\le \alpha _{2,s}\theta _s\left( {\left||{x^*} \right||+\left||{\bar{z}_{k,s}} \right||+\left||{\bar{z}_{k,s}-z_{k,s}} \right||} \right) \left||{\nabla h(\bar{z}_{k,s})-\nabla h(z_{k,s})} \right||+ \alpha _{2,s}\varepsilon _{k,s}\\&\le \alpha _{2,s} \left||{x^*} \right|| L_h\sqrt{2\epsilon _s\theta _s}+ \alpha _{2,s} L_h \sqrt{2C\epsilon _s\theta _s} + \alpha _{2,s}L_h 2 \epsilon _s+ \alpha _{2,s}\epsilon _{s}. \end{aligned} \end{aligned}$$

In this case, we let \(C_1=\left||{x^*} \right||+\sqrt{C}\). We then have

$$\begin{aligned} \alpha _{2,s+1}^2\sum \limits _{i=1}^s\sum \limits _{k=1}^m \frac{{r}^*_{k,i}}{m\alpha _3\alpha _{2,i}^2} \le \alpha _{2,s+1}^2 \sum \limits _{i=1}^s\left( { \frac{C_1L_h\sqrt{2\epsilon _i\bar{L}}}{\alpha _3\sqrt{\alpha _{2,i}}}+\frac{(2L_h+1)\epsilon _i}{\alpha _3\alpha _{2,i}}} \right) . \end{aligned}$$
(18)

The result then follows easily from (17), (18), and Proposition 3.2. \(\square \)
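To see how the error sums in (17) and (18) preserve the \(O\left( \frac{1}{(s+3)^2}\right) \) rate, consider, purely as an illustrative tolerance schedule (the main text may prescribe a different one), \(\epsilon _s=O\left( (s+2)^{-4}\right) \). With \(\alpha _{2,i}=\frac{2}{i+2}\),

$$\begin{aligned} \sum \limits _{i\ge 1}\frac{\sqrt{\epsilon _i}}{\sqrt{\alpha _{2,i}}}=O\left( \sum \limits _{i\ge 1}(i+2)^{-3/2}\right)<\infty ,\qquad \sum \limits _{i\ge 1}\frac{\epsilon _i}{\alpha _{2,i}}=O\left( \sum \limits _{i\ge 1}(i+2)^{-3}\right) <\infty , \end{aligned}$$

so the right-hand sides of (17) and (18) are bounded by a constant multiple of \(\alpha _{2,s+1}^2=\frac{4}{(s+3)^2}\), i.e., the inexactness contributes at the same rate as the exact terms.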

Proof of Theorem 3.3

Let \(x^*_\mu \) be the optimal solution of Problem (9). We have:

$$\begin{aligned} \mathbb {E}F^P_\mu (\tilde{x}_{\mu ,s})-F^P_\mu (x^*_\mu )=O\left( {\frac{1+\frac{4\overline{L}_\mu }{m}+\bar{C}}{(s+3)^2}} \right) , \end{aligned}$$
(19)

where \(\bar{C}=O\left( {\sqrt{\bar{L}_\mu }} \right) \), by applying Theorem 3.2. By Assumption 3.1, we have:

$$\begin{aligned} \begin{aligned} \mathbb {E}F^P(\tilde{x}_{\mu ,s})-F^P(x^*)&= \mathbb {E}F(\tilde{x}_{\mu ,s}) + \mathbb {E}P (\tilde{x}_{\mu ,s}) - F(x^*) - P(x^*)\\&\le \mathbb {E}F_\mu (\tilde{x}_{\mu ,s}) + \overline{K} \mu + \mathbb {E}P (\tilde{x}_{\mu ,s}) - F(x^*)-P(x^*) \\&\le \mathbb {E}F_\mu (\tilde{x}_{\mu ,s})+ \overline{K} \mu + \mathbb {E}P (\tilde{x}_{\mu ,s}) -F_\mu (x^*) + \underline{K}\mu -P(x^*) \\&\le \mathbb {E}F^P_\mu (\tilde{x}_{\mu ,s})-F^P_\mu (x^*)+ \left( {\overline{K}+\underline{K}} \right) \mu . \end{aligned} \end{aligned}$$

Together with (19) and noting that \(F^P_\mu (x^*)\ge F^P_\mu (x^*_\mu )\), we get:

$$\begin{aligned} \mathbb {E}F^P(\tilde{x}_{\mu ,s})-F^P(x^*)&\le \mathbb {E}F^P_\mu (\tilde{x}_{\mu ,s})-F^P_\mu (x_\mu ^*)+ \left( {\overline{K}+\underline{K}} \right) \mu \\&=O\left( {\frac{1+\frac{4\overline{L}_\mu }{m}+\bar{C}}{(s+3)^2}} \right) +\left( {\overline{K}+\underline{K}} \right) \mu . \end{aligned}$$

\(\square \)

Cite this article

Hien, L.T.K., Nguyen, C.V., Xu, H. et al. Accelerated Randomized Mirror Descent Algorithms for Composite Non-strongly Convex Optimization. J Optim Theory Appl 181, 541–566 (2019). https://doi.org/10.1007/s10957-018-01469-5
