A Framework of Convergence Analysis of Mini-batch Stochastic Projected Gradient Methods

  • Jian Gu
  • Xian-Tao XiaoEmail author


In this paper, we establish a unified framework to study the almost sure global convergence and the expected convergence rates of a class of mini-batch stochastic (projected) gradient (SG) methods, including two popular types of SG: stepsize diminished SG and batch size increased SG. We also show that the standard variance uniformly bounded assumption, which is frequently used in the literature to investigate the convergence of SG, is actually not required when the gradient of the objective function is Lipschitz continuous. Finally, we show that our framework can also be used for analyzing the convergence of a mini-batch stochastic extragradient method for stochastic variational inequality.


Stochastic projected gradient method Variance uniformly bounded Convergence analysis 

Mathematics Subject Classification

62L20 90C25 90C15 



The authors would like to thank the referees for their valuable comments which are helpful to improve this manuscript.


  1. 1.
    Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, New York (2014)CrossRefGoogle Scholar
  2. 2.
    Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)MathSciNetCrossRefGoogle Scholar
  3. 3.
    Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)MathSciNetCrossRefGoogle Scholar
  4. 4.
    Kushner, H.J., Yin, G.G.: Stochastic approximation and recursive algorithms and applications. Springer, New York (2003) Google Scholar
  5. 5.
    Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)MathSciNetCrossRefGoogle Scholar
  6. 6.
    Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), A1380–A1405 (2012)MathSciNetCrossRefGoogle Scholar
  7. 7.
    Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1, Ser. B), 127–155 (2012)MathSciNetCrossRefGoogle Scholar
  8. 8.
    Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)CrossRefGoogle Scholar
  9. 9.
    Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)MathSciNetCrossRefGoogle Scholar
  10. 10.
    Nedić, A., Lee, S.: On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM J. Optim. 24(1), 84–107 (2014)MathSciNetCrossRefGoogle Scholar
  11. 11.
    Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155(1–2, Ser. A), 267–305 (2016)MathSciNetCrossRefGoogle Scholar
  12. 12.
    Iusem, A.N., Jofré, A., Oliveira, R.I., Thompson, P.: Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 27(2), 686–724 (2017)MathSciNetCrossRefGoogle Scholar
  13. 13.
    Jofré, A., Thompson, P.: On variance reduction for stochastic smooth convex optimization with multiplicative noise (2017). arXiv:1705.02969
  14. 14.
    Nguyen, L., Nguyen, P.H., van Dijk, M., Richtarik, P., Scheinberg, K., Takac, M.: SGD and hogwild! Convergence without the bounded gradients assumption. In: Proceedings of the 35th international conference on machine learning, pp. 3750–3758 (2018)Google Scholar
  15. 15.
    Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 315–323 (2013) Google Scholar
  16. 16.
    Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2, Ser. A), 83–112 (2017)MathSciNetCrossRefGoogle Scholar
  17. 17.
    Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)Google Scholar
  18. 18.
    Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th international conference on machine learning, pp. 2613–2621 (2017)Google Scholar
  19. 19.
    Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Ekon. Mat. Metody 12, 747–756 (1976)MathSciNetzbMATHGoogle Scholar
  20. 20.
    Facchinei, F., Pang, J.S.: Finite-dimensional variational inequalities and complementarity problems, vol. I. Springer Series in Operations Research. Springer, New York (2003)Google Scholar
  21. 21.
    Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Proceedings of a Symposium Held at the Center for Tomorrow, the Ohio State University, pp. 233–257 (1971) Google Scholar
  22. 22.
    Billingsley, P.: Probability and measure, 3rd edn. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1995)Google Scholar

Copyright information

© Operations Research Society of China, Periodicals Agency of Shanghai University, Science Press, and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.School of SciencesDalian Ocean UniversityDalianChina
  2. 2.School of Mathematical SciencesDalian University of TechnologyDalianChina

Personalised recommendations