Automation and Remote Control, Volume 80, Issue 9, pp. 1607–1627

Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method

  • A. V. Nazin
  • A. S. Nemirovsky
  • A. B. Tsybakov
  • A. B. Juditsky
Topical Issue


We propose an approach to constructing robust non-Euclidean iterative algorithms for convex composite stochastic optimization, based on truncation of the stochastic gradients. For such algorithms, we establish sub-Gaussian confidence bounds under weak assumptions on the tails of the noise distribution, in both the convex and the strongly convex settings. We also propose robust estimates of the accuracy of general stochastic algorithms.


Keywords: robust iterative algorithms, stochastic optimization algorithms, convex composite stochastic optimization, mirror descent method, robust confidence sets
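The core idea in the abstract — mirror descent steps driven by truncated stochastic gradients so that heavy-tailed noise cannot derail the iterates — can be illustrated with a minimal sketch. This is not the authors' exact algorithm; the entropic mirror map, the truncation rule, the step size, and the toy linear objective with Student-t gradient noise are all illustrative assumptions.

```python
import numpy as np

def truncate(g, threshold):
    """Rescale the stochastic gradient when its sup-norm exceeds the
    threshold, bounding the influence of heavy-tailed noise on a step."""
    norm = np.linalg.norm(g, np.inf)
    return g if norm <= threshold else g * (threshold / norm)

def robust_mirror_descent(grad_oracle, dim, n_steps, step, threshold, rng):
    """Entropic mirror descent on the probability simplex, fed with
    truncated stochastic gradients; returns the averaged iterate."""
    x = np.full(dim, 1.0 / dim)        # uniform starting point on the simplex
    avg = np.zeros(dim)
    for _ in range(n_steps):
        g = truncate(grad_oracle(x, rng), threshold)
        x = x * np.exp(-step * g)      # multiplicative-weights (entropic) update
        x /= x.sum()                   # project back onto the simplex
        avg += x
    return avg / n_steps

# Toy problem: minimize f(x) = <c, x> over the simplex, observing gradients
# c plus heavy-tailed Student-t noise; the minimizer concentrates on argmin(c).
rng = np.random.default_rng(0)
c = np.array([1.0, 0.2, 0.8])
oracle = lambda x, rng: c + 0.5 * rng.standard_t(df=2.5, size=c.size)
x_hat = robust_mirror_descent(oracle, dim=3, n_steps=2000,
                              step=0.1, threshold=5.0, rng=rng)
```

Without truncation, a single extreme Student-t draw can dominate the exponentiated update; bounding each gradient keeps the per-step influence of any one observation finite, which is the mechanism behind the sub-Gaussian confidence bounds claimed above.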





Copyright information

© Pleiades Publishing, Ltd. 2019

Authors and Affiliations

  1. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, Moscow, Russia
  2. Georgia Institute of Technology, Atlanta, USA
  3. CREST, ENSAE, Paris, France
  4. Université Grenoble Alpes, Grenoble, France
