Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case

  • Stochastic Systems, Queueing Systems
Automation and Remote Control

Abstract

This paper proposes a gradient-free modification of the mirror descent method for convex stochastic online optimization problems. The crucial assumption in the problem setting is that function realizations are observed with small noise. The aim is to derive the convergence rate of the proposed methods and to determine the noise level that does not significantly affect this rate.
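To make the setting concrete, the following is a minimal sketch of a generic two-point gradient-free mirror descent on the unit simplex, not the authors' exact method. Everything specific in it is an assumption made for illustration: the quadratic loss, the target vector `c`, the entropy prox-function, the smoothing radius `tau`, the noise level `delta`, the step size, and the requirement that the loss can be evaluated slightly outside the simplex. Each round draws a new realization `xi`, observes that same realization at two nearby points with small additive noise, and uses the difference to build a gradient estimate.

```python
import numpy as np


def two_point_gradient_estimate(noisy_value, x, tau, rng):
    """Gradient estimate from two noisy function values along a random direction."""
    n = x.size
    e = rng.standard_normal(n)
    e /= np.linalg.norm(e)  # uniformly distributed direction on the unit sphere
    diff = noisy_value(x + tau * e) - noisy_value(x - tau * e)
    return (n / (2.0 * tau)) * diff * e


def entropic_mirror_step(x, g, step):
    """Mirror descent step on the simplex with the entropy prox-function."""
    w = x * np.exp(-step * (g - g.min()))  # shift the exponent for stability
    return w / w.sum()


def run(num_rounds=2000, step=0.1, tau=1e-2, delta=1e-6, seed=0):
    """Run the sketch on a hypothetical noisy quadratic loss."""
    rng = np.random.default_rng(seed)
    c = np.array([0.7, 0.1, 0.1, 0.05, 0.05])  # hypothetical minimizer
    n = c.size
    x = np.full(n, 1.0 / n)  # start at the center of the simplex
    x_avg = np.zeros(n)
    for _ in range(num_rounds):
        xi = 0.05 * rng.standard_normal(n)  # realization for this round

        def noisy_value(y):
            # Same realization xi at both points, plus bounded noise of level delta.
            return 0.5 * np.sum((y - c - xi) ** 2) + delta * rng.uniform(-1.0, 1.0)

        g = two_point_gradient_estimate(noisy_value, x, tau, rng)
        x_avg += x / num_rounds  # average of the played points
        x = entropic_mirror_step(x, g, step)
    return x_avg, c


if __name__ == "__main__":
    x_avg, c = run()
    print("averaged iterate:", np.round(x_avg, 3))
    print("distance to c:", np.linalg.norm(x_avg - c))
```

In this sketch the observation noise enters the gradient estimate with a factor of order `n * delta / tau`, which illustrates why the admissible noise level has to be tied to the smoothing radius.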

Author information

Corresponding author

Correspondence to A. V. Gasnikov.

Additional information

Original Russian Text © A.V. Gasnikov, E.A. Krymova, A.A. Lagunovskaya, I.N. Usmanova, F.A. Fedorenko, 2017, published in Avtomatika i Telemekhanika, 2017, No. 2, pp. 36–49.

This paper was recommended for publication by P.S. Shcherbakov, a member of the Editorial Board.

Cite this article

Gasnikov, A.V., Krymova, E.A., Lagunovskaya, A.A. et al. Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case. Autom Remote Control 78, 224–234 (2017). https://doi.org/10.1134/S0005117917020035
