Parallel design of robust control in the stochastic environment (the two-armed bandit problem)

  • Robust and Adaptive Systems
  • Published in: Automation and Remote Control

Abstract

The problem of rational behavior in a stochastic environment, also known as the two-armed bandit problem, is considered in the robust (minimax) setting. A parallel strategy is proposed that yields a control arbitrarily close to the optimal one for environments whose gains have Gaussian cumulative distribution functions with unit variance. An invariant recursive equation is obtained for computing the minimax strategy and the minimax risk, which are found as the Bayesian strategy and risk corresponding to the worst-case a priori distribution. As a result, the well-known estimates of the minimax risk due to Vogel can be improved. Numerical experiments show that the strategy remains efficient in environments with non-Gaussian distributions, e.g., binary ones.
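For orientation, the sketch below simulates the setting the abstract describes. It is not the parallel minimax strategy of the paper: it merely sets up a two-armed bandit whose arms yield Gaussian rewards with unit variance and applies a simple batched explore-then-commit rule, so the notions of parallel (batched) control and of regret against the best arm become concrete. The function name run_batched_bandit and all parameter values are illustrative assumptions, not taken from the article.

# Minimal sketch, not the paper's parallel minimax strategy: it only sets up a
# two-armed Gaussian bandit with unit variance and runs a simple batched
# explore-then-commit rule.  Names and parameters here are illustrative.
import numpy as np

def run_batched_bandit(means, horizon=10_000, batch=200, seed=None):
    """Pull each arm `batch` times in a parallel exploration phase, then commit
    to the arm with the larger empirical mean; return the regret."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)

    # Parallel exploration phase: both arms are sampled in equal batches.
    rewards = [rng.normal(m, 1.0, size=batch) for m in means]
    collected = sum(r.sum() for r in rewards)

    # Commit phase: keep the empirically better arm for the rest of the horizon.
    best = int(np.argmax([r.mean() for r in rewards]))
    remaining = horizon - 2 * batch
    collected += rng.normal(means[best], 1.0, size=remaining).sum()

    # Regret relative to always playing the truly best arm.
    return horizon * means.max() - collected

if __name__ == "__main__":
    regrets = [run_batched_bandit([0.0, 0.2], seed=s) for s in range(50)]
    print(f"mean regret over 50 runs: {np.mean(regrets):.1f}")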

References

  1. Tsetlin, M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem (Studies in the Automata Theory and Simulation of Biological Systems), Moscow: Nauka, 1969.

  2. Varshavskii, V.I., Kollektivnoe povedenie avtomatov (Cooperative Behavior of Automata), Moscow: Nauka, 1973.

  3. Sragovich, V.G., Adaptivnoe upravlenie (Adaptive Control), Moscow: Nauka, 1981.

  4. Nazin, A.V. and Poznyak, A.S., Adaptivnyi vybor variantov (Adaptive Choice of Variants), Moscow: Nauka, 1986.

  5. Presman, E.L. and Sonin, I.M., Posledovatel’noe upravlenie po nepolnym dannym (Sequential Control from Incomplete Data), Moscow: Nauka, 1982.

  6. Berry, D.A. and Fristedt, B., Bandit Problems, London: Chapman and Hall, 1985.

  7. Vogel, W., An Asymptotic Minimax Theorem for the Two-Armed Bandit Problem, Ann. Math. Stat., 1960, vol. 31, pp. 444–451.

  8. Kolnogorov, A.V., Finding Minimax Strategy and Minimax Risk in a Random Environment (the Two-Armed Bandit Problem), Autom. Remote Control, 2011, vol. 72, no. 5, pp. 1017–1027.

  9. Kolnogorov, A.V., Determination of the Minimax Risk for the Normal Two-Armed Bandit, in Proc. IFAC Workshop “Adaptation and Learning in Control and Signal Processing, ALCOSP-2010,” Antalya, Turkey, August 26–28, 2010, http://www.ifac-papersonline.net.

Additional information

Original Russian Text © A.V. Kolnogorov, 2012, published in Avtomatika i Telemekhanika, 2012, no. 4, pp. 114–130.

Cite this article

Kolnogorov, A.V. Parallel design of robust control in the stochastic environment (the two-armed bandit problem). Autom Remote Control 73, 689–701 (2012). https://doi.org/10.1134/S000511791204008X