Finding minimax strategy and minimax risk in a random environment (the two-armed bandit problem)

  • Stochastic Systems, Queueing Systems
Automation and Remote Control

Abstract

The minimax strategy and minimax risk in a stationary random environment are found as the Bayesian strategy and risk corresponding to the worst-case prior distribution. For environments in which incomes are normally distributed with unit variance and with expectations that depend only on the selected alternative, this prior can be chosen to be symmetric and asymptotically uniform, which makes numerical methods applicable. The results can be used in systems with parallel data processing, in particular for controlling environments whose distributions are not normal.
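To make the two quantities in the abstract concrete, the following minimal Python sketch estimates the loss (regret) of a simple strategy in a two-armed environment with normally distributed, unit-variance incomes. It then compares the loss averaged over a symmetric prior with the worst-case loss over the same parameter set. Everything here is an illustrative assumption rather than the article's method: the explore-then-commit strategy, the grid of expectations, the uniform prior on that grid, and the names `expected_loss` and `risks` are all hypothetical. The sketch only shows that, for a fixed strategy, the prior-averaged loss never exceeds the worst-case loss; the article's result concerns the case where the strategy and the prior are both optimized.

```python
import random


def expected_loss(m1, m2, horizon, explore, trials, rng):
    """Monte Carlo estimate of the expected loss of explore-then-commit.

    Incomes of arm i are N(m_i, 1). The loss is the shortfall of the
    expected total income relative to always playing the better arm.
    Explore-then-commit is an illustrative stand-in strategy (assumption).
    """
    best = max(m1, m2)
    remaining = horizon - 2 * explore
    total = 0.0
    for _ in range(trials):
        # Exploration phase: play each arm `explore` times.
        s1 = sum(rng.gauss(m1, 1.0) for _ in range(explore))
        s2 = sum(rng.gauss(m2, 1.0) for _ in range(explore))
        # Commit phase: play the arm with the larger sample sum.
        chosen = m1 if s1 >= s2 else m2
        total += explore * (best - m1) + explore * (best - m2)
        total += remaining * (best - chosen)
    return total / trials


def risks(horizon=100, explore=10, trials=2000, seed=0):
    """Return (prior-averaged loss, worst-case loss) on a symmetric grid."""
    rng = random.Random(seed)
    half_gaps = [0.1 * k for k in range(1, 11)]  # hypothetical grid
    # Symmetric parameter set: each (d, -d) is paired with (-d, d).
    params = [(d, -d) for d in half_gaps] + [(-d, d) for d in half_gaps]
    losses = [
        expected_loss(m1, m2, horizon, explore, trials, rng)
        for m1, m2 in params
    ]
    bayes = sum(losses) / len(losses)  # uniform symmetric prior (assumption)
    worst = max(losses)                # worst case over the grid
    return bayes, worst


if __name__ == "__main__":
    bayes, worst = risks()
    print(f"prior-averaged loss: {bayes:.2f}, worst-case loss: {worst:.2f}")
```

Changing the `explore` parameter or the grid of half-gaps shifts both numbers, which gives a rough feel for why the choice of prior matters when a worst-case guarantee is sought.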

Additional information

Original Russian Text © A.V. Kolnogorov, 2011, published in Avtomatika i Telemekhanika, 2011, No. 5, pp. 127–138.

About this article

Cite this article

Kolnogorov, A.V. Finding minimax strategy and minimax risk in a random environment (the two-armed bandit problem). Autom Remote Control 72, 1017–1027 (2011). https://doi.org/10.1134/S0005117911050092

  • DOI: https://doi.org/10.1134/S0005117911050092
