Finding minimax strategy and minimax risk in a random environment (the two-armed bandit problem)

Kolnogorov, A. V.

doi:10.1134/S0005117911050092

Stochastic Systems, Queueing Systems
Published: 26 May 2011

Volume 72, pages 1017–1027, (2011)
Automation and Remote Control

Abstract

The minimax strategy and minimax risk in a stationary random environment are found as the Bayesian strategy and risk corresponding to the worst-case prior distribution. For environments with normally distributed incomes that have unit variance and expectations depending only on the selected alternative, this prior can be chosen to be symmetric and asymptotically uniform, which makes numerical methods applicable. The results can be used for systems with parallel data processing, in particular for controlling environments whose distributions are not normal.
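To make the quantities in the abstract concrete, the following is a minimal sketch, not the method of the article. It assumes a simple explore-then-commit rule (a swapped-in strategy chosen only because its expected loss has a closed form), two alternatives with normally distributed incomes of unit variance, and hypothetical names such as `etc_regret`, `horizon`, and `n_explore`. For this fixed rule it compares the worst-case loss over a grid of gaps between the expectations with the average loss under a symmetric uniform prior on the same grid. The average can never exceed the maximum; the article's result concerns the case where the strategy is optimized, so that the Bayesian risk under the worst prior coincides with the minimax risk.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def etc_regret(gap, horizon, n_explore):
    """Expected loss of explore-then-commit (an illustrative assumption).

    Each alternative is tried n_explore times, then the one with the larger
    sample mean is used for the remaining horizon - 2 * n_explore steps.
    Incomes are normal with unit variance; gap is the difference of expectations.
    """
    d = abs(gap)
    # The difference of sample means is N(d, 2 / n_explore), so the worse
    # alternative is chosen with probability Phi(-d * sqrt(n_explore / 2)).
    p_wrong = normal_cdf(-d * math.sqrt(n_explore / 2.0))
    return d * n_explore + d * (horizon - 2 * n_explore) * p_wrong


def worst_case_and_average(horizon, n_explore, gaps):
    """Maximum loss over gaps, and mean loss under a uniform prior on gaps."""
    losses = [etc_regret(d, horizon, n_explore) for d in gaps]
    return max(losses), sum(losses) / len(losses)


horizon = 100      # hypothetical control horizon
n_explore = 10     # hypothetical exploration length per alternative
gaps = [k * 0.05 for k in range(-40, 41)]  # symmetric grid of gaps in [-2, 2]

worst, average = worst_case_and_average(horizon, n_explore, gaps)
print(f"worst-case loss: {worst:.3f}, average loss under uniform prior: {average:.3f}")
```

Varying `n_explore` shows the trade-off that a minimax strategy has to balance: short exploration raises the loss for small gaps, while long exploration raises it for large gaps.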

Author information

Authors and Affiliations

Yaroslav-the-Wise State University, Novgorod, Russia
A. V. Kolnogorov

About this article

Cite this article

Kolnogorov, A.V. Finding minimax strategy and minimax risk in a random environment (the two-armed bandit problem). Autom Remote Control 72, 1017–1027 (2011). https://doi.org/10.1134/S0005117911050092

Received: 22 March 2010
Published: 26 May 2011
Issue Date: May 2011
DOI: https://doi.org/10.1134/S0005117911050092
