Parallel design of robust control in the stochastic environment (the two-armed bandit problem)

Kolnogorov, A. V.

doi:10.1134/S000511791204008X


Robust and Adaptive Systems
Published: 15 April 2012

Volume 73, pages 689–701, (2012)

Automation and Remote Control

A. V. Kolnogorov¹


Abstract

The problem of rational behavior in a stochastic environment, also known as the two-armed bandit problem, is considered in the robust (minimax) setting. A parallel strategy is proposed that yields control arbitrarily close to the optimal one for environments whose gains have Gaussian cumulative distribution functions with unit variance. An invariant recursive equation is obtained for computing the minimax strategy and risk, which are found as the Bayesian ones associated with the worst-case a priori distribution. As a result, the well-known Vogel’s estimates of the minimax risk can be improved. Numerical experiments show that the strategy is efficient in environments with non-Gaussian distributions, e.g., binary ones.
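To make the minimax setting concrete, the following Python sketch simulates a two-armed bandit whose gains are Gaussian with unit variance and estimates the worst-case regret of one fixed strategy over a grid of environments. It is an illustration only and not the parallel strategy of the article: the explore-then-commit rule, the function names `play`, `mean_regret`, and `worst_case_regret`, and all parameter values are assumptions introduced here. The minimax risk itself would additionally require minimizing this worst-case value over strategies, which the sketch does not attempt.

```python
import random


def play(means, horizon, explore_per_arm, rng):
    """Explore-then-commit on a two-armed Gaussian bandit with unit variance.

    Hypothetical illustrative rule (not the article's strategy).
    Returns the pseudo-regret: the expected gain lost relative to
    always pulling the arm with the larger mean.
    """
    explore_per_arm = min(explore_per_arm, horizon // 2)
    sums = [0.0, 0.0]
    for arm in (0, 1):
        for _ in range(explore_per_arm):
            sums[arm] += rng.gauss(means[arm], 1.0)
    # Both arms were pulled equally often, so comparing totals
    # is the same as comparing sample means.
    chosen = 0 if sums[0] >= sums[1] else 1
    pulls = [explore_per_arm, explore_per_arm]
    pulls[chosen] += horizon - 2 * explore_per_arm
    best = max(means)
    return sum(n * (best - m) for n, m in zip(pulls, means))


def mean_regret(means, horizon, explore_per_arm, runs=2000, seed=0):
    """Monte Carlo estimate of the expected regret in one environment."""
    rng = random.Random(seed)
    total = sum(play(means, horizon, explore_per_arm, rng) for _ in range(runs))
    return total / runs


def worst_case_regret(horizon, explore_per_arm, gaps, runs=2000, seed=0):
    """Largest estimated regret over a grid of gaps between the two means.

    A crude stand-in for the worst-case (robust) loss of this one strategy.
    """
    return max(
        mean_regret((gap, 0.0), horizon, explore_per_arm, runs, seed)
        for gap in gaps
    )


if __name__ == "__main__":
    gaps = [0.05 * k for k in range(1, 41)]  # assumed grid of environments
    for n_explore in (5, 20, 50):
        print(n_explore, worst_case_regret(400, n_explore, gaps, runs=500))
```

Comparing the printed values for several exploration lengths shows the trade-off behind the robust setting: exploring too little risks committing to the worse arm when the gap is small, while exploring too much wastes pulls when the gap is large.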


References

Tsetlin, M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem (Studies in the Automata Theory and Simulation of Biological Systems), Moscow: Nauka, 1969.
Varshavskii, V.I., Kollektivnoe povedenie avtomatov (Cooperative Behavior of Automata), Moscow: Nauka, 1973.
Sragovich, V.G., Adaptivnoe upravlenie (Adaptive Control), Moscow: Nauka, 1981.
Nazin, A.V. and Poznyak, A.S., Adaptivnyi vybor variantov (Adaptive Choice of Variants), Moscow: Nauka, 1986.
Presman, E.L. and Sonin, I.M., Posledovatel’noe upravlenie po nepolnym dannym (Sequential Control from Incomplete Data), Moscow: Nauka, 1982.
Berry, D.A. and Fristedt, B., Bandit Problems, London: Chapman and Hall, 1985.
Vogel, W., An Asymptotic Minimax Theorem for the Two-Armed Bandit Problem, Ann. Math. Stat., 1960, vol. 31, pp. 444–451.
Kolnogorov, A.V., Finding Minimax Strategy and Minimax Risk in a Random Environment (the Two-Armed Bandit Problem), Autom. Remote Control, 2011, vol. 72, no. 5, pp. 1017–1027.
Kolnogorov, A.V., Determination of the Minimax Risk for the Normal Two-Armed Bandit, in Proc. IFAC Workshop “Adaptation and Learning in Control and Signal Processing, ALCOSP-2010,” Antalya, Turkey, August 26–28, 2010, http://www.ifac-papersonline.net.


Author information

Authors and Affiliations

Yaroslav-the-Wise State University, Novgorod, Russia
A. V. Kolnogorov


About this article

Cite this article

Kolnogorov, A.V. Parallel design of robust control in the stochastic environment (the two-armed bandit problem). Autom Remote Control 73, 689–701 (2012). https://doi.org/10.1134/S000511791204008X


Received: 24 November 2010
Published: 15 April 2012
Issue Date: April 2012
DOI: https://doi.org/10.1134/S000511791204008X
