Abstract
The problem of rational behavior in a stochastic environment, also known as the two-armed bandit problem, is considered in the robust (minimax) setting. A parallel strategy is proposed that yields control arbitrarily close to optimal for environments whose gains are Gaussian with unit variance. An invariant recursive equation is obtained for computing the minimax strategy and minimax risk, which are sought as the Bayesian strategy and risk associated with the worst-case prior distribution. As a result, Vogel's well-known estimates of the minimax risk can be improved. Numerical experiments show that the strategy is also efficient in environments with non-Gaussian distributions, e.g., binary ones.
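To make the notion of minimax risk concrete, the following minimal sketch computes the worst-case expected loss of a deliberately simple strategy. It is not the parallel strategy of the paper. It assumes a plain explore-then-commit rule: each arm is applied n0 times, then the arm with the larger sample mean is used for the rest of the horizon. The loss is taken to be the horizon times the larger mean minus the expected total gain, and the gains are Gaussian with unit variance. All function and parameter names (`etc_regret`, `worst_case_regret`, `n0`, `max_gap`) are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def etc_regret(d, horizon, n0):
    """Expected loss of explore-then-commit for two unit-variance Gaussian arms.

    d is the difference of the two mean gains. Each arm is applied n0 times,
    then the arm with the larger sample mean is applied for the remaining
    horizon - 2 * n0 steps. (Illustrative assumption, not the paper's strategy.)
    """
    gap = abs(d)
    if gap == 0.0:
        return 0.0
    # The difference of the sample means is normal with mean d and variance
    # 2 / n0, so the worse arm is chosen with this probability.
    p_wrong = normal_cdf(-gap * math.sqrt(n0 / 2.0))
    return gap * (n0 + (horizon - 2 * n0) * p_wrong)


def worst_case_regret(horizon, n0, grid_size=2001, max_gap=5.0):
    """Largest expected loss over a grid of mean differences in [0, max_gap]."""
    gaps = [max_gap * i / (grid_size - 1) for i in range(grid_size)]
    return max(etc_regret(d, horizon, n0) for d in gaps)


if __name__ == "__main__":
    horizon = 100
    for n0 in (5, 10, 20):
        print(n0, round(worst_case_regret(horizon, n0), 3))
```

Scanning over `n0` shows the trade-off that a minimax strategy has to balance: a short exploration phase risks committing to the worse arm, while a long one spends too many steps on it. The grid maximum only approximates the supremum over environments.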
References
Tsetlin, M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem (Studies in the Automata Theory and Simulation of Biological Systems), Moscow: Nauka, 1969.
Varshavskii, V.I., Kollektivnoe povedenie avtomatov (Cooperative Behavior of Automata), Moscow: Nauka, 1973.
Sragovich, V.G., Adaptivnoe upravlenie (Adaptive Control), Moscow: Nauka, 1981.
Nazin, A.V. and Poznyak, A.S., Adaptivnyi vybor variantov (Adaptive Choice of Variants), Moscow: Nauka, 1986.
Presman, E.L. and Sonin, I.M., Posledovatel’noe upravlenie po nepolnym dannym (Sequential Control from Incomplete Data), Moscow: Nauka, 1982.
Berry, D.A. and Fristedt, B., Bandit Problems, London: Chapman and Hall, 1985.
Vogel, W., An Asymptotic Minimax Theorem for the Two-Armed Bandit Problem, Ann. Math. Stat., 1960, vol. 31, pp. 444–451.
Kolnogorov, A.V., Finding Minimax Strategy and Minimax Risk in a Random Environment (the Two-Armed Bandit Problem), Autom. Remote Control, 2011, vol. 72, no. 5, pp. 1017–1027.
Kolnogorov, A.V., Determination of the Minimax Risk for the Normal Two-Armed Bandit, in Proc. IFAC Workshop “Adaptation and Learning in Control and Signal Processing, ALCOSP-2010,” Antalya, Turkey, August 26–28, 2010, http://www.ifac-papersonline.net.
Additional information
Original Russian Text © A.V. Kolnogorov, 2012, published in Avtomatika i Telemekhanika, 2012, no. 4, pp. 114–130.
Cite this article
Kolnogorov, A.V. Parallel design of robust control in the stochastic environment (the two-armed bandit problem). Autom Remote Control 73, 689–701 (2012). https://doi.org/10.1134/S000511791204008X
DOI: https://doi.org/10.1134/S000511791204008X