Abstract
The minimax strategy and minimax risk in a stationary random environment are found as the Bayesian strategy and risk corresponding to the worst-case prior distribution. For environments with normally distributed incomes that have unit variance and expectations depending only on the selected alternative, this prior can be chosen to be symmetric and asymptotically uniform, which makes numerical methods applicable. The results can be used in systems with parallel data processing, in particular for controlling environments whose distributions are not normal.
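As a rough illustration of what searching for a worst case numerically can look like, the sketch below computes the expected loss (regret) of a simple explore-then-commit rule in a two-armed bandit with unit-variance normal incomes, then scans a grid of mean gaps for the least favorable one. The explore-then-commit rule is a stand-in chosen for its closed-form risk and is not the minimax strategy of the article. The function names, the grid, and the bound `max_gap` on the difference of expectations are all assumptions made for this example. Because the loss depends only on the absolute gap, a symmetric two-point prior on the pair of expectations gives the same Bayesian risk as the fixed gap.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def etc_regret(gap: float, horizon: int, n_explore: int) -> float:
    """Expected regret of explore-then-commit (hypothetical helper).

    Each arm is pulled n_explore times, then the arm with the larger
    sample mean is used for the remaining horizon - 2 * n_explore steps.
    Incomes are assumed normal with unit variance.
    """
    # The difference of the two sample means is N(gap, 2 / n_explore),
    # so the worse arm wins the comparison with this probability.
    p_wrong = normal_cdf(-gap * math.sqrt(n_explore / 2.0))
    # Exploration costs gap * n_explore; a wrong commitment costs
    # gap on each of the remaining steps.
    return gap * (n_explore + (horizon - 2 * n_explore) * p_wrong)


def worst_case_gap(horizon: int, n_explore: int,
                   max_gap: float = 3.0, grid_size: int = 3000) -> float:
    """Grid search for the gap in (0, max_gap] that maximizes the regret."""
    gaps = [max_gap * k / grid_size for k in range(1, grid_size + 1)]
    return max(gaps, key=lambda g: etc_regret(g, horizon, n_explore))


if __name__ == "__main__":
    horizon, n_explore = 1000, 5
    gap = worst_case_gap(horizon, n_explore)
    print(f"least favorable gap: {gap:.3f}")
    print(f"regret at that gap:  {etc_regret(gap, horizon, n_explore):.2f}")
```

The maximum over the grid is the worst-case risk of this one fixed rule only. Finding the minimax risk would additionally require minimizing over strategies, which this sketch does not attempt.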
Additional information
Original Russian Text © A.V. Kolnogorov, 2011, published in Avtomatika i Telemekhanika, 2011, No. 5, pp. 127–138.
About this article
Cite this article
Kolnogorov, A.V. Finding minimax strategy and minimax risk in a random environment (the two-armed bandit problem). Autom Remote Control 72, 1017–1027 (2011). https://doi.org/10.1134/S0005117911050092
DOI: https://doi.org/10.1134/S0005117911050092