Abstract
We consider the minimax setting of the two-armed bandit problem with normally distributed incomes whose mathematical expectations and variances are a priori unknown. This setting arises naturally in the optimization of batch data processing when two alternative processing methods with different, a priori unknown efficiencies are available. During the control process, one must determine the more efficient method and ensure its predominant application. We use the main theorem of game theory to find the minimax strategy and minimax risk as the Bayesian ones corresponding to the worst-case prior distribution. To find them, we derive a recursive integro-difference equation. We show that batch data processing hardly increases the minimax risk if the number of batches is large enough.
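To make the batch setting concrete, the following is a minimal simulation sketch, not the minimax strategy of the paper. It assumes a common known standard deviation `sigma` for both arms (the paper treats the variances as unknown) and swaps in a simple upper-confidence-bound rule applied once per batch. The function name `batch_bandit_regret` and all of its parameters are hypothetical names introduced for illustration.

```python
import math
import random


def batch_bandit_regret(means, sigma, horizon, batch_size, seed=0):
    """Simulate one run of batch processing with two Gaussian arms.

    Every item in a batch is processed by the same arm (method).
    Returns the loss of expected income relative to always using
    the better arm over the items actually processed.
    Assumption: sigma is known and shared by both arms.
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0]      # cumulative observed income per arm
    counts = [0, 0]          # number of items processed per arm
    expected_income = 0.0    # sum of true means of the arms chosen
    n_batches = horizon // batch_size

    for k in range(n_batches):
        if k < 2:
            # Apply each method to one initial batch.
            arm = k
        else:
            # Illustrative UCB-type index, recomputed once per batch.
            t = k * batch_size
            arm = max(
                (0, 1),
                key=lambda a: totals[a] / counts[a]
                + sigma * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Process the whole batch with the chosen arm.
        for _ in range(batch_size):
            totals[arm] += rng.gauss(means[arm], sigma)
        counts[arm] += batch_size
        expected_income += batch_size * means[arm]

    return n_batches * batch_size * max(means) - expected_income
```

Calling, for example, `batch_bandit_regret((1.0, 0.0), 1.0, 1000, 50)` and then repeating with a smaller `batch_size` gives a rough feel for how the loss depends on the number of batches. Such single runs only illustrate the setting and do not reproduce the paper's minimax risk.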
Additional information
Original Russian Text © A.V. Kolnogorov, 2018, published in Problemy Peredachi Informatsii, 2018, Vol. 54, No. 1, pp. 93–111.
About this article
Cite this article
Kolnogorov, A.V. Gaussian Two-Armed Bandit and Optimization of Batch Data Processing. Probl Inf Transm 54, 84–100 (2018). https://doi.org/10.1134/S0032946018010076
DOI: https://doi.org/10.1134/S0032946018010076