Gaussian Two-Armed Bandit and Optimization of Batch Data Processing

  • Large Systems
Problems of Information Transmission

Abstract

We consider the minimax setting of the two-armed bandit problem with normally distributed incomes whose mathematical expectations and variances are a priori unknown. This setting arises naturally in the optimization of batch data processing when two alternative processing methods with different, a priori unknown efficiencies are available. During the control process, one must determine the more efficient method and ensure its predominant use. We use the main theorem of game theory to find the minimax strategy and minimax risk as the Bayesian ones corresponding to the worst-case prior distribution. To find them, we derive a recursive integro-difference equation. We show that batch data processing hardly increases the minimax risk if the number of batches is large enough.
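To make the setting concrete, the following minimal Python sketch simulates batch data processing with two methods that yield Gaussian incomes. It is an illustration under stated assumptions, not the minimax strategy studied in the article: the function name `run_batch_bandit`, the UCB-style index, and the tuning constant `width` are all hypothetical choices introduced here. The regret is measured as the loss of expected income relative to always applying the better method.

```python
import math
import random


def run_batch_bandit(means, stds, horizon, batch_size, width=1.0, seed=0):
    """Simulate one run and return (regret, counts).

    Assumption: a simple UCB-style rule stands in for the strategy;
    it is NOT the minimax strategy from the article.
    """
    rng = random.Random(seed)
    counts = [0, 0]        # items processed by each method so far
    totals = [0.0, 0.0]    # cumulative income of each method
    best_mean = max(means)
    regret = 0.0
    processed = 0
    batch_index = 0
    while processed < horizon:
        size = min(batch_size, horizon - processed)
        if batch_index < 2:
            arm = batch_index  # try each method on one batch first
        else:
            # Index uses only data from completed batches.
            # `width` is a hypothetical exploration constant.
            arm = max(
                (0, 1),
                key=lambda a: totals[a] / counts[a]
                + width * math.sqrt(2.0 * math.log(processed) / counts[a]),
            )
        # The whole batch is processed with the same method.
        income = sum(rng.gauss(means[arm], stds[arm]) for _ in range(size))
        counts[arm] += size
        totals[arm] += income
        regret += (best_mean - means[arm]) * size
        processed += size
        batch_index += 1
    return regret, counts


if __name__ == "__main__":
    for batch_size in (1, 50):
        regret, counts = run_batch_bandit(
            means=(1.0, 0.0), stds=(1.0, 1.0),
            horizon=1000, batch_size=batch_size, seed=1,
        )
        print(batch_size, regret, counts)
```

Running the sketch with `batch_size=1` and with a larger batch size, over several seeds, gives a rough empirical feel for how batching affects the regret of this particular rule; it says nothing about the minimax risk itself.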

Author information

Correspondence to A. V. Kolnogorov.

Additional information

Original Russian Text © A.V. Kolnogorov, 2018, published in Problemy Peredachi Informatsii, 2018, Vol. 54, No. 1, pp. 93–111.

About this article

Cite this article

Kolnogorov, A.V. Gaussian Two-Armed Bandit and Optimization of Batch Data Processing. Probl Inf Transm 54, 84–100 (2018). https://doi.org/10.1134/S0032946018010076
