Part of the book series: The IMA Volumes in Mathematics and Its Applications ((IMA,volume 10))

Abstract

Some recent results on asymptotically optimal solutions of bandit problems are reviewed and discussed herein. The problems considered include (a) the classical “closed bandit problem” of adaptive allocation involving k statistical populations, and (b) the “open bandit problem” of priority scheduling in a queueing network. Making use of the interconnections between the discounted and finite-horizon formulations of these problems, we also suggest certain heuristic arguments that lead to simple asymptotic solutions.
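
As a rough illustration of the closed bandit problem in (a), the sketch below simulates adaptive allocation among k Bernoulli populations using an upper-confidence-bound index. This is not the chapter's own method: the UCB1-style index, the function name `ucb_allocate`, the success probabilities, and the horizon are all assumptions chosen for illustration only.

```python
import math
import random


def ucb_allocate(means, horizon, seed=0):
    """Allocate `horizon` observations among Bernoulli populations.

    `means` holds the (normally unknown) success probabilities; they are
    used here only to simulate rewards. The index rule is a UCB1-style
    heuristic, assumed for illustration.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of observations taken from each population
    sums = [0.0] * k      # total reward observed from each population

    for t in range(1, horizon + 1):
        if t <= k:
            # Sample every population once so each index is defined.
            arm = t - 1
        else:
            # Index = sample mean + exploration bonus that shrinks with counts.
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, sum(sums)


if __name__ == "__main__":
    counts, total = ucb_allocate([0.2, 0.5, 0.8], horizon=2000)
    print("observations per population:", counts)
    print("total reward:", total)
```

Under these assumptions, most observations should go to the population with the largest success probability, while the inferior populations are sampled far less often as the horizon grows.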

© 1988 Springer-Verlag New York Inc.

Cite this paper

Lai, T.L. (1988). Asymptotic Solutions of Bandit Problems. In: Fleming, W., Lions, PL. (eds) Stochastic Differential Systems, Stochastic Control Theory and Applications. The IMA Volumes in Mathematics and Its Applications, vol 10. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8762-6_18

  • DOI: https://doi.org/10.1007/978-1-4613-8762-6_18

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4613-8764-0

  • Online ISBN: 978-1-4613-8762-6

  • eBook Packages: Springer Book Archive