Asymptotic Solutions of Bandit Problems

Lai, T. L.

doi:10.1007/978-1-4613-8762-6_18

Part of the book series: The IMA Volumes in Mathematics and Its Applications (IMA, volume 10)

Abstract

Some recent results on asymptotically optimal solutions of bandit problems are reviewed and discussed herein. The problems considered include (a) the classical “closed bandit problem” of adaptive allocation involving k statistical populations, and (b) the “open bandit problem” of priority scheduling in a queueing network. Making use of the interconnections between the discounted and finite-horizon formulations of these problems, we also suggest certain heuristic arguments that lead to simple asymptotic solutions.

Author information

Authors and Affiliations

Department of Statistics, Columbia University, New York, NY, 10027, USA
T. L. Lai

Editor information

Editors and Affiliations

Division of Applied Mathematics, Brown University, 02912, Providence, Rhode Island, USA
Wendell Fleming
Ceremade, Universite Paris-Dauphine, Place de Lattre de Tassigny, 75775, Paris Cedex 16, France
Pierre-Louis Lions

Cite this paper

Lai, T.L. (1988). Asymptotic Solutions of Bandit Problems. In: Fleming, W., Lions, P.-L. (eds) Stochastic Differential Systems, Stochastic Control Theory and Applications. The IMA Volumes in Mathematics and Its Applications, vol 10. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8762-6_18

DOI: https://doi.org/10.1007/978-1-4613-8762-6_18
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4613-8764-0
Online ISBN: 978-1-4613-8762-6
eBook Packages: Springer Book Archive