Abstract
Some recent results on asymptotically optimal solutions of bandit problems are reviewed and discussed herein. The problems considered include (a) the classical “closed bandit problem” of adaptive allocation involving k statistical populations, and (b) the “open bandit problem” of priority scheduling in a queueing network. Making use of the interconnections between the discounted and finite-horizon formulations of these problems, we also suggest certain heuristic arguments that lead to simple asymptotic solutions.
Copyright information
© 1988 Springer-Verlag New York Inc.
About this paper
Cite this paper
Lai, T.L. (1988). Asymptotic Solutions of Bandit Problems. In: Fleming, W., Lions, P.-L. (eds) Stochastic Differential Systems, Stochastic Control Theory and Applications. The IMA Volumes in Mathematics and Its Applications, vol 10. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8762-6_18
DOI: https://doi.org/10.1007/978-1-4613-8762-6_18
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4613-8764-0
Online ISBN: 978-1-4613-8762-6
eBook Packages: Springer Book Archive