Abstract
We consider the problem of sequential sampling from a finite number of independent statistical populations to maximize the expected infinite-horizon average outcome per period, under the constraint that the expected average sampling cost does not exceed an upper bound. The outcome distributions are not known. We construct a class of consistent adaptive policies, under which the average outcome converges with probability 1 to the true value under complete information for all distributions with finite means. We also compare the rate of convergence for various policies in this class using simulation.
AMS Subject Classification: Primary 93E35, Stochastic learning and adaptive control; Secondary 62L05, Sequential designs
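To make the complete-information benchmark mentioned in the abstract concrete, the following is a minimal sketch, not the chapter's own construction. It assumes the benchmark can be written as a small linear program: choose long-run sampling frequencies over the populations that maximize the average outcome while keeping the average cost within the bound. The function name `best_mix` and the parameters `means`, `costs`, and `budget` are hypothetical names introduced for illustration only.

```python
from itertools import combinations


def best_mix(means, costs, budget):
    """Sketch of an assumed complete-information benchmark.

    Maximizes sum(w[i] * means[i]) subject to
    sum(w[i] * costs[i]) <= budget, sum(w) == 1, w >= 0.

    A linear program with two constraints has an optimal basic solution
    supported on at most two populations, so it is enough to check every
    affordable single population and every pair that can be mixed to meet
    the budget exactly. Returns (None, None) if no population is affordable.
    """
    n = len(means)
    best_value, best_weights = None, None

    def consider(weights):
        nonlocal best_value, best_weights
        value = sum(w * m for w, m in zip(weights, means))
        if best_value is None or value > best_value:
            best_value, best_weights = value, weights

    # Case 1: sample a single affordable population all the time.
    for i in range(n):
        if costs[i] <= budget:
            weights = [0.0] * n
            weights[i] = 1.0
            consider(weights)

    # Case 2: mix a cheap and an expensive population so that the
    # average cost equals the budget exactly.
    for i, j in combinations(range(n), 2):
        lo, hi = (i, j) if costs[i] <= costs[j] else (j, i)
        if costs[lo] < budget < costs[hi]:
            p = (budget - costs[lo]) / (costs[hi] - costs[lo])  # weight on hi
            weights = [0.0] * n
            weights[lo] = 1.0 - p
            weights[hi] = p
            consider(weights)

    return best_value, best_weights


if __name__ == "__main__":
    # Hypothetical numbers: the better population is too expensive to
    # sample in every period, so it is mixed with the cheaper one.
    value, weights = best_mix(means=[1.0, 3.0], costs=[1.0, 5.0], budget=2.0)
    print(value, weights)
```

Under incomplete information the means are unknown, so an adaptive policy would have to replace `means` with estimates that improve as sampling proceeds. How the chapter's policies do this is not described in the abstract and is not reproduced here.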
Acknowledgments
This research was supported by the Greek Secretariat of Research and Technology under a Greece/Turkey bilateral research collaboration program. The authors thank Nickos Papadatos and George Afendras for useful discussions on the problem of consistent estimation in a random sequence of random variables.
Copyright information
© 2012 Springer Science+Business Media New York
About this chapter
Cite this chapter
Burnetas, A., Kanavetas, O. (2012). Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint. In: Daras, N. (eds) Applications of Mathematics and Informatics in Military Science. Springer Optimization and Its Applications, vol 71. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4109-0_8
DOI: https://doi.org/10.1007/978-1-4614-4109-0_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4108-3
Online ISBN: 978-1-4614-4109-0
eBook Packages: Mathematics and Statistics (R0)