Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

Burnetas, Apostolos; Kanavetas, Odysseas

doi:10.1007/978-1-4614-4109-0_8

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 71)

Abstract

We consider the problem of sequential sampling from a finite number of independent statistical populations to maximize the expected infinite horizon average outcome per period, under a constraint that the expected average sampling cost does not exceed an upper bound. The outcome distributions are not known. We construct a class of consistent adaptive policies, under which the average outcome converges with probability 1 to the true value under complete information for all distributions with finite means. We also compare the rate of convergence for various policies in this class using simulation.
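
As a rough illustration of the setting described above, the following Python sketch simulates one possible adaptive policy. It rests on several assumptions that are not stated in the abstract: the per-period sampling cost of each population is a known constant, outcomes are Gaussian with unit variance, and the complete-information problem can be written as a linear program over randomization probabilities x_i (maximize the sum of x_i * mean_i subject to the sum of x_i * cost_i not exceeding the budget). The forced-sampling schedule at perfect squares and the names `solve_constrained_lp` and `simulate` are hypothetical choices for this example, not the authors' construction.

```python
import math
import random
from itertools import combinations


def solve_constrained_lp(means, costs, budget):
    """Maximize sum(x[i] * means[i]) subject to
    sum(x[i] * costs[i]) <= budget, sum(x) == 1, x >= 0.

    With only two constraints, an optimal vertex mixes at most two
    populations, so enumerating singles and pairs is sufficient.
    """
    n = len(means)
    best_value, best_x = None, None
    # Vertices that use a single affordable population.
    for i in range(n):
        if costs[i] <= budget and (best_value is None or means[i] > best_value):
            best_value = means[i]
            best_x = [1.0 if k == i else 0.0 for k in range(n)]
    # Vertices that mix two populations with the cost constraint binding.
    for i, j in combinations(range(n), 2):
        if costs[i] == costs[j]:
            continue
        p = (budget - costs[j]) / (costs[i] - costs[j])  # weight on population i
        if 0.0 < p < 1.0:
            value = p * means[i] + (1.0 - p) * means[j]
            if best_value is None or value > best_value:
                best_value = value
                best_x = [0.0] * n
                best_x[i], best_x[j] = p, 1.0 - p
    if best_x is None:
        raise ValueError("infeasible: every population costs more than the budget")
    return best_x, best_value


def simulate(true_means, costs, budget, horizon, seed=0):
    """Run a certainty-equivalence policy with sparse forced sampling.

    Returns (average outcome, average cost) over the horizon.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    sums = [0.0] * n
    total_outcome = total_cost = 0.0
    for t in range(1, horizon + 1):
        unsampled = [i for i in range(n) if counts[i] == 0]
        if unsampled:
            # Sample every population once before estimating anything.
            arm = unsampled[0]
        elif math.isqrt(t) ** 2 == t:
            # Forced sampling at perfect squares (hypothetical schedule)
            # keeps every estimate improving without affecting the averages.
            arm = math.isqrt(t) % n
        else:
            # Solve the LP with sample means in place of the unknown means.
            estimates = [sums[i] / counts[i] for i in range(n)]
            x, _ = solve_constrained_lp(estimates, costs, budget)
            arm = rng.choices(range(n), weights=x)[0]
        outcome = rng.gauss(true_means[arm], 1.0)  # assumed outcome model
        counts[arm] += 1
        sums[arm] += outcome
        total_outcome += outcome
        total_cost += costs[arm]
    return total_outcome / horizon, total_cost / horizon


if __name__ == "__main__":
    x, value = solve_constrained_lp([1.0, 2.0], [1.0, 3.0], 2.0)
    print("complete-information solution:", x, value)
    print("simulated averages:", simulate([1.0, 2.0], [1.0, 3.0], 2.0, 20000, seed=1))
```

In this toy instance the better population is too expensive to sample in every period, so the complete-information solution randomizes between the two populations. The simulated policy only needs its sample means to order the populations correctly for its long-run averages to approach that benchmark. Different forced-sampling schedules would give different convergence rates, which is the kind of comparison the abstract says was made by simulation.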

AMS Subject Classification: Primary 93E35, Stochastic learning and adaptive control; Secondary 62L05, Sequential designs

Acknowledgments

This research was supported by the Greek Secretariat of Research and Technology under a Greece/Turkey bilateral research collaboration program. The authors thank Nickos Papadatos and George Afendras for useful discussions on the problem of consistent estimation in a random sequence of random variables.

Author information

Authors and Affiliations

Department of Mathematics, University of Athens, Panepistemiopolis, 15784, Athens, Greece
Apostolos Burnetas & Odysseas Kanavetas

Corresponding author

Correspondence to Apostolos Burnetas.

Editor information

Editors and Affiliations

Department of Military Science, Hellenic Military Academy, Vari Attikis, 166 73, Greece
Nicholas J. Daras

About this chapter

Cite this chapter

Burnetas, A., Kanavetas, O. (2012). Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint. In: Daras, N. (eds) Applications of Mathematics and Informatics in Military Science. Springer Optimization and Its Applications, vol 71. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4109-0_8

DOI: https://doi.org/10.1007/978-1-4614-4109-0_8
Published: 10 July 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4108-3
Online ISBN: 978-1-4614-4109-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
