Abstract
In many application domains, temporal changes in the structure of the reward distribution are modeled as a Markov chain. In this chapter, we present the formulation, theoretical bounds, and algorithms for the Markov MAB problem, in which rewards are generated by unknown irreducible Markov processes. Two important classes of the problem are discussed, namely, rested and restless Markov MAB.
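To make the rested/restless distinction concrete, the sketch below simulates arms whose rewards follow two-state Markov chains. All states, transition probabilities, rewards, and the round-robin policy are hypothetical choices for illustration; they are not taken from the chapter. The only structural point the code encodes is the one stated above: in the rested model, only the pulled arm's chain evolves, while in the restless model every chain evolves at every round.

```python
import random


class MarkovArm:
    """A two-state Markov reward chain (illustrative; parameters are made up).

    State 0 yields reward 0.0 and state 1 yields reward 1.0 by default.
    """

    def __init__(self, p01, p10, rewards=(0.0, 1.0), state=0):
        self.p01 = p01          # transition probability from state 0 to state 1
        self.p10 = p10          # transition probability from state 1 to state 0
        self.rewards = rewards
        self.state = state

    def step(self):
        """Advance the chain one step and return the reward of the new state."""
        if self.state == 0:
            self.state = 1 if random.random() < self.p01 else 0
        else:
            self.state = 0 if random.random() < self.p10 else 1
        return self.rewards[self.state]


def play(arms, choose, horizon, restless):
    """Play `horizon` rounds with policy `choose` (a map from round to arm index).

    Rested model (restless=False): only the pulled arm's chain transitions.
    Restless model (restless=True): every arm's chain transitions each round,
    but only the pulled arm's reward is observed.
    """
    total = 0.0
    for t in range(horizon):
        k = choose(t)
        for i, arm in enumerate(arms):
            if i == k:
                total += arm.step()   # pulled arm: chain evolves, reward observed
            elif restless:
                arm.step()            # restless: unobserved arms evolve too
    return total


random.seed(0)
arms = [MarkovArm(p01=0.9, p10=0.1), MarkovArm(p01=0.2, p10=0.8)]
# Round-robin is used purely to exercise the dynamics, not as a good policy.
gain = play(arms, choose=lambda t: t % 2, horizon=1000, restless=True)
print(gain)
```

Because both chains are irreducible, each has a unique stationary distribution; the learning problem studied in this chapter is to achieve low regret against the best arm's stationary reward rate without knowing the transition probabilities.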
Notes
- 1.
In some literature, rested Markov MAB is also called sleeping Markov MAB.
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Zheng, R., Hua, C. (2016). Markov Multi-armed Bandit. In: Sequential Learning and Decision-Making in Wireless Resource Management. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-50502-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50501-5
Online ISBN: 978-3-319-50502-2