Part of the book series: Wireless Networks (WN)

Abstract

In many application domains, temporal changes in the reward distribution are modeled as a Markov chain. In this chapter, we present the formulation, theoretical bounds, and algorithms for the Markov MAB problem, in which the rewards are characterized by unknown irreducible Markov processes. Two important classes of the problem are discussed, namely rested and restless Markov MAB.
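To make the distinction between the two classes concrete, the following is a minimal simulation sketch in Python; the two-state arms, transition matrices, and per-state rewards are illustrative assumptions, not taken from the chapter. In a rested Markov MAB only the played arm's state evolves, whereas in a restless Markov MAB every arm's state evolves at every time slot, whether or not it is observed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical two-state arms: each arm has an irreducible transition matrix P
# (rows sum to one) and a fixed reward attached to each state. The numbers are
# illustrative only.
arms = [
    {"P": np.array([[0.9, 0.1], [0.2, 0.8]]), "reward": np.array([0.0, 1.0])},
    {"P": np.array([[0.5, 0.5], [0.5, 0.5]]), "reward": np.array([0.2, 0.6])},
]
state = [0] * len(arms)  # current (possibly unobserved) state of each arm

def play(chosen, restless):
    """Play arm `chosen` for one slot and return the reward it yields.

    Rested:   only the chosen arm's Markov chain makes a transition.
    Restless: every arm's chain makes a transition, observed or not.
    """
    r = arms[chosen]["reward"][state[chosen]]
    for i, arm in enumerate(arms):
        if restless or i == chosen:
            state[i] = rng.choice(2, p=arm["P"][state[i]])
    return r

# Example: a uniformly random policy over 10 slots in the restless setting.
total = sum(play(int(rng.integers(len(arms))), restless=True) for _ in range(10))
print("cumulative reward:", total)
```

In either case the learner observes only the reward of the arm it plays, so in the restless setting the states of the unplayed arms continue to drift unseen, which is what makes that class harder to learn.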

Notes

  1. In some literature, rested Markov MAB is also called sleeping Markov MAB.

Author information

Corresponding author

Correspondence to Rong Zheng.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Zheng, R., Hua, C. (2016). Markov Multi-armed Bandit. In: Sequential Learning and Decision-Making in Wireless Resource Management. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-50502-2_3

  • DOI: https://doi.org/10.1007/978-3-319-50502-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50501-5

  • Online ISBN: 978-3-319-50502-2

  • eBook Packages: Computer Science, Computer Science (R0)
