A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes

Chapter

Part of the book series: Operations Research/Computer Science Interfaces Series (ORCS, volume 12)

Abstract

In this paper, we propose a simulation-based policy iteration algorithm for Markov decision process (MDP) problems with the average cost criterion, under a unichain assumption that is weaker than the assumptions made in previous work. In this algorithm, (1) the problem is converted to a stochastic shortest path problem, and the reference state can be chosen as any state that is recurrent under the current policy, so the reference state need not be the same from iteration to iteration; (2) the differential costs are evaluated indirectly by a temporal-difference learning scheme; and (3) transient states are selected as the initial states of the sample paths, and the inverse of the visit count is used as the stepsize, which improves performance. Numerical results from applying the algorithm to an inventory control problem are also provided.
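
To make the three ingredients concrete, below is a minimal Python sketch of one plausible reading of the loop, assuming a small toy MDP with known transition probabilities. Policy evaluation learns the differential costs h by temporal-difference updates along a single simulated sample path, using the inverse of the visit count as the stepsize (point 3 above); the estimates target the policy-evaluation (Poisson) equation h(i) = g(i) - eta + sum_j p_ij h(j), with h fixed at 0 at a reference state. All names here are hypothetical, the reference state is held fixed at state 0 for brevity (the chapter allows it to change between iterations), and the improvement step uses the true model only to keep the sketch short; this is a sketch under these assumptions, not the chapter's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy MDP (illustrative only): transition probabilities P[a, i, :] and
    # one-step costs g[a, i] for n_states states and n_actions actions.
    n_states, n_actions = 4, 2
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    g = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

    def evaluate_policy_td(policy, n_steps=50_000):
        """TD(0) estimates of the average cost eta and the differential
        costs h of a fixed policy, computed along a single simulated
        sample path with a 1/visit-count stepsize per state."""
        h = np.zeros(n_states)
        visits = np.zeros(n_states)
        eta = 0.0
        state = 0  # the paper starts paths from transient states; simplified here
        for t in range(1, n_steps + 1):
            a = policy[state]
            nxt = rng.choice(n_states, p=P[a, state])
            cost = g[a, state]
            eta += (cost - eta) / t                # running estimate of eta
            visits[state] += 1
            td = cost - eta + h[nxt] - h[state]    # average-cost temporal difference
            h[state] += td / visits[state]         # stepsize = 1 / visit count
            state = nxt
        return eta, h - h[0]  # pin h at reference state 0 (fixed here for brevity)

    def improve_policy(h):
        """Greedy one-step lookahead on the learned differential costs.
        The true model is used here only to keep the sketch short; the
        chapter estimates these quantities by simulation as well."""
        q = g + np.einsum('aij,j->ai', P, h)       # q[a, i] = g(i,a) + sum_j p_ij(a) h(j)
        return q.argmin(axis=0)

    policy = np.zeros(n_states, dtype=int)
    for _ in range(10):
        eta, h = evaluate_policy_td(policy)
        new_policy = improve_policy(h)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    print("estimated average cost:", eta, "policy:", policy)

Because every stationary policy in a unichain MDP has a single recurrent class, any recurrent state can serve as the reference state, which is what makes the pinning step in the sketch legitimate.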

Copyright information

© 2000 Springer Science+Business Media New York

About this chapter

Cite this chapter

He, Y., Fu, M.C., Marcus, S.I. (2000). A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes. In: Laguna, M., Velarde, J.L.G. (eds) Computing Tools for Modeling, Optimization and Simulation. Operations Research/Computer Science Interfaces Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-4567-5_9

  • DOI: https://doi.org/10.1007/978-1-4615-4567-5_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7062-8

  • Online ISBN: 978-1-4615-4567-5
