Markov Decision Processes

  • Xi-Ren Cao

Abstract

In Chapter 2, we introduced the basic principles of perturbation analysis (PA) and used them to derive performance derivative formulas for queueing networks and for Markov and semi-Markov systems. In Chapter 3, we developed sample-path-based (on-line learning) algorithms for estimating these performance derivatives, together with sample-path-based optimization schemes. In this chapter, we show that the performance-sensitivity-based view leads to a unified approach to both PA and Markov decision processes (MDPs).
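The unified view rests on two identities for an ergodic (unichain) Markov chain with transition matrix P, reward vector f, stationary distribution π, and average reward η = πf, where g denotes the performance potential. The sketch below uses the notation standard in this framework; multichain and discounted extensions are treated in the chapter itself.

```latex
% Poisson equation for the performance potential g
% (e is the all-ones column vector):
(I - P + e\pi)\, g = f, \qquad \eta = \pi f.

% Performance difference formula for a second policy (P', f')
% with stationary distribution \pi':
\eta' - \eta = \pi'\!\left[(f' - f) + (P' - P)\,g\right].

% Performance derivative formula along P_\delta = P + \delta(P' - P),
% f_\delta = f + \delta(f' - f):
\left.\frac{d\eta_\delta}{d\delta}\right|_{\delta = 0}
  = \pi\!\left[(f' - f) + (P' - P)\,g\right].
```

The difference formula is what drives policy iteration: since π' is componentwise positive, any policy that increases f + Pg componentwise increases η. A minimal Python sketch of the resulting average-reward policy iteration follows; the function names and the toy model are illustrative, not from the chapter, and an ergodic model is assumed so that the Poisson equation has the form above.

```python
import numpy as np

def solve_potential(P, f):
    """Solve the Poisson equation (I - P + e pi) g = f for the
    performance potential g of an ergodic chain (P, f)."""
    n = len(f)
    # Stationary distribution: pi P = pi, pi e = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi, g

def policy_iteration(P_a, f_a, max_iter=100, tol=1e-9):
    """Average-reward policy iteration driven by the difference formula.
    P_a[a] is the transition matrix and f_a[a] the reward vector under
    action a; a policy maps each state to an action."""
    n_actions, n_states = f_a.shape
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        P = np.array([P_a[policy[i], i] for i in range(n_states)])
        f = np.array([f_a[policy[i], i] for i in range(n_states)])
        pi, g = solve_potential(P, f)
        # Improvement: by the difference formula, raising f + P g
        # componentwise raises the average reward eta.
        q = f_a + P_a @ g                        # shape (n_actions, n_states)
        new_policy = policy.copy()
        for i in range(n_states):
            best = int(np.argmax(q[:, i]))
            if q[best, i] > q[policy[i], i] + tol:   # keep action on ties
                new_policy[i] = best
        if np.array_equal(new_policy, policy):
            return policy, float(pi @ f)             # policy and its eta
        policy = new_policy
    return policy, float(pi @ f)

# Toy two-state, two-action model (hypothetical numbers).
P_a = np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[0.5, 0.5], [0.7, 0.3]]])
f_a = np.array([[1.0, 0.0],
                [0.5, 2.0]])
print(policy_iteration(P_a, f_a))
```

Keeping the current action on ties guarantees termination: the average reward strictly increases at every policy change, and there are finitely many deterministic policies.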

Keywords

Optimal policy, Poisson equation, Markov decision process, reward function, optimality equation



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Hong Kong University of Science and Technology, Kowloon, Hong Kong
