
TOP, Volume 14, Issue 2, pp 177–261

A survey of recent results on continuous-time Markov decision processes

  • Xianping Guo
  • Onésimo Hernández-Lerma
  • Tomás Prieto-Rumeau
  • Xi-Ren Cao
  • Junyu Zhang
  • Qiying Hu
  • Mark E. Lewis
  • Ricardo Vélez

Abstract

This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
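The two criteria at the center of the survey can be stated in standard textbook notation (this display is ours, not quoted from the paper): for a policy π, initial state i, reward rate function r, and discount rate α > 0, the discounted and average criteria are

    V_\alpha(i,\pi) = \mathbb{E}_i^{\pi}\!\left[\int_0^{\infty} e^{-\alpha t}\, r(x_t,a_t)\, dt\right],
    \qquad
    J(i,\pi) = \liminf_{T\to\infty} \frac{1}{T}\, \mathbb{E}_i^{\pi}\!\left[\int_0^{T} r(x_t,a_t)\, dt\right].

When the transition rates are bounded, the discounted problem reduces to a discrete-time MDP by uniformization, and value iteration applies. The following Python sketch illustrates that classical reduction on a truncated birth-death chain with controlled service rate; every number in it (truncation level, rates, costs) is a hypothetical choice made only for illustration, not taken from the paper.

    import numpy as np

    # A minimal sketch: discounted CTMDP solved by uniformization + value
    # iteration on a truncated birth-death chain {0, ..., N}. All parameters
    # below are hypothetical.
    N = 50                      # truncation level of the countable state space
    alpha = 0.1                 # continuous-time discount rate
    lam = 1.0                   # arrival (birth) rate
    actions = [0.5, 1.5, 3.0]   # admissible service (death) rates

    def reward_rate(i, a):
        # reward rate = -(holding cost + effort cost); a hypothetical choice
        return -(i + 2.0 * a)

    Lam = lam + max(actions)    # uniformization constant: bounds every exit rate

    V = np.zeros(N + 1)
    for _ in range(10000):      # fixed-point iteration; modulus Lam/(alpha+Lam) < 1
        V_new = np.empty_like(V)
        for i in range(N + 1):
            candidates = []
            for a in actions:
                up = lam if i < N else 0.0    # transition rate q(i+1 | i, a)
                down = a if i > 0 else 0.0    # transition rate q(i-1 | i, a)
                # HJB fixed point after uniformization:
                # (alpha + Lam) V(i) = max_a [ r(i,a) + q(i+1|i,a) V(i+1)
                #                     + q(i-1|i,a) V(i-1) + (Lam - q_i(a)) V(i) ]
                rhs = (reward_rate(i, a)
                       + up * V[min(i + 1, N)]
                       + down * V[max(i - 1, 0)]
                       + (Lam - up - down) * V[i])
                candidates.append(rhs / (alpha + Lam))
            V_new[i] = max(candidates)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new

    print("approximate discounted value at state 0:", V[0])

The point of the surveyed results is precisely that no such uniformization constant exists when the transition rates are unbounded; the works cited below replace it with drift (Lyapunov-type) conditions on the rates and rewards, under which the optimality equations can be analyzed directly in continuous time.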

Key Words

Continuous-time Markov decision processes (also known as controlled Markov chains); unbounded reward and transition rates; discounted reward; average reward; bias optimality; sensitive discount criteria

AMS subject classification

90C40, 93E20, 60J27


References

  1. Albright S.C. and Winston W. (1979). A Birth-Death Model of Advertising and Pricing. Advances in Applied Probability 11, 134–152.
  2. Allen L.J.S. (2003). An Introduction to Stochastic Processes with Applications to Biology. Pearson Education.
  3. Anderson W.J. (1991). Continuous-Time Markov Chains. Springer.
  4. Bailey N.T.J. (1975). The Mathematical Theory of Infectious Diseases and Its Applications. Griffin.
  5. Bartholomew D.J. (1973). Stochastic Models for Social Processes, 2nd Edition. Wiley.
  6. Bellman R. (1957). Dynamic Programming. Princeton University Press.
  7. Berge C. (1963). Topological Spaces. Macmillan.
  8. Bertsekas D.P. (2001). Dynamic Programming and Optimal Control, Vol. II, 2nd Edition. Athena Scientific.
  9. Blackwell D. (1962). Discrete Dynamic Programming. Annals of Mathematical Statistics 33, 719–726.
  10. Cao X.R. (2005). Basic Ideas for Event-Based Optimality of Markov Systems. Discrete Event Dynamic Systems: Theory and Applications 15, 169–197.
  11. Cao X.R. and Guo X.P. (2006). Continuous-Time Markov Decision Processes with n-Potential Optimality Criteria. Preprint.
  12. Cao X.R. and Zhang J.Y. (2007). The nth-Order Bias Optimality for Multichain Markov Decision Processes. IEEE Transactions on Automatic Control. In press.
  13. Dekker R. and Hordijk A. (1988). Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards. Mathematics of Operations Research 13, 395–420.
  14. Dekker R. and Hordijk A. (1992). Recurrence Conditions for Average and Blackwell Optimality in Denumerable State Markov Decision Chains. Mathematics of Operations Research 17, 271–289.
  15. Doshi B.T. (1976). Continuous-Time Control of Markov Processes on an Arbitrary State Space: Discounted Rewards. Annals of Statistics 4, 1219–1235.
  16. Dynkin E.B. and Yushkevich A.A. (1979). Controlled Markov Processes. Springer.
  17. Feinberg E.A. and Shwartz A. (2002). Handbook of Markov Decision Processes. Kluwer.
  18. Feller W. (1940). On the Integro-Differential Equations of Purely Discontinuous Markoff Processes. Transactions of the American Mathematical Society 48, 488–515.
  19. Fisher L. (1968). On the Recurrent Denumerable Decision Process. Annals of Mathematical Statistics 39, 424–434.
  20. Gale D. (1967). On Optimal Development in a Multi-Sector Economy. Review of Economic Studies 34, 1–19.
  21. Guo X.P. (2006). Continuous-Time Markov Decision Processes with Discounted Rewards: The Case of Polish Spaces. Mathematics of Operations Research (to appear).
  22. Guo X.P. and Cao X.R. (2005). Optimal Control of Ergodic Continuous-Time Markov Chains with Average Sample-Path Rewards. SIAM Journal on Control and Optimization 44, 29–48.
  23. Guo X.P. and Hernández-Lerma O. (2003a). Continuous-Time Controlled Markov Chains. Annals of Applied Probability 13, 363–388.
  24. Guo X.P. and Hernández-Lerma O. (2003b). Continuous-Time Controlled Markov Chains with Discounted Rewards. Acta Applicandae Mathematicae 79, 195–216.
  25. Guo X.P. and Hernández-Lerma O. (2003c). Drift and Monotonicity Conditions for Continuous-Time Controlled Markov Chains with an Average Criterion. IEEE Transactions on Automatic Control 48, 236–245.
  26. Guo X.P. and Hernández-Lerma O. (2003d). Zero-Sum Games for Continuous-Time Markov Chains with Unbounded Transition and Average Payoff Rates. Journal of Applied Probability 40, 327–345.
  27. Guo X.P. and Hernández-Lerma O. (2005a). Nonzero-Sum Games for Continuous-Time Markov Chains with Unbounded Discounted Payoffs. Journal of Applied Probability 42, 302–320.
  28. Guo X.P. and Hernández-Lerma O. (2005b). Zero-Sum Continuous-Time Markov Games with Unbounded Transition and Discounted Payoff Rates. Bernoulli 11, 1009–1029.
  29. Guo X.P. and Liu K. (2001). A Note on Optimality Conditions for Continuous-Time Markov Decision Processes with Average Cost Criterion. IEEE Transactions on Automatic Control 46, 1984–1989.
  30. Guo X.P. and Rieder U. (2006). Average Optimality for Continuous-Time Markov Decision Processes in Polish Spaces. Annals of Applied Probability 16, 730–756.
  31. Guo X.P. and Zhu W.P. (2002a). Denumerable State Continuous-Time Markov Decision Processes with Unbounded Cost and Transition Rates Under the Discounted Criterion. Journal of Applied Probability 39, 233–250.
  32. Guo X.P. and Zhu W.P. (2002b). Denumerable State Continuous-Time Markov Decision Processes with Unbounded Cost and Transition Rates Under Average Criterion. ANZIAM Journal 43, 541–557.
  33. Haviv M. and Puterman M.L. (1998). Bias Optimality in Controlled Queuing Systems. Journal of Applied Probability 35, 136–150.
  34. Hernández-Lerma O. (1994). Lectures on Continuous-Time Markov Control Processes. Aportaciones Matemáticas, Vol. 3, Sociedad Matemática Mexicana, Mexico City.
  35. Hernández-Lerma O. and Govindan T.E. (2001). Nonstationary Continuous-Time Markov Control Processes with Discounted Costs on Infinite Horizon. Acta Applicandae Mathematicae 67, 277–293.
  36. Hernández-Lerma O. and Lasserre J.B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer.
  37. Hernández-Lerma O. and Lasserre J.B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer.
  38. Hernández-Lerma O. and Romera R. (2004). The Scalarization Approach to Multi-objective Markov Control Problems: Why Does It Work? Applied Mathematics and Optimization 50, 279–293.
  39. Hilgert N. and Hernández-Lerma O. (2003). Bias Optimality Versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs. Acta Applicandae Mathematicae 76, 215–235.
  40. Hordijk A. and Yushkevich A.A. (1999a). Blackwell Optimality in the Class of Stationary Policies in Markov Decision Chains with a Borel State and Unbounded Rewards. Mathematical Methods of Operations Research 49, 1–39.
  41. Hordijk A. and Yushkevich A.A. (1999b). Blackwell Optimality in the Class of All Policies in Markov Decision Chains with a Borel State and Unbounded Rewards. Mathematical Methods of Operations Research 50, 421–448.
  42. Hordijk A. and Yushkevich A.A. (2002). Blackwell Optimality. In: Feinberg E.A. and Shwartz A. (eds.), Handbook of Markov Decision Processes. Kluwer, 231–267.
  43. Hou Z.T. and Guo X.P. (1998). Markov Decision Processes. Science and Technology Press of Hunan, Changsha, China. (In Chinese.)
  44. Howard R.A. (1960). Dynamic Programming and Markov Processes. Wiley.
  45. Hu Q. (1992). Discounted and Average Markov Decision Processes with Unbounded Rewards: New Conditions. Journal of Mathematical Analysis and Applications 171, 111–124.
  46. Hu Q. (1996). Continuous Time Markov Decision Processes with Discounted Moment Criterion. Journal of Mathematical Analysis and Applications 203, 1–12.
  47. Iosifescu M. and Tautu P. (1973). Stochastic Processes and Applications in Biology and Medicine, Vol. II: Models. Springer.
  48. Jaskiewicz A. (2004). On the Equivalence of Two Expected Average Cost Criteria for Semi-Markov Control Processes. Mathematics of Operations Research 29, 326–338.
  49. Kakumanu P. (1971). Continuously Discounted Markov Decision Models with Countable State and Action Spaces. Annals of Mathematical Statistics 42, 919–926.
  50. Kakumanu P. (1972). Nondiscounted Continuous-Time Markov Decision Processes with Countable State and Action Spaces. SIAM Journal on Control 10, 210–220.
  51. Kakumanu P. (1975). Continuous Time Markov Decision Processes with Average Return Criterion. Journal of Mathematical Analysis and Applications 52, 173–188.
  52. Kakumanu P. (1977). Relation Between Continuous and Discrete Markovian Decision Problems. Naval Research Logistics Quarterly 24, 431–439.
  53. Kato T. (1966). Perturbation Theory for Linear Operators. Springer.
  54. Kermack W.O. and McKendrick A.G. (1927). Contributions to the Mathematical Theory of Epidemics. Proceedings of the Royal Society A 115, 700–721.
  55. Kitayev M.Yu. (1985). Semi-Markov and Jump Markov Controlled Models: Average Cost Criterion. Theory of Probability and Its Applications 30, 272–288.
  56. Kitayev M.Yu. and Rykov V.V. (1995). Controlled Queueing Systems. CRC Press.
  57. Lasserre J.B. (1988). Conditions for the Existence of Average and Blackwell Optimal Stationary Policies in Denumerable Markov Decision Processes. Journal of Mathematical Analysis and Applications 136, 479–490.
  58. Lefèvre C. (1979). Optimal Control of the Simple Stochastic Epidemic with Variable Recovery Rates. Mathematical Biosciences 44, 209–219.
  59. Lefèvre C. (1981). Optimal Control of a Birth and Death Epidemic Process. Operations Research 29, 971–982.
  60. Leizarowitz A. (1996). Overtaking and Almost-Sure Optimality for Infinite Horizon Markov Decision Processes. Mathematics of Operations Research 21, 158–181.
  61. Lembersky M.R. (1974). On Maximal Rewards and ε-Optimal Policies in Continuous Time Markov Chains. Annals of Statistics 2, 159–169.
  62. Lewis M.E., Ayhan H. and Foley R.D. (1999). Bias Optimality in a Queue with Admission Control. Probability in the Engineering and Informational Sciences 13, 309–327.
  63. Lewis M.E., Ayhan H. and Foley R.D. (2002). Bias Optimal Admission Policies for a Nonstationary Multiclass Queueing System. Journal of Applied Probability 39, 20–37.
  64. Lewis M.E. and Puterman M.L. (2001). A Note on Bias Optimality in Controlled Queueing Systems. Journal of Applied Probability 37, 300–305.
  65. Lewis M.E. and Puterman M.L. (2002). A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes. IEEE Transactions on Automatic Control 46, 96–100.
  66. Lippman S.A. (1975). Applying a New Device in the Optimization of Exponential Queueing Systems. Operations Research 23, 667–710.
  67. Lund R.B., Meyn S.P. and Tweedie R.L. (1996). Computable Exponential Convergence Rates for Stochastically Ordered Markov Processes. Annals of Applied Probability 6, 218–237.
  68. Mangel M. (1985). Decision and Control in Uncertain Resource Systems. Academic Press.
  69. Massy W.F., Montgomery D.B. and Morrison D.G. (1970). Stochastic Models of Buying Behavior. MIT Press.
  70. Meyn S.P. and Tweedie R.L. (1993). Stability of Markovian Processes III: Foster-Lyapunov Criteria for Continuous-Time Processes. Advances in Applied Probability 25, 518–548.
  71. Miller B.L. (1968). Finite State Continuous Time Markov Decision Processes with an Infinite Planning Horizon. Journal of Mathematical Analysis and Applications 22, 552–569.
  72. Miller B.L. and Veinott A.F. (1969). Discrete Dynamic Programming with a Small Interest Rate. Annals of Mathematical Statistics 40, 366–370.
  73. Piunovskii A.B. (1998). A Controlled Jump Discounted Model with Constraints. Theory of Probability and Its Applications 42, 51–72.
  74. Piunovskii A.B. (2004). Multicriteria Impulsive Control of Jump Markov Processes. Mathematical Methods of Operations Research 60, 125–144.
  75. Prieto-Rumeau T. (2006). Blackwell Optimality in the Class of Markov Policies for Continuous-Time Controlled Markov Chains. Acta Applicandae Mathematicae 92, 77–96.
  76. Prieto-Rumeau T. and Hernández-Lerma O. (2005a). The Laurent Series, Sensitive Discount and Blackwell Optimality for Continuous-Time Controlled Markov Chains. Mathematical Methods of Operations Research 61, 123–145.
  77. Prieto-Rumeau T. and Hernández-Lerma O. (2005b). Bias and Overtaking Equilibria for Zero-Sum Continuous-Time Markov Games. Mathematical Methods of Operations Research 61, 437–454.
  78. Prieto-Rumeau T. and Hernández-Lerma O. (2006a). Bias Optimality for Continuous-Time Controlled Markov Chains. SIAM Journal on Control and Optimization 45, 51–73.
  79. Prieto-Rumeau T. and Hernández-Lerma O. (2006b). A Unified Approach to Continuous-Time Discounted Markov Control Processes. Morfismos 10 (to appear).
  80. Prieto-Rumeau T. and Hernández-Lerma O. (2006c). Ergodic Control of Continuous-Time Markov Chains with Pathwise Constraints. Preprint.
  81. Prieto-Rumeau T. and Hernández-Lerma O. (2006d). Variance Minimization and the Overtaking Optimality Approach to Continuous-Time Markov Control Chains. Preprint.
  82. Puterman M.L. (1974). Sensitive Discount Optimality in Controlled One-Dimensional Diffusions. Annals of Probability 2, 408–419.
  83. Puterman M.L. (1994). Markov Decision Processes. Wiley.
  84. Qiu Q., Wu Q. and Pedram M. (2001). Stochastic Modeling of a Power-Managed System: Construction and Optimization. IEEE Transactions on Computer-Aided Design 20, 1200–1217.
  85. Ramsey F.P. (1928). A Mathematical Theory of Saving. Economic Journal 38, 543–559.
  86. Ross S.M. (1970). Applied Probability Models with Optimization Applications. Holden-Day.
  87. Rykov V.V. (1966). Markov Sequential Decision Processes with Finite State and Decision Space. Theory of Probability and Its Applications 11, 302–311.
  88. Schäl M. (1992). On the Second Optimality Equation for Semi-Markov Decision Models. Mathematics of Operations Research 17, 470–486.
  89. Sennott L.I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley.
  90. Serfozo R.F. (1979). An Equivalence Between Continuous and Discrete Time Markov Decision Processes. Operations Research 27, 616–620.
  91. Sladký K. (1978). Sensitive Optimality Criteria for Continuous Time Markov Processes. Transactions of the Eighth Prague Conference on Information Theory, Statistical Decision Functions and Random Processes (Prague, 1978), Vol. B, 221–225.
  92. Song J.S. (1987). Continuous-Time Markov Decision Programming with Non-Uniformly Bounded Transition Rates. Scientia Sinica 12, 1258–1267. (In Chinese.)
  93. Tadj L. and Choudhury G. (2005). Optimal Design and Control of Queues. TOP 13, 359–412.
  94. Taylor H.M. (1976). A Laurent Series for the Resolvent of a Strongly Continuous Stochastic Semi-Group. Mathematical Programming Study 6, 258–263.
  95. Veinott A.F. (1966). On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting. Annals of Mathematical Statistics 37, 1284–1294.
  96. Veinott A.F. (1969). Discrete Dynamic Programming with Sensitive Discount Optimality Criteria. Annals of Mathematical Statistics 40, 1635–1660.
  97. Vidale M.L. and Wolfe H.B. (1957). An Operations Research Study of Sales Response to Advertising. Operations Research 5, 370–381.
  98. von Weizsäcker C.C. (1965). Existence of Optimal Programs of Accumulation for an Infinite Horizon. Review of Economic Studies 32, 85–104.
  99. Wickwire K. (1977). Mathematical Models for the Control of Pests and Infectious Diseases: A Survey. Theoretical Population Biology 11, 182–238.
  100. Wu C.B. (1997). Continuous Time Markov Decision Processes with Unbounded Reward and Non-Uniformly Bounded Transition Rate Under Discounted Criterion. Acta Mathematicae Applicatae Sinica 20, 196–208.
  101. Ye L., Guo X.P. and Hernández-Lerma O. (2006). Existence and Regularity of Nonhomogeneous Q(t)-Processes under Measurability Conditions. Preprint.
  102. Yosida K. (1980). Functional Analysis, Sixth Edition. Springer.
  103. Yushkevich A.A. (1973). On a Class of Strategies in General Markov Decision Models. Theory of Probability and Its Applications 18, 777–779.
  104. Yushkevich A.A. (1977). Controlled Markov Models with Countable State and Continuous Time. Theory of Probability and Its Applications 22, 215–235.
  105. Yushkevich A.A. (1994). Blackwell Optimal Policies in a Markov Decision Process with a Borel State Space. Mathematical Methods of Operations Research 40, 253–288.
  106. Yushkevich A.A. (1997). Blackwell Optimality in Continuous in Action Markov Decision Processes. SIAM Journal on Control and Optimization 35, 2157–2182.
  107. Yushkevich A.A. and Feinberg E.A. (1979). On Homogeneous Markov Model with Continuous Time and Finite or Countable State Space. Theory of Probability and Its Applications 24, 156–161.

References

  1. Bather J. (1976). Optimal Stationary Policies for Denumerable Markov Chains in Continuous Time. Advances in Applied Probability 8, 148–155.
  2. Cao X.-R. (2003a). Semi-Markov Decision Problems and Performance Sensitivity Analysis. IEEE Transactions on Automatic Control 48, 758–769.
  3. Cao X.-R. (2003b). A Sensitivity View of Markov Decision Processes and Reinforcement Learning. In: Gong W. and Shi L. (eds.), Modeling, Control and Optimization of Complex Systems. Kluwer, 261–283.
  4. Cao X.-R. (2003c). From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems 13, 9–39.
  5. Cao X.-R. (2004). The Potential Structure of Sample Paths and Performance Sensitivities of Markov Systems. IEEE Transactions on Automatic Control 49, 2129–2142.
  6. Cao X.-R. (2005). Basic Ideas for Event-Based Optimality of Markov Systems. Discrete Event Dynamic Systems: Theory and Applications 15, 169–197.
  7. Cao X.-R. and Zhang J.Y. (2007). The nth-Order Bias Optimality for Multichain Markov Decision Processes. IEEE Transactions on Automatic Control (to appear).
  8. Dijk N.V. (1993). Queueing Networks and Product Forms: A Systems Approach. Wiley.
  9. Dynkin E.B. and Yushkevich A.A. (1979). Controlled Markov Processes. Springer.
  10. Puterman M.L. (1994). Markov Decision Processes. Wiley.
  11. Sennott L.I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley.

References

  1. Hu Q. (1990). CTMDP and Its Relationship with DTMDP. Chinese Science Bulletin 35, 710–714.
  2. Hu Q. (1992). Discounted and Average Markov Decision Processes with Unbounded Rewards: New Conditions. Journal of Mathematical Analysis and Applications 171, 111–124.
  3. Hu Q. (1996). Continuous Time Markov Decision Processes with Discounted Moment Criterion. Journal of Mathematical Analysis and Applications 203, 1–12.
  4. Hu Q., Liu J. and Yue W. (2003). Continuous Time Markov Decision Processes: Discounted Total Reward. International Journal of Pure and Applied Mathematics 7, 147–175.
  5. Hu Q. and Wang J. (1998). Continuous Time Markov Decision Processes with Nonuniformly Bounded Rate: Expected Total Rewards. Optimization 43, 219–233.
  6. Serfozo R.F. (1979). An Equivalence Between Continuous and Discrete Time Markov Decision Processes. Operations Research 27, 616–620.

References

  1. Haviv M. and Puterman M.L. (1998). Bias Optimality in Controlled Queuing Systems. Journal of Applied Probability 35, 136–150.
  2. Lewis M.E., Ayhan H. and Foley R.D. (1999). Bias Optimality in a Queue with Admission Control. Probability in the Engineering and Informational Sciences 13, 309–327.
  3. Lewis M.E., Ayhan H. and Foley R.D. (2002). Bias Optimal Admission Policies for a Nonstationary Multiclass Queueing System. Journal of Applied Probability 39, 20–37.

References

  1. Borkar V.S. (2004). Controlled Diffusion Processes. Probability Surveys 2, 213–244.
  2. Cao X.R. and Guo X.P. (2004). Partially Observable Markov Decision Processes with Reward Information. Proceedings of the 43rd IEEE Conference on Decision and Control, 4393–4398.
  3. Hernández-Lerma O. (1989). Adaptive Markov Control Processes. Springer.
  4. Hernández-Lerma O. and Lasserre J.B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer.
  5. Hernández-Lerma O. and Lasserre J.B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer.
  6. Howard R.A. (1960). Dynamic Programming and Markov Processes. MIT Press.
  7. Kaelbling L.P., Littman M.L. and Moore A.W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285.
  8. Puterman M.L. (1994). Markov Decision Processes. Wiley.
  9. Neyman A. and Sorin S. (2001). Stochastic Games and Applications. NATO Science Series, 570.

Copyright information

© Springer 2006

Authors and Affiliations

  • Xianping Guo (1)
  • Onésimo Hernández-Lerma (2)
  • Tomás Prieto-Rumeau (3)
  • Xi-Ren Cao (4)
  • Junyu Zhang (4)
  • Qiying Hu (5)
  • Mark E. Lewis (6)
  • Ricardo Vélez (7)

  1. Zhongshan University, P.R. China
  2. CINVESTAV-IPN, Mexico
  3. Universidad Nacional de Educación a Distancia, Spain
  4. Hong Kong University of Science and Technology, Hong Kong
  5. Shanghai University, China
  6. Cornell University, USA
  7. Universidad Nacional de Educación a Distancia, Spain
