Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 40)

Abstract

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the 1950s. We consider both finite and infinite horizon models. For the finite horizon model the total expected reward is the commonly used utility function. For the infinite horizon the choice of utility function is less obvious, and we consider several criteria: the total discounted expected reward, the average expected reward, and more sensitive optimality criteria including the Blackwell optimality criterion. We end with a variety of other subjects.
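
For concreteness, the two basic infinite-horizon criteria can be written as follows (the notation here is ours, not the chapter's): for a policy $\pi$, initial state $i$, reward function $r$ and discount factor $0 \le \beta < 1$,

$$
v_\beta^\pi(i) = \mathbb{E}_i^\pi\left[\sum_{t=0}^{\infty} \beta^t \, r(X_t, A_t)\right],
\qquad
\phi^\pi(i) = \liminf_{T \to \infty} \frac{1}{T}\,\mathbb{E}_i^\pi\left[\sum_{t=0}^{T-1} r(X_t, A_t)\right],
$$

where $X_t$ and $A_t$ denote the state and action at time $t$. A policy is Blackwell optimal if it is $\beta$-discounted optimal simultaneously for all $\beta$ sufficiently close to 1.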

The emphasis is on computational methods for finding optimal policies under these criteria. These methods are based on concepts such as value iteration, policy iteration and linear programming. This survey covers about three hundred papers. Although the subject of finite state and action MDPs is classical, there are still open problems, and we mention some of them.
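
As an illustration of the first of these methods, the following is a minimal value iteration sketch for the discounted criterion; the NumPy data layout, discount factor and stopping tolerance are our own illustrative assumptions, not notation from the chapter.

import numpy as np

def value_iteration(P, r, beta=0.9, tol=1e-8):
    # P[a] is the |S| x |S| transition matrix for action a;
    # r[a] is the length-|S| one-step reward vector for action a.
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    while True:
        # One-step lookahead: Q[a, i] = r_a(i) + beta * sum_j p_a(i, j) * v(j)
        q = np.array([r[a] + beta * P[a] @ v for a in range(n_actions)])
        v_new = q.max(axis=0)
        # Stopping rule yielding a tol-optimal greedy policy for the discounted criterion
        if np.max(np.abs(v_new - v)) < tol * (1 - beta) / (2 * beta):
            return v_new, q.argmax(axis=0)
        v = v_new

# Hypothetical two-state, two-action instance
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
r = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
v_opt, policy = value_iteration(P, r, beta=0.95)

Policy iteration and the linear programming formulation operate on the same model data (P, r); the chapter surveys them alongside value iteration for each of the optimality criteria.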

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kallenberg, L. (2003). Finite State and Action MDPs. In: Feinberg, E.A., Shwartz, A. (eds) Handbook of Markov Decision Processes. International Series in Operations Research & Management Science, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0805-2_2

  • DOI: https://doi.org/10.1007/978-1-4615-0805-2_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5248-8

  • Online ISBN: 978-1-4615-0805-2
