Finite State and Action MDPs

  • Lodewijk Kallenberg
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 40)

Abstract

In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory, developed since the late 1950s. We consider both finite and infinite horizon models. For the finite horizon model the total expected reward is the commonly used utility function. For the infinite horizon the choice of utility function is less obvious; we consider several criteria: total expected discounted reward, expected average reward, and more sensitive optimality criteria, including the Blackwell optimality criterion. We end with a variety of other subjects.
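For readers new to these criteria, the standard definitions are as follows (generic notation, not quoted from the chapter):

```latex
% Standard optimality criteria for a finite MDP; generic notation:
% r(i,a)  = one-step reward in state i under action a,
% X_t,A_t = state and action at time t, alpha in (0,1) = discount factor,
% pi = a policy, i = initial state.
\[
  v_\alpha^\pi(i) = \mathbb{E}_i^{\pi}\Bigl[\sum_{t=0}^{\infty} \alpha^{t}\, r(X_t,A_t)\Bigr]
  \quad\text{(total expected discounted reward)}
\]
\[
  \phi^\pi(i) = \liminf_{T\to\infty}\frac{1}{T}\,
  \mathbb{E}_i^{\pi}\Bigl[\sum_{t=0}^{T-1} r(X_t,A_t)\Bigr]
  \quad\text{(expected average reward)}
\]
```

A policy is Blackwell optimal if it is discounted optimal for all discount factors sufficiently close to 1; this is the most sensitive of the criteria listed above.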

The emphasis is on computational methods for finding optimal policies under these criteria. These methods are based on concepts such as value iteration, policy iteration and linear programming. This survey covers about three hundred papers. Although the subject of finite state and action MDPs is classical, there are still open problems, and we mention some of them.
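As an illustration of the first of these methods, the following is a minimal sketch of value iteration for the discounted criterion. It is not taken from the chapter, and the data layout (P[a] a transition matrix and r[a] a reward vector per action) is an assumption made for the example.

```python
import numpy as np

def value_iteration(P, r, alpha, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite discounted MDP (illustrative sketch).

    P[a] : |S| x |S| transition matrix under action a  (assumed layout)
    r[a] : length-|S| reward vector under action a     (assumed layout)
    alpha: discount factor in (0, 1)
    Returns an approximate optimal value vector and a greedy policy.
    """
    num_actions, num_states = len(P), P[0].shape[0]
    v = np.zeros(num_states)
    for _ in range(max_iter):
        # One-step lookahead: Q[a, i] = r_a(i) + alpha * sum_j p_a(i, j) v(j)
        q = np.array([r[a] + alpha * P[a] @ v for a in range(num_actions)])
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, q.argmax(axis=0)

# Tiny two-state, two-action example (made-up numbers).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # transitions under action 1
r = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
v, policy = value_iteration(P, r, alpha=0.9)
```

Policy iteration and the linear programming formulation, also treated in the chapter, solve the same optimality equation v(i) = max_a { r(i,a) + alpha * sum_j p(j|i,a) v(j) } in finitely many steps rather than by successive approximation.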

Copyright information

© Springer Science+Business Media New York 2003

Authors and Affiliations

  • Lodewijk Kallenberg
  1. Mathematical Institute, University of Leiden, Leiden, The Netherlands
