Abstract
In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory, developed mainly since the late 1950s. We consider both finite- and infinite-horizon models. For the finite-horizon model the total expected reward is the commonly used utility function. For the infinite-horizon model the choice of utility function is less obvious, and we consider several criteria: total discounted expected reward, average expected reward, and more sensitive optimality criteria, including the Blackwell optimality criterion. We end with a variety of other subjects.
The emphasis is on computational methods for finding optimal policies under these criteria. These methods are based on concepts such as value iteration, policy iteration and linear programming. This survey covers about three hundred papers. Although the subject of finite state and action MDPs is classical, there are still open problems, and we mention some of them.
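To make the first of these concepts concrete, here is a minimal value-iteration sketch for a finite discounted MDP, in the spirit of the methods surveyed in the chapter. The array layout (`P[s, a, t]` for transition probabilities, `r[s, a]` for immediate rewards) and the two-state example are hypothetical illustrations, not taken from the text:

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, eps=1e-8):
    """Successive approximation of the optimal value vector.

    P[s, a, t] = Pr(next state = t | state = s, action = a);
    r[s, a]    = immediate expected reward; 0 < gamma < 1.
    Returns an (eps-accurate) value vector and a greedy policy.
    """
    v = np.zeros(r.shape[0])
    while True:
        # Bellman optimality operator: (Tv)(s) = max_a [ r(s,a) + gamma * E[v(next)] ]
        q = r + gamma * np.einsum('sat,t->sa', P, v)
        v_new = q.max(axis=1)
        # Standard stopping rule guaranteeing an eps-optimal value vector.
        if np.max(np.abs(v_new - v)) < eps * (1 - gamma) / (2 * gamma):
            return v_new, q.argmax(axis=1)
        v = v_new

# Hypothetical two-state example: state 1 pays reward 1 under every action;
# action 0 stays in the current state, action 1 switches states.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0   # action 0: stay
P[0, 1, 1] = P[1, 1, 0] = 1.0   # action 1: switch
r = np.array([[0.0, 0.0],
              [1.0, 1.0]])
v, pi = value_iteration(P, r)
# With gamma = 0.9: v ≈ [9, 10], and the greedy policy switches in
# state 0 and stays in state 1.
```

Policy iteration and linear programming compute the same value vector by different routes; value iteration is shown here only because it is the shortest to state.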
References
S.C. Albright, (1979): “Structural results for partially observable Markov decision processes”, Operations Research 27, 1041–1053.
E. Altman, (1999): “Constrained Markov decision processes”, Chapman & Hall/CRC, Boca Raton, Florida.
E. Altman, A. Hordijk and L.C.M. Kallenberg (1996): “On the value function in constrained control of Markov chains”, Mathematical Methods of Operations Research 44, 387–399.
E. Altman and A. Shwartz (1991a): “Sensitivity of constrained Markov decision processes”, Annals of Operations Research 33, 1–22.
E. Altman and A. Shwartz (1991b): “Adaptive control of constrained Markov chains”, IEEE-Transactions on Automatic Control 36, 454–462.
E. Altman and A. Shwartz (1991c): “Adaptive control of constrained Markov decision chains: criteria and policies”, Annals of Operations Research 28, 101–134.
E. Altman and A. Shwartz (1991): “Sensitivity of constrained Markov decision processes”, Annals of Operations Research 33, 1–22.
E. Altman and F.M. Spieksma (1995): “The linear program approach in Markov decision processes”, Mathematical Methods of Operations Research 42, 169–188.
J.S. Baras, D.J. Ma and A.M. Makowsky (1985): “K competing queues with linear costs and geometric service requirements: the µc-rule is always optimal” Systems Control Letters 6, 173–180.
J. Bather (1973a): “Optimal decision procedures for finite Markov chains. Part II: Communicating systems”, Advances in Applied Probability 5, 521–540.
J. Bather (1973b): “Optimal decision procedures for finite Markov chains. Part III: General convex systems”, Advances in Applied Probability 5, 541–553.
M. Bayal-Gursoy and K.W. Ross (1992): “Variability-sensitivity Markov decision processes”, Mathematics of Operations Research 17, 558–571.
R. Bellman (1957): “Dynamic programming”, Princeton University Press, Princeton.
A. Ben-Israel and S.D. Flam (1990): “A bisection/successive approximation method for computing Gittins indices”, Zeitschrift für Operations Research 34, 411–422.
D.P. Bertsekas (1976): “Dynamic programming and stochastic control”, Academic Press, New York.
D.R Bertsekas (1976b): “On error bounds for successive approximation methods”, IEEE Transactions on Automatic Control 21, 394–396.
D.R Bertsekas (1987): “Dynamic programming: deterministic and stochastic models”, Prentice-Hall, Englewood Cliff.
D.R Bertsekas (1995): “Dynamic programming and optimal control I”, Athena Scientific, Belmont, Massachusetts.
D.R Bertsekas (1995): “Dynamic programming and optimal control II”, Athena Scientific, Belmont, Massachusetts+.
D.R. Bertsekas (1995c): “Generic rank-one corrections for value iteration in Markovian decision problems”, OR Letters 17, 111–119.
D.R. Bertsekas (1998): “A new value iteration method for the average cost dynamic programming problem”, SIAM Journal on Control and Optimization 36, 742–759.
D.R Bertsekas and S.E. Shreve (1978) “Stochastic Optimal Control”, Academic Press, New York.
D.P. Bertsekas and J.N. Tsitsiklis (1991): “An analysis of stochastic shortest path problems”, Mathematics of Operations Research 16, 580–595.
D. Bertsimas and J. Niño-Mora (1996): “Conservations laws, extended polymatroids and multi-armed bandit problems; a polyhedral approach to indexable systems”, Mathematics of Operations Research 21, 257–306.
F.J. Beutler and K.W. Ross (1985): “Optimal policies for controlled Markov chains with a constraint”, Journal of Mathematical Analysis and Applications 112, 236–252.
K.-J. Bierth (1987): “An expected average reward criterion”, Stochastic Processes and Applications 26, 133–140.
D. Blackwell (1962): “Discrete dynamic programming”, Annals of Mathematical Statistics, 719–726.
L. Breiman (1964): “Stopping-rule problems”, in: E.F. Beckenbach (ed.), Applied Combinatorial Mathematics”, Wiley, New York, 284–319.
B.W. Brown (1965): “On the iterative method of dynamic programming on a finite space discrete time Markov process”, Annals of Mathematical Statistics 36, 1279–1285.
J. Bruno, P. Downey and G.N. Frederickson (1981): “Sequencing tasks with exponential service times to minimize the expected flowtime or makespan”, Journal of the Association for Computing Machinery 28, 100–113.
A.N. Burnetas, and M.N. Katehakis (1997): “Optimal adaptive policies for Markov decision processes”, Mathematics of Op. Research 22, 222–255.
R. Cavazos-Cadena (1991): “Nonparametric estimation and adaptive control in a class of finite Markov decision chains”, Annals of Operations Research 28, 169–184.
C.-S. Chang, A. Hordijk, R. Righter and G. Weiss (1994): “The stochastic optimality of SEPT in parallel machine scheduling”, Probability in the Engineering and Information Sciences 8, 179–188.
M.C. Chen, Jr. (1973): “Optimal stopping in a discrete search problem”, Operations Research 21, 741–747.
Y.-R. Chen and M.N. Katehakis (1986): “Linear programming for finite state bandit problems”, Mathematics of Operations Research 11, 180–183.
Y.S. Chow and H. Robbins (1961): “A martingale system theorem and applications” in: J. Neyman (ed), “Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability”, Vol.1, University of Berkeley Press, Berkeley, 93–104.
K.-J. Chung (1989): “A note on maximal mean/standard deviation ratio in an undiscounted MDP”, OR Letters 8, 201–204.
K.-J. Chung (1992): “Remarks on maximal mean/standard deviation ratio in an undiscounted MDPs”, Opimization 26, 385–392.
K.-J. Chung (1994): “Mean-variance trade-offs in an undiscounted MDP: the unichain case”, Operations Research 42, 184–188.
G.B. Dantzig (1963): “Linear programming and extensions”, Princeton University Press, Princeton, New Jersey.
J.S. De Cani (1964): “A dynamic programming algorithm for embedded Markov chains when the planning horizon is at infinity”, Management Science 10, 716–733.
G.T. De Ghellinck (1960): “Les problèmes de décisions séquentielles”, Cahiers du Centre de Recherche Opérationelle, 161–179.
G.T. De Ghellinck and G.D. Eppen (1967): “Linear programming solu- tions for separable Markovian decision problems”, Management Science 13, 371–394.
R.S. Dembo and M. Haviv (1984): “Truncated policy iteration methods”, OR Letters 3, 243–246.
E.V. Denardo (1967): “Contraction mappings in the theory underlying dynamic programming”, SIAM Review 9, 165–167.
E.V. Denardo (1968): “Separable Markovian decision problems”, Management Science 14, 451–462.
E.V. Denardo (1970): “Computing a bias-optimal policy in a discrete-time Markov decision problem”, Operations Research 18, 279–289.
E.V. Denardo (1971): “Markov renewal programs with small interest rates”, Annals of Mathematical Statistics 42, 477–496.
E.V. Denardo (1973): “A Markov decision problem”, in: T.C. Hu and S.M. Robinson (eds.), “Mathematical Programming”, Academic Press, 33–68.
E.V. Denardo (1982): “Dynamic programming: models and applications”, Prentice-Hall, Englewood Cliff.
E.V. Denardo and B.L. Fox (1968): “Multichain Markov renewal pro- grams”, SIAM Journal on Applied Mathematics 16, 468–487.
E.V. Denardo and B.L. Miller (1968): “An optimality condition for discrete dynamic programming with no discounting”, Annals of Mathematical Statistics 39, 1220–1227.
E.V. Denardo and U.G. Rothblum (1979a): “Optimal stopping, expo- nential utility and linear programming”, Mathematical Programming 16, 228–244.
E.V. Denardo and U.G. Rothblum (1979b): “Overtaking optimality for Markov decision chains”, Mathematics of Operations Research 4, 144–152.
F. D’Epenoux (1960): “Sur un problème de production et de stockage dans l’aléatoire”, Revue Française de Recherche Opérationelle, 3–16.
C. Derman (1962): “On sequential decisions and Markov chains”, Management Science 9, 16–24.
C. Derman (1963): “Optimal replacement rules when changes of states are Markovian”, in: R. Bellman (ed.), “Mathematical optimization techniques”, The Rand Corporation, R-396-PR, 201–212.
C. Derman (1970): “Finite state Markovian decision processes”, Academic Press, New York.
C. Derman and M. Klein (1965): “Some remarks on finite horizon Marko- vian decision models”, Operations Research 13, 272–278.
C. Derman and J. Sacks (1960): “Replacement of periodically inspected equipment (an optimal stopping rule)”, Naval Research Logistics Quarterly 7, 597–607.
C. Derman and R. Strauch (1966): “A note on memoryless rules for controlling sequential control problems”, Annals of Mathematical Statistics 37, 276–278.
C. Derman and A.F. Veinott, Jr. (1972): “Constrained Markov decision chains”, Management Science 19, 389–390.
H.M. Dietz and V. Nollau (1983): “Markov decision problems with countable state space”, Akademie-Verlag, Berlin.
L. Dubins and L.J. Savage (1965): “How to gamble if you must”, McGraw-Hill, New York.
S. Durinovics, H.M. Lee, M.N. Katehakis and J.A. Filar (1986): “Multiobjective Markov decision processes with average reward criterion”, Large Scale Systems 10, 215–226.
E.B. Dynkin (1979): “Controlled Markov process”, Springer-Verlag, New York.
J.H. Eaton and L.A. Zadeh (1962): “Optimal pursuit strategies in discrete state probabilistic systems”, Transactions ASME Series D, Journal of Basic Engineering 84, 23–29.
A. Ephremides, P. Varaiya and J. Walrand (1980): “A simple dynamic routing problem”, IEEE Transactions on Automatic Control AC-25, 690–693.
A. Federgruen (1984): “Markovian control problems: functional equations and algorithms”, Mathematical Centre Tract 97, Mathematical Centre, Amsterdam.
A. Federgruen and P.J. Schweitzer (1978): “Discounted and undiscounted value iteration in Markov decision problems: a survey”, in: M.L. Puterman (ed), “Dynamic programming and its applications”, Academic Press, New York, 23–52.
A. Federgruen and P.J. Schweitzer (1980): “A survey of asymptotic value-iteration for undiscounted Markovian decision processes”, in: R. Hartley, L.C. Thomas and D.J. White (eds.), “Recent development in Markov decision processes”, Academic Press, New York, 73–109.
A. Federgruen and P.J. Schweitzer (1984a): “A fixed-point approach to undiscounted Markov renewal programs”, SIAM Journal on Algebraic Discrete Methods 5, 539–550.
A. Federgruen and P.J. Schweitzer (1984b): “Successive approximation methods for solving nested functional equations in Markov decision problems”, Mathematics of Operations Research 9, 319–344.
A. Federgruen, P.J. Schweitzer and H.C. Tijms (1978): “Contraction map- pings underlying undiscounted Markov decision problems”, Journal of Mathematical Analysis and Applications 65, 711–730.
A. Federgruen and D. Spreen (1980): “A new specification of the multichain policy iteration algorithm in undiscounted Markov renewal programs”, Management Science 26, 1211–1217.
A. Federgruen and P. Zipkin (1984): “An efficient algorithm for computing optimal (s, S) policies”, Operations Research 34, 1268–1285.
E.A. Feinberg and A. Shwartz (1994): “Markov decision models with weighted discounted criteria”, Mathematics of Operations Research 19, 152–168.
J.A. Filar, L.C.M. Kallenberg and H.M. Lee (1989): “Variance-penalized Markov decision processes”, Mathematics of Operations Research 14, 147–161.
J.A. Filar and O. J. Vrieze (1997): “Competitive Markov decision processes”, Springer-Verlag, New York.
B.L. Fox (1968): “(g, w)-optima in Markov renewal programs”, Management Science 15, 210–212.
E. Frostig (1993): “Optimal policies for machine repairmen problems”, Journal of Applied Probability 30, 703–715.
N. Furakawa (1980): “Characterization of optimal policies in vector-valued Markovian decision processes”, Mathematics of Operations Research 5, 271–279.
S. Gal (1984): “An O(N3) algorithm for optimal replacement problems”, SIAM Journal on Control and Optimization 22, 902–910.
R. Garbe and K.D. Glazebrook (1998): “On a new approach to the analysis of complex multi-armed bandit problems”, Mathematical Methods of Operations Research 48, 419–442.
J.C. Gittins (1979): “Bandit processes and dynamic allocation indices”, Journal of the Royal Statistic Society Series B 14, 148–177.
J.C. Gittins and D.M. Jones (1974): “A dynamic allocation index for the sequential design of experiments”, in J. Gani (ed.) “Progress in Statistics”, North Holland, Amsterdam, 241–266.
K.D. Glazebrook and R. Garbe (1996): “Reflections on a new approach to Gittins indexation”, Journal of the Operational Research Society 47, 1301–1309.
K.D. Glazebrook and S. Greatrix (1995): “On transforming an index for generalized bandit problems”, J. of App. Prob. 32, 168–182.
K.D. Glazebrook and R.W. Owen (1991): “New results for generalized bandit problems”, International Journal of System Sciences 22, 479–494.
M.K. Ghosh (1990): “Markov decision processes with multiple costs”, OR Letters 9, 257–260.
R. Grinold (1973): “Elimination of suboptimal actions in Markov decision problems”, Operations Research 21, 848–851.
R. Hartley, A.C. Lavercombe and L.C. Thomas (1986): “Computational comparison of policy iteration algorithms for discounted Markov decision processes”, Computers and Operations Research 13, 411–420.
N.A.J. Hastings (1968): “Some notes on dynamic programming and replacement”, Operational Research Quarterly 19, 453–464.
N.A.J. Hastings (1969): “Optimization of discounted Markov decision problems”, Operations Research Quarterly 20, 499–500.
N.A.J. Hastings (1971): “Bounds on the gain of a Markov decision process”, Operations Research 19, 240–243.
N.A.J. Hastings (1976): “A test for nonoptimal actions in undiscounted finite Markov decision chains”, Management Science 23, 87–92.
N.A.J. Hastings and J.M.C.Mello (1973): “Tests for nonoptimal actions in discounted Markov decision problems”, Management Science 19, 1019–1022.
N.A.J. Hastings and D.Sadjani (1979): “Markov programming with policy constraints”, European Journal of Operations Research 3, 253–255.
N.A.J. Hastings and J.A.E.E. Van Nunen (1977): “The action elimination algorithm for Markov decision processes”, in H.C. Tijms and J. Wessels (eds), “Markov decision theory”, Mathematical Centre Tract 100, 161–170, Mathematical Centre, Amsterdam.
M. Haviv and M.L. Puterman (1991): “An improved algorithm for solving communicating average reward Markov decision processes”, Annals of Operations Research 28, 229–242.
M.I. Henig (1983): “Vector-valued dynamic programming”, SIAM Journal on Control and Optimization 21, 490–499.
O. Hernández-Lerma (1987): “Adaptive Markov control processes”, Springer-Verlag, New York.
O. Hernández-Lerma and J. B. Lasserre (1996): “Discrete-time Markov control processes: Basic optimality criteria”, Springer-Verlag, New York.
O. Hernández-Lerma and J. B. Lasserre (1999): “Further topics on discrete-time Markov control processes”, Springer-Verlag, New York.
M. Herzberg and U. Yechiali (1994): “Accelerating procedures of the value iteration algorithm for discounted Markov decision processes, based on a one-step look-ahead analysis”, Operations Research 42, 940–946.
D.P. Heyman and M. J. Sobel (1984): “Stochastic models in Operations Research, Volume II, MacGraw-Hill, New York.
K. Hinderer (1970): “Foundations of non-stationary dynamic programming with discrete time parameter”, Springer-Verlag, New York.
U.D. Holzbaur (1986a): “Entscheidungsmodelle über angeordneten Körpern”, Optimization 17, 515–524.
U.D. Holzbaur (1986b): “Sensitivitätsanalysen in Entscheidungsmodellen”, Optimization 17, 525–533.
U.D. Holzbaur (1994): “Bounds for the quality and the number of steps in Bellman’s value iteration algorithm”, OR Spektrum 15, 231–234.
A. Hordijk (1971): “A sufficient condition for the existence of an optimal policy with respect to the average cost criterion in Markovian decision processes”, Transactions of the Sixth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Academia, Prague, 263–274.
A. Hordijk (1974): “Dynamic programming and Markov potential theory”, Mathematical Centre Tract 51, Amsterdam.
A. Hordijk, R. Dekker and L.C.M. Kallenberg (1985): “Sensitivity-analysis in discounted Markovian decision problems”, OR Spektrum 7, 143–151.
A. Hordijk and L.C.M. Kallenberg (1979): “Linear programming and Markov decision chains”, Management Science 25, 352–362.
A. Hordijk and L.C.M. Kallenberg (1984a): “Transient policies in discrete dynamic programming: linear programming including suboptimality tests and additional constraints”, Mathematical Programming 30, 46–70.
A. Hordijk and L.C.M. Kallenberg (1984b): “Constrained undiscounted stochastic dynamic programming”, Mathematics of Operations Research 9, 276–289.
A. Hordijk and J.A. Loeve (1994): “Undiscounted Markov decision chains with partial information; an algorithm for computing a locally optimal periodic policy”, Mathematical Methods of Operations Research 40, 163–181.
A. Hordijk and H.C. Tijms (1974): “The method of successive approx- imations and Markovian decision problems”, Operations Research 22, 519–521.
A. Hordijk and H.C. Tijms (1975): “A modified form of the iterative method of dynamic programming”, Annals of Statictics 3, 203–208.
A. Hordijk and H.C. Tijms (1975): “On a conjecture of Iglehart”, Management Science 11, 1342–1345.
R.A. Howard (1960): “Dynamic programming and Markov processes”, MIT Press, Cambridge.
R.A. Howard (1963): “Semi-Markovian decision processes”, Proceedings International Statistical Institute, Ottawa, Canada.
Y. Huang and L.C.M. Kallenberg (1994): “On finding optimal policies for Markov decision chains: a unifying framework for mean-variance trade-offs”, Mathematics of Operations Research 19, 434–448.
G. Hübner (1977): “Improved procedures for eliminating suboptimal actions in Markov programming by the use of contraction properties”, Transactions of the 7th Prague Conference on Information Theory, Statistical Decision Functions, Reidel, Dordrecht, 257–263.
G. Hübner (1988): “A unified approach to adaptive control of average reward Markov decision processes”, OR Spektrum 10, 161–166.
D. Iglehart (1963): “Optimality of (s, S)-policies in the infinite horizon dynamic inventory problem”, Management Science 9, 259–267.
T. Ishikida and P. Varaiya (1994): “Multi-armed bandit problem revis- ited”, Journal of Optimization Theory and Applications 83, 113–154.
R.G. Jeroslow (1972): “An algorithm for discrete dynamic programming with interest rates near zero”, Management Science Research Report no. 300, Carnegie-Mellon University, Pittsburgh.
W.S. Jewell (1963a): “Markov renewal programming. I: Formulation, finite return models”, Operations Research 11, 938–948.
W.S. Jewell (1963b): “Markov renewal programming. II: Infinite return models, example”, Operations Research 11, 949–971.
L.C.M. Kallenberg (1981a): “Finite horizon dynamic programming and linear programming”, Methods of Operations Research 43, 105–112.
L.C.M. Kallenberg (1981b): “Unconstrained and constrained dynamic programming over a finite horizon”, Report, University of Leiden, The Netherlands.
L.C.M. Kallenberg (1981c): “Linear programming to compute a bias-optimal policy”, in: B. Fleischmann et al. (eds.) “Operations Research Proceedings”, 433–440.
L.C.M. Kallenberg (1983): “Linear programming and finite Markovian control problems”, Mathematical Centre Tract 148, Mathematical Centre, Amsterdam.
L.C.M. Kallenberg (1986): “Note on M.N.Katehakis and Y.-R.Chen’s computation of the Gittins index”, Mathematics of Operations Research 11, 184–186.
L.C.M. Kallenberg (1992): “Separable Markovian decision problem: the linear programming method in the multichain case”, OR Spektrum 14, 43–52.
L.C.M. Kallenberg (1999): “Combinatorial problems in MDPs”, Report, University of Leiden, The Netherlands (to appear in the Proceedings of the Changsha International Workshop on Markov Processes & Controlled Markov Chains).
P.C. Kao (1973): “Optimal replacement rules when the changes of states are semi-Markovian”, Operations Research 21, 1231–1249.
M.N. Katehakis and C. Derman (1984): “Optimal repair allocation in a series system”, Mathematics of Operations Research 9, 615–623.
M.N. Katehakis and C. Derman (1989): “On the maintenance of systems composed of highly reliable components”, Management Science 35, 551–560.
M.N. Katehakis and A.F. Veinott, Jr. (1987): “The multi-armed bandit problem: decomposition and computation”, Mathematics of Operations Research 12, 262–268.
H. Kawai (1987): “A variance minimization problem for a Markov decision process”, European Journal of Operational Research 31, 140–145.
H. Kawai and N. Katoh (1987): “Variance constrained Markov decision process”, Journal of the Operations Research Society of Japan 30, 88–100.
J.G. Kemeny and J.L. Snell (1960): “Finite Markov chains”, Van Nostrand, Princeton.
M. Klein (1962): “Inspection-maintenance-replacement schedules under Markovian deterioration”, Management Science 9, 25–32.
P. Kolesar (1966): “Minimum-cost replacement under Markovian deterioration”, Management Science 12, 694–706.
M. Kurano (1983): “Adaptive policies in Markov decision processes with uncertain transition matrices”, Journal of Information and Optimization Sciences 4, 21–40.
H. Kushner (1971): “Introduction to stochastic control”, Holt, Rineholt and Winston, New York.
H. Kushner and A.J. Keinmann (1971): “Accelerated procedures for the solution of discrete Markov control problems”, IEEE Transactions on Automatic Control 16, 147–152.
E. Lanery (1967): “Etude asymptotique des systèmes Markovien à commande”, Revue d’Informatique et Recherche Operationelle 1, 3–56.
J.B. Lasserre (1994a): “A new policy iteration scheme for Markov decision processes using Schweitzer’s formula”, Journal of Applied Probability 31, 268–273.
J.B. Lasserre (1994b): “Detecting optimal and non-optimal actions in average-cost Markov decision processes”, Journal of Applied Probability 31, 979–990.
W. Lin and P.R. Kumar (1984): “Optimal control of a queueing system with two heterogeneous servers”, IEEE Transactions on Automatic Control AC-29, 696–705.
S.A. Lippman (1969): “Criterion equivalence in discrete dynamic programming”, Operations Research 17, 920–923.
J.Y. Liu and K. Liu (1994): “An algorithm on the Gittins index”, Systems Science and Mathematical Science 7, 106–114.
Q.-S. Liu and K. Ohno (1992): “Multiobjective undiscounted Markov renewal program and its application to a tool replacement problem in an FMS”, Information and Decision Techniques 18, 67–77.
Q.-S. Liu, K. Ohno and H. Nakayama (1992): “Multi-objective discounted Markov processes with expectation and variance criteria”, International Journal of System Science 23, 903–914.
J.A. Loeve (1995): “Markov decision chains with partial information”, PhD dissertation, University of Leiden, The Netherlands.
W.S. Lovejoy (1987): “Some monotonicity results for partially observed Markov processes”, Operations Research 35, 736–743.
W.S. Lovejoy (1991a): “Computationally feasible bounds for partially observed Markov decision processes”, Operations Research 39, 162–175.
W.S. Lovejoy (1991b) “A survey of algorithmic methods for partially observed Markov decision processes”, Annals of Op. Research 28, 47–66.
J. Macqueen (1966): “A modified programming method for Markovian decision problems”, Journal of Mathematical Analysis and Applications 14, 38–43.
J. Macqueen (1967): “A test for suboptimal actions in Markov decision problems”, Operations Research 15, 559–561.
A.S. Manne (1960): “Linear programming and sequential decisions”, Management Science, 259–267.
U. Meister and U. Holzbaur (1986): “A polynomial time bound for Howard’s policy improvement algorithm”, OR Spektrum 8, 37–40.
B.L. Miller and A.F. Veinott Jr. (1969): “Discrete dynamic programming with a small interest rate”, Annals of Mathematical Statistics 40, 366–370.
H. Mine and S. Osaki (1970): “Markov decision processes”, American Elsevier, New York.
G.E. Monahan (1982): “A survey of partially observable Markov decision processes: theory, models and algorithms”, Management Science 28, 1–16.
T. Morton (1971): “Undiscounted Markov renewal programming via mod- ified successive approximations”, Operations Research 19, 1081–1089.
J.L. Nazareth and R.B. Kulkarni (1986): “Linear programming formulations of Markov decision processes”, OR Letters 5, 13–16.
M.K. Ng (1999): “A note on policy iteration algorithms for discounted Markov decision problems”, OR Letters 25, 195–197.
A. Odoni (1969): “On finding the maximal gain for Markov decision processes”, Operations Research 17, 857–860.
S. Oezekici (1988): “Optimal periodic replacement of multicomponent reliability systems”, Operations Research 36, 542–552.
K. Ohno (1981): “A unified approach to algorithms with a suboptimality test in discounted semi-Markov decision processes”, Journal of the Operations Research Society of Japan 24, 296–323.
S. Osaki and H. Mine (1968): “Linear programming algorithms for semi-Markovian decision processes”, Journal of Mathematical Analysis and Applications 22, 356–381.
T. Parthasarathy, S.H. Tijs and O.J. Vrieze (1984), “Stochastic games with state independent transitions and reparable rewards in: G. Hammer and D. Pallaschke (eds.), Selected Topics in Operations Research and Mathematical Economics.
L.K. Platzman (1977): “Improved conditions for convergence in undiscounted Markov renewal programming”, Op. Research 25, 529–533.
M.A. Pollatschek and B. Avi-Itzhak (1969): “Algorithms for stochastic games with geometric interpretation”, Management Science 15, 399–415.
E.L. Porteus (1971): “Some bounds for discounted sequential decision processes”, Management Science 18, 7–11.
E.L. Porteus (1975): “Bounds and transformations for discounted finite Markov decision chains”, Operations Research 23, 761–784.
E.L. Porteus (1980a): “Improved iterative computation of the expected return in Markov and semi-Markov chains”, Zeitschrift für Operations Research 24, 155–170.
E.L. Porteus (1980b): “Overview of iterative methods for discounted finite Markov and semi-Markov chains”, in: R. Hartley, L.C. Thomas and D.J. White (eds.), “Recent development in Markov decision processes”, Academic Press, New York, 1–20.
E.L. Porteus (1981): “Computing the discounted return in Markov and semi- Markov chains”, Naval Research Logistics Quarterly 28, 567–577.
E. L. Porteus and J.C. Totten (1978): “Accelerated computation of the expected discounted return in a Markov chain”, Operations Research 26, 350–358.
M.L. Puterman (1981): “Computational methods for Markov decision methods”, Proceedings of 1981 Joint Automatic Control Conference.
M.L. Puterman (1994): “Markov decision processes”, Wiley, New York.
M.L. Puterman and S.L. Brumelle (1979): “On the convergence of policy iteration in stationary dynamic programming”, Mathematics of Operations Research 4, 60–69.
M.L. Puterman and M.C. Shin (1978): “Modified policy iteration algorithms for discounted Markov decision chains”, Management Science 24, 1127–1137.
M.L. Puterman and M.C. Shin (1982): “Action elimination procedures for modified policy iteration algorithms” Operations Research 30, 301–318.
D. Reetz (1973): “Solution of a Markovian decision problem by successive overrelaxation”, Zeitschrift für Operations Research 17, 29–32.
D. Reetz (1976): “A decision exclusion algorithm for a class of Markovian decision processes”, Zeitschrift für Operations Research 20, 125–131.
U. Rieder (1991): “Structural results for partially observed control problems”, Zeitschrift für Operations Research 35, 473–490.
R. Righter (1994): “Scheduling”, in: M. Shaked and J.G. Shantikumar (eds.), “Stochastic orders and their applications”, Academic Press, 381–432.
M. Roosta (1982): “Routing through a network with maximum reliability”, Journal of Mathematical Analysis and Applications 88, 341–347.
K.W. Ross (1989): “Randomized and past-dependent policies for Markov decision processes with multiple constraints”, Operations Research 37, 474–477.
K.W. Ross and R. Varadarajan (1991): “Multichain Markov decision processes with a sample path constraint: a decomposition approach”, Mathematics of Operations Research 16, 195–207.
S.M. Ross (1969): “A problem in optimal search and stop”, Operations Research 17, 984–992.
S.M. Ross (1970): “Applied probability models with optimization applications”, Holden-Day, San Francisco.
S.M. Ross (1974): “Dynamic programming and gambling models”, Advances in Applied Probability 6, 593–606.
S.M. Ross (1983): “Introduction to stochastic dynamic programming”, Academic Press, New York.
U.G. Rothblum (1979): “Iterated successive approximation for sequential decision processes”, in J.W.B. van Overhagen and H.C. Tijms (eds.), “Stochastic control and optimization”, Free University, Amsterdam, 30–32.
H. Scarf (1960): “The optimality of (s, S) polices in the dynamic inventory problem”, Chapter 13 in: K.J. Arrow, S. Karlin and P. Suppes (eds.), “Mathematical methods in the social sciences”, Stanford University Press, Stanford.
H. Schellhaas (1974): “Zur extrapolation in Markorffschen Entscheidungsmodellen mit Diskontierung”, Zeitschrift für Operations Research 18, 91–104.
N. Schmitz (1985): “How good is Howard’s policy improvement algorithm?”, Zeitschrift fur Operations Research 29, 315–316.
L. Schrage (1968): “A proof of the optimality of the shortest remaining processing time discipline”, Operations Research 16, 687–690.
P.J. Schweitzer (1965): “Perturbation theory and Markovian decision processes”, Ph.D. dissertation, M.I.T., Op. Research Center Report 15.
P.J. Schweitzer (1968): “Perturbation theory and finite Markov chains” Journal of Applied Probability 5, 401–413.
P.J. Schweitzer (1971a): “Multiple policy improvements in undiscounted Markov renewal programming”, Operations Research 19. 784–793.
P.J. Schweitzer (1971b): “Iterative solution of the functional equations of undiscounted Markov renewal programming”, Journal of Mathematical Analysis and Applications 34, 495–501.
P.J. Schweitzer (1984): “A value-iteration scheme for undiscounted multichain Markov renewal programs”, ZOR—Zeitschrift für Operations Research 28, 143–152.
P.J. Schweitzer (1985): “The variational calculus and approximations in policy space for Markov decision processes”, Journal of Mathematical Analysis and Applications 110, 568–582.
P.J. Schweitzer (1987): “A Brouwer fixed-point mapping approach to communicating Markov decision processes”, Journal of Mathematical Analysis and Applications 123, 117–130.
P.J. Schweitzer (1991): “Block-scaling of value-iteration for discounted Markov renewal programming”, Annals of Op. Research 29, 603–630.
P.J. Schweitzer and A. Federgruen (1977): “The asymptotic behavior of value iteration in Markov decision problems”, Mathematics of Operations Research 2, 360–381.
P.J. Schweitzer and A. Federgruen (1978a): “Foolproof convergence in multichain policy iteration”, Journal of Mathematical Analysis and Applications 64, 360–368.
P.J. Schweitzer and A. Federgruen (1978b): “The functional equations of undiscounted Markov renewal programming”, Mathematics of Operations Research 3, 308–321.
P.J. Schweitzer and A. Federgruen (1979): “Geometric convergence of value iteration in multichain Markov decision problems”, Advances of Applied Probability 11, 188–217.
L.I. Sennott (1999): “Stochastic dynamic programming and the control of queueing systems”, Wiley, New York.
E.L. Sernik and S.I. Marcus (1991): “On the computation of the optimal cost function for discrete time Markov models with partial observations”, Annals of Operations Research 29, 471–512.
J.F. Shapiro (1975): “Brouwer’s fixed point theorem and finite state space Markovian decision theory”, Journal of Mathematical Analysis and Applications 49, 710–712.
L.S. Shapley (1953): “Stochastic games”, Proceedings of the National Academy of Sciences 39, 1095–1100.
Y.S. Sherif and M.L. Smith (1981): “Optimal maintenance policies for systems subject to failure—A review”, Naval Research Logistics Quarterly 28, 47–74.
K. Sladky (1974): “On the set of optimal controls for Markov chains with rewards”, Kybernetika 10, 350–367.
R.D. Smallwood (1966): “Optimum policy regions for Markov processes with discounting”, Operations Research 14, 658–669.
R.D. Smallwood and E. Sondik (1973): “The optimal control of partially observable Markov processes over a finite horizon”, Operations Research 21, 1071–1088.
D.R. Smith (1978): “Optimal repairman allocation—asymptotic results”, Management Science 24, 665–674.
M.J. Sobel (1981): “Myopic solutions of Markov decision processes and stochastic games”, Operations Research 29, 995–1009.
M.J. Sobel (1985): “Maximal mean/standard deviation ratio in an undiscounted MDP”, OR Letters 4, 157–159.
M.J. Sobel (1994): “Mean-variance trade-offs in an undiscounted MDP”, Operations Research 42, 175–183.
E. Sondik (1978): “The optimal control of partially observable Markov processes over the infinite horizon: discounted costs”, Operations Research 26, 282–304.
I.M. Sonin (1999): “The elimination algorithm for the problem of optimal stopping”, Mathematical Methods of Operations Research 49, 111–124.
D. Spreen (1981): “A further anti-cycling rule in multi-chain policy iteration for undiscounted Markov renewal programs”, Zeitschrift für Operations Research 25, 225–234.
J. Stein (1988): “On efficiency of linear programming applied to discounted Markovian decision problems”, OR Spektrum 10, 153–160.
S.S. Stidham, Jr. (1985): “Optimal control of admission to a queueing system”, IEEE Transactions on Automatic Control AC-30, 705–713.
S.S. Stidham, Jr. and R.R. Weber (1993): “A survey of Markov decision models for control of networks of queues”, Queueing Systems 13, 291–314.
J. Stoer and R. Bulirsch (1980): “Introduction to numerical analysis”, Springer-Verlag, New York.
R. Strauch and A.F. Veinott, Jr. (1966): “A property of sequential control processes”, Report, Rand McNally, Chicago.
M. Sun (1993): “Revised simplex algorithm for finite Markov decision processes”, Journal of Optimization Theory and Applications 79, 405–413.
L.C. Thomas (1981): “Second order bounds for Markov decision processes”, Journal of Mathematical Analysis and Applications 80, 294–297.
L.C. Thomas (1983): “Constrained Markov decision processes as multiobjective problems”, in: “Multi-objective decision making”, Academic Press, 77–94.
H.C. Tijms (1986): “Stochastic modelling and analysis: a computational approach”, Wiley, Chichester.
J.N. Tsitsiklis (1986): “A lemma on the multi-armed bandit problem”, IEEE Transactions on Automatic Control 31, 576–577.
J.N. Tsitsiklis (1993): “A short proof of the Gittins index theorem”, Annals of Applied Probability 4, 194–199.
F.A. Van der Duyn Schouten and S.G. Vanneste (1990): “Analysis and computation of (n, N)-strategies for maintenance of a two-component system”, European Journal of Operational Research 48, 260–274.
J. Van der Wal (1980): “The method of value oriented successive approximations for the average reward Markov decision processes”, OR Spektrum 1, 233–242.
J. Van der Wal (1981): “Stochastic dynamic programming”, Mathematical Centre Tract 139, Mathematical Centre, Amsterdam.
K.M. Van Hee (1978): “Markov strategies in dynamic programming”, Mathematics of Operations Research 3, 191–201.
K.M. Van Hee, A. Hordijk and J. Van der Wal (1977): “Successive approximations for convergent dynamic programming”, in: H.C. Tijms and J. Wessels (eds.), “Markov decision theory”, Mathematical Centre Tract no. 93, Mathematical Centre, Amsterdam, 183–211.
J.A.E.E. Van Nunen (1976a): “A set of successive approximation methods for discounted Markovian decision problems”, Zeitschrift für Operations Research 20, 203–208.
J.A.E.E. Van Nunen (1976b): “Contracting Markov decision processes”, Mathematical Centre Tract 71, Mathematical Centre, Amsterdam.
J.A.E.E. Van Nunen (1976c): “Improved successive approximation methods for discounted Markovian decision processes”, in: A. Prekopa (ed.), “Progress in Operations Research”, North Holland, Amsterdam, 667–682.
J.A.E.E. Van Nunen and J. Wessels (1976): “A principle for generating optimization procedures for discounted Markov decision processes”, Colloquia Mathematica Societatis Bolyai Janos, Vol. 12, North Holland, Amsterdam, 683–695.
J.A.E.E. Van Nunen and J. Wessels (1977): “The generation of successive approximations for Markov decision processes using stopping times”, in: “Markov decision theory”, H. Tijms and J. Wessels (eds.), Mathematical Centre Tract 93, Mathematical Centre, Amsterdam, 25–37.
P.P. Varaiya, J.C. Walrand and C. Buyukkoc (1985): “Extensions of the multi-armed bandit problem: the discounted case”, IEEE Transactions on Automatic Control 30, 426–439.
A.F. Veinott, Jr. (1966a): “On the optimality of (s, S) inventory policies: new conditions and a new proof”, SIAM Journal on Applied Mathematics 14, 1067–1083.
A.F. Veinott, Jr. (1966b): “On finding optimal policies in discrete dynamic programming with no discounting”, Annals of Mathematical Statistics 37, 1284–1294.
A.F. Veinott, Jr. (1969): “Discrete dynamic programming with sensitive discount optimality criteria”, Annals of Mathematical Statistics 40, 1635–1660.
A.F. Veinott, Jr. (1974): “Markov decision chains”, in: G.B. Dantzig and B.C. Eaves (eds.), “Studies in Optimization”, Studies in Mathematics, Volume 10, The Mathematical Association of America, 124–159.
R.C. Vergin and M. Scribian (1977): “Maintenance scheduling for multicomponent equipment”, AIIE Transactions 9, 297–305.
O.J. Vrieze (1987): “Stochastic games with finite state and action spaces”, CWI Tract 33, Centre for Mathematics and Computer Science, Amsterdam.
K. Wakuta (1992): “Optimal stationary policies in the vector-valued Markov decision process”, Stochastic Processes and their Applications 42, 149–156.
K. Wakuta (1995): “Vector-valued Markov decision processes and the systems of linear inequalities”, Stochastic Processes and their Applications 56, 159–169.
K. Wakuta (1996): “A new class of policies in vector-valued Markov decision processes”, Journal of Mathematical Analysis and Applications 202, 623–628.
K. Wakuta (1999): “A note on the structure of value spaces in vector-valued Markov decision processes”, Mathematical Methods of Operations Research 49, 77–86.
J. Walrand (1988): “An introduction to queueing networks”, Prentice-Hall, Englewood Cliffs, New Jersey.
R.R. Weber (1982): “Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime”.
R.R. Weber (1992): “On the Gittins index for multi-armed bandits”, Annals of Applied Probability 2, 1024–1033.
R.R. Weber and S.S. Stidham, Jr. (1987): “Optimal control of services rates in networks of queues”, Advances in Applied Probability 19, 202–218.
G. Weiss (1982): “Multiserver stochastic scheduling”, in: M.A.H. Dempster, J.K. Lenstra and A.H.G. Rinnooy Kan (eds.), “Deterministic and stochastic scheduling”, Reidel, Dordrecht, Holland, 157–179.
G. Weiss (1988): “Branching bandit processes”, Probability in the Engineering and Information Sciences 2, 269–278.
J. Wessels and J.A.E.E. Van Nunen (1975): “Discounted semi-Markov decision processes: linear programming and policy iteration”, Statistica Neerlandica 29, 1–7.
J. Wessels (1977): “Stopping times on Markov programming”, in: Transactions of the 7th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, Academia, Prague, pp. 575–585.
C.C. White, III (1976): “Procedures for the solution of a finite-horizon, partially observed, semi-Markov optimization problem”, Operations Research 24, 348–358.
C.C. White, III (1991): “A survey of solution techniques for the partially observed Markov decision process”, Annals of Operations Research 33, 215–230.
C.C. White, III and W.T. Scherer (1989): “Solution procedures for partially observed Markov decision processes”, Operations Research 37, 791–797.
C.C. White, III and W.T. Scherer (1994): “Finite-memory suboptimal design for partially observed Markov decision processes”, Operations Research 42, 439–455.
D.J. White (1963): “Dynamic programming, Markov chains, and the method of successive approximations”, Journal of Mathematical Analysis and Applications 6, 373–376.
D.J. White (1978): “Elimination of non-optimal actions in Markov decision processes”, in: M.L. Puterman (ed.) Dynamic programming and its applications, Academic Press, New York, 131–160.
D.J. White (1982): “Multi-objective infinite-horizon discounted Markov decision processes”, Journal of Mathematical Analysis and Applications 89, 639–647.
D.J. White (1985): “Real applications of Markov decision theory”, Interfaces 15:6, 73–83.
D.J. White (1988): “Further real applications of Markov decision theory”, Interfaces 18:5, 55–61.
D.J. White (1988): “Mean, variance and probabilistic criteria in finite Markov decision processes: a review”, Journal of Optimization Theory and Applications 56, 1–30.
D.J. White (1992): “Computational approaches to variance-penalized Markov decision processes”, OR Spektrum 14, 79–83.
D.J. White (1993): “A survey of applications of Markov decision processes”, Journal of the Operational Research Society 44, 1073–1096.
D.J. White (1993): “Markov decision processes”, Wiley, Chichester.
D.J. White (1994): “A mathematical programming approach to a problem in variance penalised Markov decision processes”, OR Spektrum 15, 225–230.
D.J. White (1995): “A superharmonic approach to solving infinite horizon partially observable Markov decision problems”, Mathematical Methods of Operations Research 41, 71–88.
P. Whittle (1980): “Multi-armed bandits and the Gittins index”, Journal of the Royal Statistical Society, Series B 42, 143–149.
P. Whittle (1982): “Optimization over time: dynamic programming and stochastic control”, Volume I, Wiley, New York.
P. Whittle (1982): “Optimization over time: dynamic programming and stochastic control”, Volume II, Wiley, New York.
M. Yasuda (1988): “The optimal value of Markov stopping problems with one-step look ahead policy”, Journal of Applied Probability 25, 544–552.
Y.-S. Zheng and A. Federgruen (1991): “Finding optimal (s, S)-policies is about as simple as evaluating a single policy”, Operations Research 39, 654–665.
© 2003 Springer Science+Business Media New York
Kallenberg, L. (2003). Finite State and Action MDPs. In: Feinberg, E.A., Shwartz, A. (eds.) Handbook of Markov Decision Processes. International Series in Operations Research & Management Science, vol. 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0805-2_2
Print ISBN: 978-1-4613-5248-8
Online ISBN: 978-1-4615-0805-2