
A survey of algorithms for some restricted classes of Markov decision problems

  • Conference paper

Part of the book series: Proceedings in Operations Research 8 (ORP, volume 1978)

Abstract

This paper gives a survey of some standard infinite-stage Markov decision process equations and of algorithms for solving them. The paper restricts itself to finite state sets, finite action sets, and bounded rewards. For a more general, though now out-of-date, reference list on Markov decision processes the reader may consult Schweitzer [80]. Although we restrict ourselves to conventional Markov decision processes, we refer the reader to more general formulations, such as those of Koehler [48], [49], who considers Leontief and essentially-Leontief extensions and the method of successive approximations, and that of Rothblum [75].
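The setting the survey fixes (finite state set, finite action set, bounded rewards) is the one in which the method of successive approximations mentioned above takes its simplest form. The following sketch shows value iteration for a discounted problem of this type; the two-state, two-action data are hypothetical, chosen purely for illustration and not taken from the paper.

```python
def value_iteration(P, r, beta, tol=1e-8, max_iter=10_000):
    """Successive approximations for a finite discounted MDP.

    P[a][i][j] : probability of moving from state i to state j under action a
    r[a][i]    : one-step reward in state i under action a (bounded)
    beta       : discount factor in (0, 1)
    """
    n = len(r[0])
    v = [0.0] * n
    for _ in range(max_iter):
        # One application of the dynamic programming operator:
        # (Tv)(i) = max_a [ r(a,i) + beta * sum_j P(a,i,j) v(j) ]
        v_new = [
            max(
                r[a][i] + beta * sum(P[a][i][j] * v[j] for j in range(n))
                for a in range(len(r))
            )
            for i in range(n)
        ]
        # T is a contraction with modulus beta, so a small sup-norm change
        # between iterates guarantees closeness to the fixed point.
        if max(abs(v_new[i] - v[i]) for i in range(n)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical two-state, two-action example.
P = [
    [[0.8, 0.2], [0.3, 0.7]],   # transitions under action 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions under action 1
]
r = [[5.0, -1.0], [10.0, 2.0]]  # bounded rewards r[a][i]
v = value_iteration(P, r, beta=0.9)
```

Because the operator is a contraction, the iterates converge geometrically at rate beta; this contraction property underlies the error bounds and action-elimination tests treated in many of the references below.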


References

  1. J. Anthonisse, H.C. Tijms, On White's Condition in Dynamic Programming, Mathematisch Centrum, Report B.W. 46/75, Amsterdam, 1975.

  2. J. Bather, Optimal Decision Procedures for Finite Markov Chains, Pt. II: Communicating Systems, Adv. Appl. Prob. 5, 1973, pp. 521–540.

  3. R. Bellman, A Markovian Decision Process, J. Math. Mech. 6, 1957, pp. 679–684.

  4. P.F. Bestwick, D. Sadjadi, An Action Elimination Algorithm for the Discounted Semi-Markov Problem, University of Bradford Management Centre, 1978.

  5. D. Blackwell, Discrete Dynamic Programming, Ann. Math. Stat. 33, 1962, pp. 719–726.

  6. B.W. Brown, On the Iterative Method of Dynamic Programming on a Finite Space Discrete Time Markov Process, Ann. Math. Stat. 36, 1965, pp. 1279–1285.

  7. G. Dantzig, P. Wolfe, Linear Programming in a Markov Chain, Opns. Res. 10, 1962, pp. 702–710.

  8. J. De Cani, A Dynamic Programming Algorithm for Imbedded Markov Chains When the Planning Horizon Is at Infinity, Man. Sci. 10, 1964, pp. 716–733.

  9. G.T. De Ghellinck, G.D. Eppen, Linear Programming Solutions for Separable Markovian Decision Problems, Man. Sci. 13, 1967, pp. 371–394.

  10. E.V. Denardo, B. Fox, Multichain Markov Renewal Programs, SIAM J. Appl. Math. 16, 1968, pp. 468–487.

  11. E.V. Denardo, Markov Renewal Programs with Small Interest Rates, Ann. Math. Stat. 42, 1971, pp. 477–496.

  12. E.V. Denardo, Separable Markov Decision Problems, Man. Sci. 14, 1968, pp. 451–462.

  13. E.V. Denardo, On Linear Programming in a Markov Decision Problem, Man. Sci. 16, 1970, pp. 281–288.

  14. E.V. Denardo, Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Rev. 9, 1967, pp. 165–177.

  15. E.V. Denardo, A Markov Decision Problem, in: T.C. Hu, S.M. Robinson (Eds.), Mathematical Programming, Academic Press, New York, 1973.

  16. C. Derman, On Sequential Decisions and Markov Chains, Man. Sci. 9, 1962, pp. 16–24.

  17. C. Derman, Markovian Decision Processes, Ann. Math. Stat. 37, 1966, pp. 1545–1553.

  18. B. Curtis Eaves, Complementary Pivot Theory and Markov Decision Chains, in: S. Karamardian, C.B. Garcia (Eds.), Fixed Points: Algorithms and Applications, First International Conference on Computing Fixed Points with Applications, Clemson University, South Carolina, June 1974.

  19. A. Federgruen, P.J. Schweitzer, Discounted and Undiscounted Value Iteration in Markov Decision Problems: A Survey, Math. Centrum Report B.W. 78/77, Amsterdam, August 1978.

  20. B.L. Fox, Numerical Computation: Transient Behaviour of a Markov Renewal Process, Paper No. 119, Département d'Informatique, University of Montreal, 1973.

  21. B.L. Fox, Markov Renewal Programming by Linear Fractional Programming, SIAM J. Appl. Math. 14, 1966, pp. 1418–1432.

  22. R. Grinold, Elimination of Suboptimal Actions in Markov Decision Problems, Opns. Res. 21, 1973, pp. 848–851.

  23. N.A.J. Hastings, J.M.C. Mello, Tests for Suboptimal Actions in Discounted Markov Programming, Man. Sci. 19, 1973, pp. 1019–1022.

  24. N.A.J. Hastings, A Test for Non-Optimal Actions in Undiscounted Markov Decision Chains, Man. Sci. 23, 1976, pp. 87–92.

  25. N.A.J. Hastings, Bounds on the Gain of a Markov Decision Process, Opns. Res. 19, 1971, pp. 240–243.

  26. N.A.J. Hastings, J.A.E.E. van Nunen, The Action Elimination Algorithms for Markov Decision Processes, Memorandum COSOR 76-20, Department of Mathematics, Eindhoven University of Technology, 1976.

  27. N.A.J. Hastings, Optimisation of Discounted Markov Decision Problems, Opl. Res. Q. 20, 1969, pp. 499–500.

  28. N.A.J. Hastings, Some Notes on Dynamic Programming and Replacement, Opl. Res. Q. 19, 1968, pp. 453–457.

  29. N.A.J. Hastings, J.M.C. Mello, Decision Networks, Wiley, 1978.

  30. K. Hinderer, Estimates for Finite-Stage Dynamic Programs, J.M.A.A. 55, 1976, pp. 205–238.

  31. K. Hinderer, On Approximate Solutions of Finite-Stage Dynamic Programs, International Conference on Dynamic Programming, University of British Columbia, Vancouver, 1977.

  32. K. Hinderer, G. Hübner, On Exact and Approximate Solutions of Unstructured Finite-Stage Dynamic Programs, Proceedings of the Advanced Seminar on Markov Decision Theory, Mathematical Centre, Amsterdam, 1976.

  33. K. Hinderer, G. Hübner, Recent Results on Finite-Stage Stochastic Dynamic Programs, Proceedings of the 41st Session of the International Statistical Institute, New Delhi, December 1977.

  34. K. Hinderer, W. Whitt, On Approximate Solutions of Finite-Stage Dynamic Programs, Proceedings of the International Conference on Dynamic Programming, University of British Columbia, Vancouver, 1977.

  35. H. Hinomoto, Linear Programming of Markovian Decisions, Man. Sci. 18, 1971, pp. 88–96.

  36. D. Hitchcock, J. MacQueen, On Computing the Expected Discounted Return in a Markov Chain, Naval Res. Log. Quart. 17, 1970, pp. 237–241.

  37. A. Hordijk, H. Tijms, A Modified Form of the Iterative Method of Dynamic Programming, Ann. Stat. 3, 1975, pp. 203–208.

  38. A. Hordijk, H. Tijms, The Method of Successive Approximations and Markovian Decision Problems, Opns. Res. 22, 1974, pp. 519–521.

  39. A. Hordijk, P.J. Schweitzer, H. Tijms, The Asymptotic Behaviour of the Minimal Total Expected Cost for the Denumerable State Markov Decision Model, J. Appl. Prob. 12, 1975, pp. 298–305.

  40. A. Hordijk, L.C.M. Kallenberg, On Solving Markov Decision Problems by Linear Programming, International Conference on Markov Decision Processes, Manchester, July 1978.

  41. R.A. Howard, Semi-Markovian Control Systems, in: Semi-Markovian Decision Processes, Proc. 34th Session Bull. Internat. Stat. Inst. 40, Book 2, 1964, pp. 625–652.

  42. R.A. Howard, Dynamic Programming and Markov Processes, M.I.T. Press, 1960.

  43. R.A. Howard, Research in Semi-Markovian Decision Structures, J. Opns. Res. Soc. of Japan 6, 1964, pp. 163–199.

  44. G. Hübner, Extrapolation and Exclusion of Suboptimal Actions in Finite-Stage Markov Decision Models, Doctoral Dissertation, University of Hamburg, Germany, 1977.

  45. G. Hübner, Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by Use of Contraction Properties, in: Information Theory, Statistical Decision Functions, Random Processes, Transactions of the Seventh Prague Conference and the European Meeting of Statisticians, D. Reidel, 1977.

  46. W. Jewell, Markov Renewal Programming, Opns. Res. 11, 1963, pp. 938–971.

  47. A.J. Kleinman, H.J. Kushner, Accelerated Procedures for the Solution of Discrete Markov Control Problems, I.E.E.E. Trans. Aut. Cont. AC-16, 1971, pp. 147–152.

  48. G. Koehler, Generalised Markov Decision Processes, Dept. of Management, University of Florida, Feb. 1978.

  49. G. Koehler, Value Convergence in a Generalised Markov Decision Process, Dept. of Management, University of Florida, Jan. 1978.

  50. M. Krasnosel'skii, G. Vainikko, P. Zabreiko, Ya. Rutitskii, V. Stetsenko, Approximate Solution of Operator Equations, Wolters-Noordhoff, Groningen, 1972.

  51. H. Kushner, Introduction to Stochastic Control, Holt, Rinehart and Winston, 1971.

  52. E. Lanery, Étude asymptotique des systèmes markoviens à commande, Revue d'Informatique et de Recherche Opérationnelle 1, 1967, pp. 3–56.

  53. S. Lefschetz, Introduction to Topology, Princeton University Press, 1949.

  54. J. MacQueen, A Modified Dynamic Programming Method for Markovian Decision Problems, J.M.A.A. 14, 1966, pp. 38–43.

  55. J. MacQueen, A Test for Sub-Optimal Actions in Markovian Decision Problems, Opns. Res. 15, 1967, pp. 559–561.

  56. A. Manne, Linear Programming and Sequential Decisions, Man. Sci. 6, 1960, pp. 259–267.

  57. B. Miller, A. Veinott, Dynamic Programming with a Small Interest Rate, Ann. Math. Stat. 40, 1969, pp. 366–370.

  58. H. Mine, S. Osaki, Linear Programming Algorithms for Semi-Markovian Decision Processes, J. Math. Anal. Appl. 22, 1968, pp. 356–381.

  59. H. Mine, Y. Tabata, On the Direct Sums of Markovian Decision Processes, J.M.A.A. 28, 1969, pp. 284–293.

  60. H. Mine, S. Osaki, Some Remarks on a Markov Decision Process with an Absorbing State, J.M.A.A. 23, 1968, pp. 327–334.

  61. H. Mine, S. Osaki, Linear Programming Considerations on Markovian Decision Processes with No Discounting, J. Math. Anal. Appl. 26, 1969, pp. 221–230.

  62. H. Mine, S. Osaki, Markovian Decision Processes, Elsevier, 1970.

  63. T.E. Morton, Using Strong Convergence to Accelerate Value Iteration, Graduate School of Industrial Administration, Carnegie-Mellon University, 1976.

  64. T.E. Morton, Undiscounted Markov Renewal Programming via Modified Successive Approximations, Opns. Res. 19, 1971, pp. 1081–1089.

  65. J.M. Norman, D.J. White, A Method for Approximate Solutions to Stochastic Dynamic Programming Problems Using Expectations, Opns. Res. 16, 1968, pp. 296–306.

  66. A.R. Odoni, On Finding the Maximal Gain for Markov Decision Processes, Opns. Res. 17, 1969, pp. 857–860.

  67. E. Porteus, Some Bounds for Discounted Sequential Decision Processes, Man. Sci. 18, 1971, pp. 7–11.

  68. E. Porteus, Bounds and Transformations for Discounted Finite Markov Decision Chains, Opns. Res. 23, 1975, pp. 761–765.

  69. E.L. Porteus, J.C. Totten, Accelerated Computation of the Expected Discounted Return in a Markov Chain, Opns. Res. 26, 1978, pp. 350–357.

  70. E.L. Porteus, Overview of Iterative Methods for Discounted Markov Decision Chains, International Conference on Markov Decision Processes, Manchester, July 1978.

  71. D. Reetz, Approximate Solutions of Discounted Markovian Decision Processes, Institut für Gesellschafts- und Wirtschaftswissenschaften, Wirtschaftstheoretische Abteilung, University of Bonn, 1971.

  72. U. Rieder, Estimates for Dynamic Programs with Lower and Upper Bounding Functions, Institut für Mathematische Statistik, University of Karlsruhe, Germany, 1977.

  73. I.V. Romanovskii, On the Solvability of Bellman's Functional Equation for a Markovian Decision Process, J.M.A.A. 42, 1973, pp. 485–498.

  74. S.M. Ross, Non-Discounted Denumerable Markovian Decision Models, Ann. Math. Stat. 39, 1968, pp. 412–423.

  75. U.G. Rothblum, Normalised Markov Decision Chains, Opns. Res. 23, 1975, pp. 785–795.

  76. H.E. Scarf, The Approximation of Fixed Points of a Continuous Mapping, SIAM J. Appl. Math. 15, 1967, pp. 1328–1343.

  77. P.J. Schweitzer, Iterative Solution of the Functional Equations of Undiscounted Markov Renewal Programming, J.M.A.A. 34, 1971, pp. 495–501.

  78. P.J. Schweitzer, Multiple Policy Improvements in Undiscounted Markov Renewal Programming, Opns. Res. 19, 1971, pp. 784–793.

  79. P.J. Schweitzer, Perturbation Theory and Undiscounted Markov Renewal Programming, Opns. Res. 17, 1969, pp. 716–727.

  80. P.J. Schweitzer, Annotated Bibliography on Markov Decision Processes, IBM Watson Research Center, Yorktown Heights, New York.

  81. P.J. Schweitzer, An Overview of Undiscounted Markovian Decision Processes, International Conference on Markov Decision Processes, Manchester, July 1978.

  82. J.F. Shapiro, Brouwer's Fixed Point Theorem and Finite State Space Markovian Decision Theory, J. Math. Anal. Appl. 49, 1975, pp. 710–712.

  83. J.F. Shapiro, Turnpike Planning Horizons for a Markovian Decision Model, Man. Sci. 14, 1968, pp. 292–300.

  84. J.C. Totten, Computational Methods for Finite State Finite Valued Markovian Decision Problems, Report ORC 71-9, Operations Research Center, University of California, Berkeley, 1971.

  85. J.A.E.E. Van Nunen, A Set of Successive Approximation Methods for Discounted Markovian Decision Problems, Zeitschr. f. Opns. Res. 20, 1976, pp. 203–209.

  86. J.A.E.E. Van Nunen, Improved Successive Approximation Methods for Discounted Markov Decision Processes, in: A. Prekopa (Ed.), Colloquia Mathematica Societatis Janos Bolyai 12, North-Holland, 1976, pp. 667–682.

  87. J.A.E.E. Van Nunen, J. Wessels, A Principle for Generating Optimisation Procedures for Discounted Markov Decision Processes, in: A. Prekopa (Ed.), Colloquia Mathematica Societatis Janos Bolyai 12, North-Holland, 1976, pp. 683–695.

  88. H. Wagner, Principles of Operations Research, Prentice-Hall, 1969.

  89. D.J. White, Dynamic Programming, Oliver and Boyd, 1969.

  90. D.J. White, Elimination of Non-Optimal Actions in Markov Decision Processes, Notes in Decision Theory, No. 31, Department of Decision Theory, Manchester University, April 1977.

  91. D.J. White, Finite Dynamic Programming, Wiley, forthcoming.

  92. D.J. White, Dynamic Programming, Markov Chains and the Method of Successive Approximations, J.M.A.A. 6, 1963, pp. 373–376.

  93. D.J. White, Dynamic Programming and Systems of Uncertain Duration, J.M.A.A. 29, 1970, pp. 419–423.

  94. D.J. White, Dynamic Programming and Systems of Uncertain Duration, Man. Sci. 12, 1965, pp. 37–67.

  95. D.J. White, Approximating Bounds and Policies in Markov Decision Processes, Notes in Decision Theory, No. 54, Department of Decision Theory, Manchester University, June 1978.

  96. D. Yamada, Duality Theorem in Markov Decision Problems, J.M.A.A. 50, 1975, pp. 579–595.

Additional References

  1. L.C. Thomas, Connectedness Conditions Used in Finite State Markov Decision Processes, to appear in J.M.A.A.

  2. L.C. Thomas, Connectedness Conditions Used in Denumerable State Markov Decision Processes, International Conference on Markov Decision Processes, Manchester, July 1978.

  3. M.L. Puterman, S.L. Brumelle, On the Convergence of Policy Iteration in Stationary Dynamic Programming, Working Paper No. 392, Faculty of Commerce, University of British Columbia, September 1977.

  4. M.L. Puterman, M.C. Shin, Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Working Paper No. 481, Faculty of Commerce, University of British Columbia, November 1977.

  5. H.C. Tijms, An Overview of Non-Finite State Semi-Markov Decision Problems with the Average Cost Criterion, International Conference on Markov Decision Processes, Manchester, July 1978.

  6. H.W. Kuhn, Simplicial Approximation of Fixed Points, Proc. Nat. Acad. Sci. U.S.A. 61, 1968, pp. 1238–1242.

Editor information

K.-W. Gaede, D.B. Pressmar, Ch. Schneeweiß, K.-P. Schuster, O. Seifert

Copyright information

© 1979 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

White, D.J. (1979). A survey of algorithms for some restricted classes of Markov decision problems. In: Gaede, KW., Pressmar, D.B., Schneeweiß, C., Schuster, KP., Seifert, O. (eds) Papers of the 8th DGOR Annual Meeting / Vorträge der 8. DGOR Jahrestagung. Proceedings in Operations Research 8, vol 1978. Physica, Heidelberg. https://doi.org/10.1007/978-3-642-99749-5_13

  • DOI: https://doi.org/10.1007/978-3-642-99749-5_13

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-0212-2

  • Online ISBN: 978-3-642-99749-5

  • eBook Packages: Springer Book Archive
