Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by the Use of Contraction Properties

Hübner, Gerhard

doi:10.1007/978-94-010-9910-3_27

Part of the book series: Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes and of the 1974 European Meeting of Statisticians (TPCI, volume 7A)

Abstract

When applying value iteration, one frequently observes that the relative values for the n-th stage converge very rapidly with increasing n, whereas the absolute values converge slowly (discount factor β near one) or even diverge (β ≥ 1). MacQueen [10], [11] and others use this fact to give good bounds for the value of the infinite horizon problem and, in addition, to eliminate suboptimal actions in the early stages. This elimination can be improved by using an upper bound δ for the convergence rate. If δ < β, the improvement has two effects: it reduces computing time, and it allows application to the finite horizon case with some β ≥ 1.
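
To make the abstract's starting point concrete, here is a minimal sketch of the classical MacQueen-type bounds and elimination test for a finite, discounted, reward-maximizing problem with 0 ≤ β < 1. It does not implement the chapter's improved procedure based on δ, which the abstract does not spell out. The function name, the argument layout (`P[s][a]` as a list of transition probabilities, `r[s][a]` as a one-step reward) and the stopping rule are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def value_iteration_with_elimination(
    P: Sequence[Sequence[Sequence[float]]],
    r: Sequence[Sequence[float]],
    beta: float,
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> Tuple[List[float], List[List[int]]]:
    """Value iteration with a classical MacQueen-type suboptimality test.

    Hypothetical layout: P[s][a][t] is the probability of moving from
    state s to state t under action a, and r[s][a] is the one-step reward.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("this sketch requires 0 <= beta < 1")
    n_states = len(r)
    active = [set(range(len(r[s]))) for s in range(n_states)]
    v = [0.0] * n_states
    scale = beta / (1.0 - beta)
    for _ in range(max_iter):
        # One-step lookahead values for the actions not yet eliminated.
        q = [
            {a: r[s][a] + beta * sum(p * x for p, x in zip(P[s][a], v))
             for a in active[s]}
            for s in range(n_states)
        ]
        v_new = [max(q[s].values()) for s in range(n_states)]
        diffs = [v_new[s] - v[s] for s in range(n_states)]
        lo, hi = min(diffs), max(diffs)
        # Bounds: v_new + scale*lo <= v* <= v_new + scale*hi, so an action
        # with v_new[s] - q[s][a] > scale*(hi - lo) cannot be optimal.
        threshold = scale * (hi - lo)
        for s in range(n_states):
            active[s] = {a for a in active[s] if v_new[s] - q[s][a] <= threshold}
        v = v_new
        if threshold < tol:
            # Midpoint of the bounds; the error is at most threshold / 2.
            mid = scale * (hi + lo) / 2.0
            return [x + mid for x in v], [sorted(acts) for acts in active]
    return v, [sorted(acts) for acts in active]
```

The test depends only on the spread `hi - lo` of the stage-to-stage differences, which is why rapidly converging relative values permit early elimination even when the absolute values are still far from their limit.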

References

[1] Y. M. I. Dirickx: Deterministic discrete dynamic programming with discount factor greater than one: structure of optimal policies. Man. Sci. 20 (1973), 32–43.
[2] R. C. Grinold: Elimination of suboptimal actions in Markov decision problems. Operations Res. 21 (1973), 848–851.
[3] J. Hajnal: Weak ergodicity in non-homogeneous Markov chains. Proc. Cambr. Phil. Soc. 54 (1958), 233–246.
[4] N. A. J. Hastings, J. C. M. Mello: Tests for suboptimal actions in discounted Markov programming. Man. Sci. 19 (1973), 1019–1022.
[5] K. Hinderer: Estimates for finite state dynamic programs. J. Math. Anal. Appl. 55 (1976), 207–238.
[6] K. Hinderer, G. Hübner: An improvement of J. F. Shapiro’s turnpike theorem for the horizon of finite stage discrete dynamic programs. This volume 245–255.
[7] R. A. Howard: Dynamic programming and Markov processes. Wiley, New York 1960.
[8] W. S. Jewell: Markov renewal programming I + II. Operations Res. 11 (1963), 938–971.
[9] J. G. Kemeny, J. L. Snell: Finite Markov chains. Van Nostrand, Princeton, N. J. 1960.
[10] J. MacQueen: A modified dynamic programming method for Markovian decision problems. J. Math. Anal. Appl. 14 (1966), 38–43.
[11] J. MacQueen: A test for suboptimal actions in Markovian decision problems. Operations Res. 15 (1967), 559–561.
[12] Th. E. Morton: On the asymptotic convergence rate of cost differences for Markovian decision processes. Operations Res. 19 (1971), 244–248.
[13] J. L. Mott: Conditions for the ergodicity of non-homogeneous finite Markov chains. Proc. Roy. Soc. Edinburgh Sec. A 64 (1951), 369–380.
[14] E. L. Porteus: Some bounds for discounted sequential decision processes. Man. Sci. 18 (1971), 7–11.
[15] D. Reetz: A decision exclusion algorithm for a class of Markovian decision processes. Zeitschr. Operations Res. 20 (1976), 125–131.
[16] T. A. Sarymsakov: On inhomogeneous Markov chains. Doklady A.N.S.S.S.R. 120 (1958), 465–467.
[17] H. Schellhaas: Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung. Zeitschr. Operations Res. 18 (1974), 91–104.
[18] E. Seneta: Non-negative matrices. Allen & Unwin, London 1973.
[19] J. F. Shapiro: Turnpike planning horizons for a Markovian decision model. Man. Sci. 14 (1968), 292–300.
[20] D. J. White: Dynamic programming, Markov chains and the method of successive approximations. J. Math. Anal. Appl. 6 (1963), 373–376.

Author information

Authors and Affiliations

Institut für Mathematische Stochastik, Universität Hamburg, Hamburg, Germany
Gerhard Hübner

Editor information

J. Kožešnik

About this chapter

Cite this chapter

Hübner, G. (1977). Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by the Use of Contraction Properties. In: Kožešnik, J. (ed.) Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes and of the 1974 European Meeting of Statisticians, vol 7A. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-9910-3_27

DOI: https://doi.org/10.1007/978-94-010-9910-3_27
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-9912-7
Online ISBN: 978-94-010-9910-3
eBook Packages: Springer Book Archive
