Abstract

In applying the method of value iteration one frequently observes that the relative values for the n-th stage converge very rapidly with increasing n, whereas the absolute values converge slowly (discount factor β near one) or even diverge (β ≥ 1). This fact is used by MacQueen [10], [11] and others to give good bounds for the value of the infinite-horizon problem and, in addition, to eliminate suboptimal actions in the early stages. This elimination can be improved by the use of an upper bound δ for the convergence rate. When δ < β, the improvement has two effects: it reduces computing time, and it makes the method applicable to the finite-horizon case with some β ≥ 1.


References

  1. Y. M. I. Dirickx: Deterministic discrete dynamic programming with discount factor greater than one: structure of optimal policies. Man. Sci. 20 (1973), 32–43.
  2. R. C. Grinold: Elimination of suboptimal actions in Markov decision problems. Operations Res. 21 (1973), 848–851.
  3. J. Hajnal: Weak ergodicity in non-homogeneous Markov chains. Proc. Cambr. Phil. Soc. 54 (1958), 233–246.
  4. N. A. J. Hastings, J. C. M. Mello: Tests for suboptimal actions in discounted Markov programming. Man. Sci. 19 (1973), 1019–1022.
  5. K. Hinderer: Estimates for finite state dynamic programs. J. Math. Anal. Appl. 55 (1976), 207–238.
  6. K. Hinderer, G. Hübner: An improvement of J. F. Shapiro’s turnpike theorem for the horizon of finite stage discrete dynamic programs. This volume, 245–255.
  7. R. A. Howard: Dynamic programming and Markov processes. Wiley, New York 1960.
  8. W. S. Jewell: Markov renewal programming I + II. Operations Res. 11 (1963), 938–971.
  9. J. G. Kemeny, J. L. Snell: Finite Markov chains. Van Nostrand, Princeton, N. J. 1960.
  10. J. MacQueen: A modified dynamic programming method for Markovian decision problems. J. Math. Anal. Appl. 14 (1966), 38–43.
  11. J. MacQueen: A test for suboptimal actions in Markovian decision problems. Operations Res. 15 (1967), 559–561.
  12. Th. E. Morton: On the asymptotic convergence rate of cost differences for Markovian decision processes. Operations Res. 19 (1971), 244–248.
  13. J. L. Mott: Conditions for the ergodicity of non-homogeneous finite Markov chains. Proc. Roy. Soc. Edinburgh Sec. A 64 (1951), 369–380.
  14. E. L. Porteus: Some bounds for discounted sequential decision processes. Man. Sci. 18 (1971), 7–11.
  15. D. Reetz: A decision exclusion algorithm for a class of Markovian decision processes. Zeitschr. Operations Res. 20 (1976), 125–131.
  16. T. A. Sarymsakov: On inhomogeneous Markov chains. Doklady A.N.S.S.S.R. 120 (1958), 465–467.
  17. H. Schellhaas: Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung. Zeitschr. Operations Res. 18 (1974), 91–104.
  18. E. Seneta: Non-negative matrices. Allen & Unwin, London 1973.
  19. J. F. Shapiro: Turnpike planning horizons for a Markovian decision model. Man. Sci. 14 (1968), 292–300.
  20. D. J. White: Dynamic programming, Markov chains and the method of successive approximations. J. Math. Anal. Appl. 6 (1963), 373–376.



Editor information

J. Kožešnik


Copyright information

© 1977 ACADEMIA, Publishing House of the Czechoslovak Academy of Sciences, Prague

About this chapter

Cite this chapter

Hübner, G. (1977). Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by the Use of Contraction Properties. In: Kožešnik, J. (ed.) Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes and of the 1974 European Meeting of Statisticians, vol 7A. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-9910-3_27


  • DOI: https://doi.org/10.1007/978-94-010-9910-3_27

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-9912-7

  • Online ISBN: 978-94-010-9910-3

  • eBook Packages: Springer Book Archive
