
Part of the book series: Control Engineering (CONTRENGIN)

Abstract

This chapter is about approximate dynamic programming (ADP), which has been referred to by many different names, such as “reinforcement learning,” “adaptive critics,” “neuro-dynamic programming,” and “adaptive dynamic programming.” The fundamental issue under consideration is optimization over time, using learning and approximation to handle problems that severely challenge conventional methods because of their very large scale and/or a lack of sufficient prior knowledge. In this chapter we discuss the relationships, results, and challenges of various approaches under the theme of ADP. We also introduce the fundamental principles of our direct neural dynamic programming (NDP) and demonstrate its application to a continuous-state control problem using an industrial-scale Apache helicopter model. This is probably one of the first studies in which an ADP-type algorithm has been applied to a complex, realistic, continuous-state problem, a setting that poses a major scalability and generalization challenge in machine learning.
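
To make the direct NDP idea above concrete, the following is a minimal, hypothetical sketch of an online actor-critic (adaptive critic) update of the kind this chapter builds on: a critic network learns an estimate of the cost-to-go J from a scalar reinforcement signal via a temporal-difference error, and an action network is adjusted by backpropagating through the critic toward the desired value of J. The class name, learning rates, single-hidden-layer architecture, and binary reinforcement signal are illustrative assumptions, not the chapter's exact formulation.

```python
import numpy as np

class DirectNDP:
    """Minimal online actor-critic sketch in the spirit of direct NDP.

    Illustrative assumptions (not taken from the chapter): single-hidden-layer
    tanh networks, a scalar reinforcement signal r (0 = success, -1 = failure),
    a desired cost-to-go of 0, and plain stochastic-gradient updates.
    """

    def __init__(self, n_state, n_action, n_hidden=6,
                 gamma=0.95, actor_lr=0.01, critic_lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.gamma, self.actor_lr, self.critic_lr = gamma, actor_lr, critic_lr
        # Action network: state -> bounded action.
        self.Wa1 = rng.normal(0.0, 0.1, (n_hidden, n_state))
        self.Wa2 = rng.normal(0.0, 0.1, (n_action, n_hidden))
        # Critic network: [state, action] -> scalar cost-to-go estimate J.
        self.Wc1 = rng.normal(0.0, 0.1, (n_hidden, n_state + n_action))
        self.Wc2 = rng.normal(0.0, 0.1, (1, n_hidden))

    def act(self, x):
        h = np.tanh(self.Wa1 @ x)
        return np.tanh(self.Wa2 @ h), h          # action and hidden activations

    def critic(self, x, u):
        z = np.concatenate([x, u])
        h = np.tanh(self.Wc1 @ z)
        return float(self.Wc2 @ h), h, z

    def update(self, x, u, h_a, r, x_next):
        """One online step: TD-style critic update, then an actor update
        obtained by backpropagating through the critic (chain rule)."""
        J, h_c, z = self.critic(x, u)
        u_next, _ = self.act(x_next)
        J_next, _, _ = self.critic(x_next, u_next)
        # Temporal-difference error of the cost-to-go prediction.
        td = r + self.gamma * J_next - J
        # Critic: gradient step on 0.5 * td**2 (J_next treated as fixed).
        delta_c = self.Wc2.ravel() * (1.0 - h_c ** 2)
        self.Wc2 += self.critic_lr * td * h_c[None, :]
        self.Wc1 += self.critic_lr * td * delta_c[:, None] * z[None, :]
        # Actor: gradient step on 0.5 * J**2, i.e. drive J toward the
        # desired value 0 by pushing dJ/du back into the action network.
        dJ_du = (self.Wc1.T @ delta_c)[len(x):]
        self.Wa2 -= self.actor_lr * J * (
            dJ_du[:, None] * (1.0 - u ** 2)[:, None] * h_a[None, :]
        )
        # (The first actor layer Wa1 would be updated by the same chain
        #  rule; it is omitted here to keep the sketch short.)

# Toy usage on a made-up scalar regulation task (hypothetical plant).
agent = DirectNDP(n_state=2, n_action=1)
x = np.array([0.5, -0.2])
for _ in range(200):
    u, h_a = agent.act(x)
    x_next = 0.9 * x + 0.1 * np.array([u[0], 0.0])    # stand-in dynamics
    r = 0.0 if abs(x_next[0]) < 1.0 else -1.0         # binary reinforcement
    agent.update(x, u, h_a, r, x_next)
    x = x_next
```

The toy loop is only meant to show the online data flow (act, observe reinforcement, update); the chapter applies this kind of scheme to a far more complex, continuous-state helicopter control problem.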

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Yang, L., Enns, R., Wang, YT., Si, J. (2003). Direct Neural Dynamic Programming. In: Liu, D., Antsaklis, P.J. (eds) Stability and Control of Dynamical Systems with Applications. Control Engineering. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-0037-6_10

  • DOI: https://doi.org/10.1007/978-1-4612-0037-6_10

  • Publisher Name: Birkhäuser, Boston, MA

  • Print ISBN: 978-1-4612-6583-2

  • Online ISBN: 978-1-4612-0037-6

  • eBook Packages: Springer Book Archive
