Direct Neural Dynamic Programming

  • Lei Yang
  • Russell Enns
  • Yu-Tsung Wang
  • Jennie Si
Part of the Control Engineering book series (CONTRENGIN)


This chapter is about approximate dynamic programming (ADP), which has been referred to by many different names, such as “reinforcement learning,” “adaptive critics,” “neuro-dynamic programming,” and “adaptive dynamic programming.” The fundamental issue under consideration is optimization over time, using learning and approximation to handle problems that severely challenge conventional methods because of their very large scale, a lack of sufficient prior knowledge, or both. In this chapter we discuss the relationships, results, and challenges of various approaches under the theme of ADP. We also introduce the fundamental principles of our direct neural dynamic programming (NDP) and demonstrate its application to a continuous-state control problem using an industrial-scale Apache helicopter model. This is probably one of the first studies in which an ADP-type algorithm has been applied to a complex, realistic, continuous-state problem, a setting that poses a major challenge in machine learning with respect to scalability or generalization.


Keywords (added by machine, not by the authors): Artificial Neural Network; Optimal Policy; Reinforcement Learning; Action Network; Main Rotor





Copyright information

© Springer Science+Business Media New York 2003
