Abstract
This chapter is about approximate dynamic programming (ADP), which has been referred to by many different names, such as “reinforcement learning,” “adaptive critics,” “neuro-dynamic programming,” and “adaptive dynamic programming.” The fundamental issue under consideration is optimization over time, using learning and approximation to handle problems that severely challenge conventional methods because of their very large scale and/or a lack of sufficient prior knowledge. In this chapter we discuss the relationships, results, and challenges of various approaches under the theme of ADP. We also introduce the fundamental principles of our direct neural dynamic programming (NDP) and demonstrate its application to a continuous-state control problem using an industrial-scale Apache helicopter model. This is probably one of the first studies in which an ADP-type algorithm has been applied to a complex, realistic, continuous-state problem; such problems are a major challenge in machine learning when dealing with scalability or generalization.
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Yang, L., Enns, R., Wang, YT., Si, J. (2003). Direct Neural Dynamic Programming. In: Liu, D., Antsaklis, P.J. (eds) Stability and Control of Dynamical Systems with Applications. Control Engineering. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-0037-6_10
DOI: https://doi.org/10.1007/978-1-4612-0037-6_10
Publisher Name: Birkhäuser, Boston, MA
Print ISBN: 978-1-4612-6583-2
Online ISBN: 978-1-4612-0037-6
eBook Packages: Springer Book Archive