
Direct Neural Dynamic Programming

  • Lei Yang
  • Russell Enns
  • Yu-Tsung Wang
  • Jennie Si
Part of the Control Engineering book series (CONTRENGIN)

Abstract

This chapter is about approximate dynamic programming (ADP), which has been referred to by many different names, such as "reinforcement learning," "adaptive critics," "neuro-dynamic programming," and "adaptive dynamic programming." The fundamental issue under consideration is optimization over time, using learning and approximation to handle problems whose very large scale and/or lack of sufficient prior knowledge severely challenge conventional methods. In this chapter we discuss the relationships, results, and challenges of various approaches under the theme of ADP. We also introduce the fundamental principles of our direct neural dynamic programming (direct NDP) and demonstrate its application to a continuous-state control problem using an industrial-scale Apache helicopter model. This is probably one of the first studies in which an ADP-type algorithm has been applied to a complex, realistic, continuous-state problem, a setting that poses major scalability and generalization challenges for machine learning.
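Direct NDP learns online with two networks: an action network that maps the measured state to a control, and a critic network that estimates the cost-to-go J from the state and the action together. Because the critic takes the action as an input, the actor's gradient can be obtained by the chain rule through the critic rather than from a plant model, which is what makes the method "direct." The sketch below is a minimal illustration of that online actor-critic loop in the spirit of Si and Wang's formulation; the toy plant, network sizes, learning rates, and discount factor are all illustrative assumptions, not values from the chapter or from the helicopter study.

```python
# A minimal sketch of a direct NDP online actor-critic loop, assuming
# Si and Wang's formulation: a critic network approximates the cost-to-go
# J(x, u), and the action network is tuned by backpropagating the critic
# output toward the desired ultimate cost U_c = 0. The plant and all
# hyperparameters below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
ALPHA, LR_C, LR_A, HID = 0.95, 0.05, 0.05, 6   # assumed hyperparameters

def init(n_in):
    """One-hidden-layer tanh network with a single output unit."""
    return [rng.normal(0.0, 0.3, (HID, n_in)), rng.normal(0.0, 0.3, (1, HID))]

def forward(w, x):
    h = np.tanh(w[0] @ x)
    return np.tanh(w[1] @ h), h

def sgd_step(w, x, h, delta_out, lr):
    """One gradient step; delta_out = dE/d(output pre-activation)."""
    delta_hid = (w[1].T @ delta_out) * (1.0 - h**2)
    w[1] -= lr * np.outer(delta_out, h)
    w[0] -= lr * np.outer(delta_hid, x)

def plant(x, u, dt=0.02):
    """Toy plant (assumed): unit mass on a line; r = -1 signals failure."""
    pos, vel = x[0] + dt * x[1], x[1] + dt * u[0]
    r = -1.0 if abs(pos) > 1.0 else 0.0
    return np.array([pos, vel]), r

actor, critic = init(2), init(3)        # 2-dim state; critic sees (x, u)
x, r, J_prev = np.array([0.1, 0.0]), 0.0, np.zeros(1)
for t in range(5000):
    u, h_a = forward(actor, x)          # tanh output bounds u in (-1, 1)
    z = np.append(x, u)
    J, h_c = forward(critic, z)
    # Critic error e_c(t) = alpha*J(t) - [J(t-1) - r(t)]: a temporal-
    # difference-style prediction error on the discounted cost-to-go.
    e_c = ALPHA * J - (J_prev - r)
    sgd_step(critic, z, h_c, e_c * ALPHA * (1.0 - J**2), LR_C)
    # Actor error e_a(t) = J(t) - U_c with U_c = 0: adjust the action so
    # the predicted cost-to-go moves toward "success", with the gradient
    # passed backward through the critic to its action input.
    dJ_dz = critic[0].T @ ((critic[1].T @ (1.0 - J**2)) * (1.0 - h_c**2))
    sgd_step(actor, x, h_a, J * dJ_dz[-1] * (1.0 - u**2), LR_A)
    x, r = plant(x, u)
    J_prev = J
    if r < 0.0:                         # failure signal: restart episode
        x, r, J_prev = np.array([0.1, 0.0]), 0.0, np.zeros(1)
```

Note that both updates use only the scalar reinforcement signal and the critic's own prediction; no model of the plant dynamics appears anywhere in the learning loop, which is the property that lets the same structure scale from a toy plant to a problem such as the helicopter model in this chapter.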

Keywords

Artificial Neural Network · Optimal Policy · Reinforcement Learning · Action Network · Main Rotor



Copyright information

© Springer Science+Business Media New York 2003
