Direct Neural Dynamic Programming
This chapter is about approximate dynamic programming (ADP), which has been referred to by many names, including “reinforcement learning,” “adaptive critics,” “neuro-dynamic programming,” and “adaptive dynamic programming.” The fundamental issue under consideration is optimization over time, using learning and approximation to handle problems that severely challenge conventional methods because of their very large scale and/or lack of sufficient prior knowledge. In this chapter we discuss the relationships, results, and challenges of various approaches under the theme of ADP. We also introduce the fundamental principles of our direct neural dynamic programming (NDP) and demonstrate its application to a continuous-state control problem using an industrial-scale Apache helicopter model. This is probably one of the first studies in which an ADP-type algorithm has been applied to a complex, realistic, continuous-state problem; scaling and generalizing to such problems remains a major challenge in machine learning.
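To make the actor-critic structure behind direct NDP concrete, the following is a minimal sketch of one online update step in the general style of Si and Wang's online learning control scheme: a critic network estimates the cost-to-go J(t) from a temporal-difference error, and an action network is adapted by backpropagating the critic's output error toward the desired ultimate cost (taken as 0 here). The linear/tanh network shapes, learning rates, and function names are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.95             # discount factor (illustrative)
lr_c, lr_a = 0.05, 0.05  # critic / action learning rates (illustrative)

n_state = 3
Wc = rng.normal(scale=0.1, size=(n_state + 1,))  # linear critic over [state, action]
Wa = rng.normal(scale=0.1, size=(n_state,))      # tanh action network weights


def critic(x, u):
    """Approximate cost-to-go J for state x and scalar action u."""
    return Wc @ np.append(x, u)


def actor(x):
    """Scalar control action for state x."""
    return np.tanh(Wa @ x)


def ndp_step(x_prev, x, r):
    """One online direct-NDP-style update for a transition (x_prev -> x)
    with reinforcement signal r. Returns the critic and action errors."""
    global Wc, Wa
    u_prev = actor(x_prev)
    u = actor(x)
    J_prev = critic(x_prev, u_prev)
    J = critic(x, u)

    # Critic error: e_c = alpha*J(t) - [J(t-1) - r(t)];
    # semi-gradient of 0.5*e_c^2 w.r.t. Wc, treating J(t-1) as a fixed target.
    e_c = alpha * J - (J_prev - r)
    Wc -= lr_c * e_c * alpha * np.append(x, u)

    # Action error: e_a = J(t) - 0 (desired ultimate cost is 0);
    # backpropagate through the critic's action input to adapt Wa.
    e_a = critic(x, actor(x))
    dJ_du = Wc[-1]
    du_dWa = (1.0 - actor(x) ** 2) * x
    Wa -= lr_a * e_a * dJ_du * du_dWa
    return e_c, e_a
```

Because both networks adapt online from the scalar reinforcement signal alone, no model of the plant is required, which is what makes the approach attractive for the continuous-state helicopter problem described above.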
Keywords: Artificial Neural Network, Optimal Policy, Reinforcement Learning, Action Network, Main Rotor