Neural Network-Based Adaptive Optimal Controller – A Continuous-Time Formulation

  • Draguna Vrabie
  • Frank Lewis
  • Daniel Levine
Part of the Communications in Computer and Information Science book series (CCIS, volume 15)


We present a new online adaptive control scheme for partially unknown nonlinear systems that are affine in the input. The scheme converges to the optimal state-feedback control solution. The main features of the algorithm map onto the characteristics of the reward-based decision-making process in the mammalian brain.

The derivation of the optimal adaptive control algorithm is presented in a continuous-time framework. The optimal control solution is obtained directly, without system identification. The algorithm is an online form of policy iteration that uses an adaptive critic structure to find an approximate solution to the state-feedback, infinite-horizon optimal control problem.
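As a rough illustration of the idea of policy iteration without system identification, the sketch below works through a scalar linear-quadratic special case rather than the nonlinear, neural-network setting of the paper. Everything in it is an assumption made for illustration: the plant dx/dt = a x + b u, the cost integrand q x^2 + r u^2, the quadratic critic V(x) = p x^2, the state reset before each evaluation, and the names `simulate_interval` and `integral_policy_iteration`. The learner uses only the input gain b and measured data (state at the start and end of an interval, plus the accumulated cost); the drift term a appears only inside the simulated plant.

```python
import math


def simulate_interval(x0, k, a, b, q, r, T, steps=500):
    """Hypothetical plant simulator: runs dx/dt = a*x + b*u with u = -k*x
    for T seconds and returns (x(T), cost accumulated over the interval).
    The learner never reads `a`; it only sees the returned measurements."""
    def deriv(x):
        u = -k * x
        return a * x + b * u, q * x * x + r * u * u

    x, cost = x0, 0.0
    h = T / steps
    for _ in range(steps):
        # Classical RK4 on the augmented state (x, cost); the cost rate
        # depends only on x, so each stage is evaluated at the stage value of x.
        k1x, k1c = deriv(x)
        k2x, k2c = deriv(x + 0.5 * h * k1x)
        k3x, k3c = deriv(x + 0.5 * h * k2x)
        k4x, k4c = deriv(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        cost += h * (k1c + 2 * k2c + 2 * k3c + k4c) / 6
    return x, cost


def integral_policy_iteration(k0, b, r, measure, x0=1.0, iterations=8):
    """Policy iteration with a one-parameter critic V(x) = p * x**2.

    Policy evaluation uses the interval relation
        p * x(t)**2 = cost over [t, t+T] + p * x(t+T)**2,
    which is solved for p from measured data only.
    Policy improvement then sets u = -(b * p / r) * x.
    k0 must be stabilizing so that x(t+T)**2 < x(t)**2."""
    k = k0
    history = []
    for _ in range(iterations):
        # Assumption: the state is reset to x0 before each evaluation.
        x1, cost = measure(x0, k)
        p = cost / (x0 * x0 - x1 * x1)   # critic update (policy evaluation)
        k = b * p / r                    # actor update (policy improvement)
        history.append((p, k))
    return history


# Illustrative values only (open-loop unstable plant, stabilizing initial gain).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
history = integral_policy_iteration(
    k0=3.0, b=b, r=r,
    measure=lambda x0, k: simulate_interval(x0, k, a, b, q, r, T=0.5),
)
for i, (p, k) in enumerate(history):
    print(f"iteration {i}: p = {p:.6f}, k = {k:.6f}")
```

For these values the scalar Riccati equation 2ap - (b^2/r)p^2 + q = 0 has the positive root p = 1 + sqrt(2), which gives a reference point for checking the iterates. In a nonlinear setting the single parameter p would be replaced by the weights of a critic approximator, and several data intervals would be needed per evaluation step.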


Keywords: Direct Adaptive Optimal Control; Reinforcement Learning; Policy Iteration; Adaptive Critics; Continuous-Time Nonlinear Systems; Neural Networks


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Draguna Vrabie (1, 2)
  • Frank Lewis (1, 2)
  • Daniel Levine (1, 2)
  1. Automation and Robotics Research Institute, University of Texas at Arlington, USA
  2. Department of Psychology, University of Texas at Arlington, Arlington, USA