Computational Considerations

Part of the book series: Communications and Control Engineering (CCE)

Abstract

Motivated by issues arising in adaptive dynamic programming for optimal control, a function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that the number of kernel functions required to maintain an accurate function approximation as the state moves through a compact set is bounded. Additionally, a gradient-descent-based weight update law is introduced for which good accuracy can be achieved, provided the update law is iterated at a sufficiently high frequency, as detailed in Theorem 7.5. Simulation results demonstrate the utility of the StaF methodology for maintaining an accurate function approximation and for solving an infinite-horizon optimal regulation problem. The simulation results indicate that fewer basis functions are required to guarantee stability and approximate optimality than are required by a global approximation approach.
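
To make the mechanics concrete, the sketch below illustrates the StaF concept in Python. It is a toy under stated assumptions, not the implementation used in the chapter: a few Gaussian kernels are re-centered at fixed offsets from the current state, and the weights are adapted by gradient descent on the squared approximation error over a small neighborhood of the state, iterated fast relative to the state's motion in the spirit of Theorem 7.5. The kernel width, offsets, step size, target function, and trajectory are all illustrative choices.

```python
import numpy as np

# Minimal sketch of the State Following (StaF) idea (an illustrative toy, not
# the authors' implementation): a few Gaussian kernels whose centers travel
# with the current state, with weights adapted by gradient descent on the
# squared approximation error over a small neighborhood of the state.

def kernel(x, c, mu=1.0):
    """Gaussian RBF kernel k(x, c) = exp(-||x - c||^2 / mu)."""
    return np.exp(-np.sum((x - c) ** 2) / mu)

def staf_errors(f, trajectory, offsets, alpha=0.5, iters=30):
    """Track f along `trajectory`; return the approximation error at each state."""
    w = np.zeros(len(offsets))              # kernel weights, adapted online
    errors = []
    for x in trajectory:
        centers = [x + d for d in offsets]  # centers travel with the state
        # Incremental gradient descent on sum_s (w @ phi(s) - f(s))^2 / 2 over
        # samples s near x, iterated at a high rate relative to the state's
        # motion (cf. Theorem 7.5); samples reuse the kernel-center offsets.
        for _ in range(iters):
            for s in centers:
                phi = np.array([kernel(s, c) for c in centers])
                w -= alpha * (w @ phi - f(s)) * phi
        phi_x = np.array([kernel(x, c) for c in centers])
        errors.append(abs(w @ phi_x - f(x)))
    return np.array(errors)

if __name__ == "__main__":
    f = lambda x: np.sin(x[0]) + x[1] ** 2           # target function (assumed)
    t = np.linspace(0.0, 2.0 * np.pi, 200)
    traj = np.stack([np.cos(t), np.sin(t)], axis=1)  # state on the unit circle
    offsets = [np.zeros(2), np.array([0.3, 0.0]), np.array([0.0, 0.3])]
    print("max local error:", staf_errors(f, traj, offsets).max())
```

In this sketch the weights carry over from one state to the next, so once the approximation locks on, only a few gradient iterations per step are needed to keep the local error small; only three kernels are active at any time, regardless of how large the compact set is.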

Notes

  1. For \(z \in \mathbb {C}\), Re(z) denotes the real part of z and \({\overline{z}}\) denotes the complex conjugate of z.

Author information

Correspondence to Rushikesh Kamalapurkar.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Kamalapurkar, R., Walters, P., Rosenfeld, J., Dixon, W. (2018). Computational Considerations. In: Reinforcement Learning for Optimal Feedback Control. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-78384-0_7

  • DOI: https://doi.org/10.1007/978-3-319-78384-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78383-3

  • Online ISBN: 978-3-319-78384-0
