Computational Considerations

Part of the book series: Communications and Control Engineering (CCE)

Abstract

Motivated by issues arising in adaptive dynamic programming for optimal control, a function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that the number of kernel functions required to maintain an accurate function approximation as the state moves through a compact set is bounded. Additionally, a gradient-descent-based weight update law is introduced for which good accuracy can be achieved, provided the update law is iterated at a sufficiently high frequency, as detailed in Theorem 7.5. Simulation results demonstrate the utility of the StaF methodology for maintaining an accurate function approximation and for solving an infinite-horizon optimal regulation problem. The simulation results indicate that fewer basis functions are required to guarantee stability and approximate optimality than are required by a global approximation approach.
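
To make the mechanics concrete, the sketch below illustrates the StaF concept in Python. It is a toy under stated assumptions, not the implementation used in the chapter: a few Gaussian kernels are re-centered at fixed offsets from the current state, and the weights are adapted by gradient descent on the squared approximation error over a small neighborhood of the state, iterated fast relative to the state's motion in the spirit of Theorem 7.5. The kernel width, offsets, step size, target function, and trajectory are all illustrative choices.

```python
import numpy as np

# Minimal sketch of the State Following (StaF) idea (an illustrative toy, not
# the authors' implementation): a few Gaussian kernels whose centers travel
# with the current state, with weights adapted by gradient descent on the
# squared approximation error over a small neighborhood of the state.

def kernel(x, c, mu=1.0):
    """Gaussian RBF kernel k(x, c) = exp(-||x - c||^2 / mu)."""
    return np.exp(-np.sum((x - c) ** 2) / mu)

def staf_errors(f, trajectory, offsets, alpha=0.5, iters=30):
    """Track f along `trajectory`; return the approximation error at each state."""
    w = np.zeros(len(offsets))              # kernel weights, adapted online
    errors = []
    for x in trajectory:
        centers = [x + d for d in offsets]  # centers travel with the state
        # Incremental gradient descent on sum_s (w @ phi(s) - f(s))^2 / 2 over
        # samples s near x, iterated at a high rate relative to the state's
        # motion (cf. Theorem 7.5); samples reuse the kernel-center offsets.
        for _ in range(iters):
            for s in centers:
                phi = np.array([kernel(s, c) for c in centers])
                w -= alpha * (w @ phi - f(s)) * phi
        phi_x = np.array([kernel(x, c) for c in centers])
        errors.append(abs(w @ phi_x - f(x)))
    return np.array(errors)

if __name__ == "__main__":
    f = lambda x: np.sin(x[0]) + x[1] ** 2           # target function (assumed)
    t = np.linspace(0.0, 2.0 * np.pi, 200)
    traj = np.stack([np.cos(t), np.sin(t)], axis=1)  # state on the unit circle
    offsets = [np.zeros(2), np.array([0.3, 0.0]), np.array([0.0, 0.3])]
    print("max local error:", staf_errors(f, traj, offsets).max())
```

In this sketch the weights carry over from one state to the next, so once the approximation locks on, only a few gradient iterations per step are needed to keep the local error small; only three kernels are active at any time, regardless of how large the compact set is.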

Notes

  1. For \(z \in \mathbb {C}\), Re(z) denotes the real part of z and \({\overline{z}}\) denotes the complex conjugate of z.

Author information

Correspondence to Rushikesh Kamalapurkar.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Kamalapurkar, R., Walters, P., Rosenfeld, J., Dixon, W. (2018). Computational Considerations. In: Reinforcement Learning for Optimal Feedback Control. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-78384-0_7

  • DOI: https://doi.org/10.1007/978-3-319-78384-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78383-3

  • Online ISBN: 978-3-319-78384-0
