Abstract
Motivated by issues arising in adaptive dynamic programming for optimal control, a function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required to maintain an accurate function approximation as the state moves through a compact set. Additionally, a gradient-descent-based weight update law is introduced; as detailed in Theorem 7.5, good approximation accuracy can be achieved provided the update law is iterated at a sufficiently high frequency. Simulation results demonstrate the utility of the StaF methodology both for maintaining an accurate function approximation and for solving an infinite-horizon optimal regulation problem. The simulation results indicate that fewer basis functions are required to guarantee stability and approximate optimality than are required by a global approximation approach.
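The core mechanism described above can be illustrated with a minimal sketch: kernel centers are placed at fixed offsets from the current state (so they "follow" the state), and a gradient-descent weight update is iterated several times per state, in the spirit of Theorem 7.5. This is not the authors' implementation; the Gaussian kernel, its width, the offsets, the step size, and the target function are all illustrative assumptions.

```python
import numpy as np

def kernel(y, c, width=0.5):
    # Gaussian RBF kernel; the StaF theory works with kernels of a universal RKHS
    return np.exp(-np.sum((y - c) ** 2) / (2 * width ** 2))

def approx(y, w, centers):
    # local approximation of f near the current state: weighted sum of kernels
    return sum(wi * kernel(y, c) for wi, c in zip(w, centers))

f = lambda y: np.sin(y[0]) + 0.5 * y[1] ** 2    # illustrative target function

offsets = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [-0.3, -0.3]])
w = np.zeros(len(offsets))                      # StaF weights
lr = 0.3                                        # gradient-descent step size

for t in np.linspace(0.0, 2 * np.pi, 200):
    x = np.array([np.cos(t), np.sin(t)])        # state moving through a compact set
    centers = x + offsets                       # kernel centers follow the state
    # iterate the weight update several times per state (high update frequency)
    for _ in range(20):
        for y in x + 0.5 * offsets:             # sample points near the state
            phi = np.array([kernel(y, c) for c in centers])
            w -= lr * (phi @ w - f(y)) * phi    # gradient step on squared error

err = abs(approx(x, w, centers) - f(x))
print(f"approximation error at final state: {err:.4f}")
```

Only a handful of weights need to be adapted at any time, since the approximation is only required to be accurate near the current state; this is the source of the computational savings relative to a global basis.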
Notes
1. For \(z \in \mathbb{C}\), the quantity \(\mathrm{Re}(z)\) is the real part of z, and \(\overline{z}\) represents the complex conjugate of z.
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Kamalapurkar, R., Walters, P., Rosenfeld, J., Dixon, W. (2018). Computational Considerations. In: Reinforcement Learning for Optimal Feedback Control. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-78384-0_7
Print ISBN: 978-3-319-78383-3
Online ISBN: 978-3-319-78384-0