Abstract
This chapter establishes an optimal control scheme for unknown complex-valued nonlinear systems. Policy iteration (PI) is used to solve the Hamilton–Jacobi–Bellman (HJB) equation. Off-policy learning allows the iterative performance index function and the iterative control to be obtained with completely unknown dynamics. Critic and action networks, which execute policy evaluation and policy improvement, are used to approximate the iterative performance index function and the iterative control. Asymptotic stability of the closed-loop system and convergence of the iterative performance index function are proven. Using the Lyapunov technique, the weight estimation errors are shown to be uniformly ultimately bounded (UUB). A simulation study demonstrates the effectiveness of the proposed optimal control method.
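The alternation of policy evaluation and policy improvement that the abstract describes can be sketched on a toy problem. The example below is not the chapter's complex-valued method: it assumes a known real-valued linear system with quadratic cost, for which the HJB equation reduces to the discrete algebraic Riccati equation, so PI becomes Hewer's iteration. All matrices (`A`, `B`, `Q`, `R`) and the initial gain are illustrative assumptions.

```python
import numpy as np

# Toy policy-iteration (PI) sketch: alternate policy evaluation and
# policy improvement. Assumed (not from the chapter): a known linear
# system x_{k+1} = A x_k + B u_k with cost x'Qx + u'Ru and a
# stabilizing initial policy u_k = -K x_k.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[1.0, 1.0]])  # initial admissible (stabilizing) gain

for _ in range(50):
    # Policy evaluation: solve the Lyapunov equation
    # P = Q + K'RK + (A - BK)' P (A - BK) for the current policy,
    # here by fixed-point iteration (A - BK is Schur stable).
    Acl = A - B @ K
    P = Q + K.T @ R @ K
    for _ in range(500):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    # Policy improvement: greedy update of the feedback gain.
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-9:
        K = K_new
        break
    K = K_new

print(K)  # iterates converge to the optimal LQR gain
```

In the chapter's off-policy setting, the policy-evaluation step would be replaced by learning from measured data with unknown dynamics, and the two steps would be carried out by the critic and action networks, respectively.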
Copyright information
© 2019 Science Press, Beijing and Springer Nature Singapore Pte Ltd.
Cite this chapter
Song, R., Wei, Q., Li, Q. (2019). Off-Policy Neuro-Optimal Control for Unknown Complex-Valued Nonlinear Systems. In: Adaptive Dynamic Programming: Single and Multiple Controllers. Studies in Systems, Decision and Control, vol 166. Springer, Singapore. https://doi.org/10.1007/978-981-13-1712-5_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1711-8
Online ISBN: 978-981-13-1712-5
eBook Packages: Intelligent Technologies and Robotics (R0)