Abstract
This chapter is concerned with discrete-time policy iteration adaptive dynamic programming (ADP) methods for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use the policy iteration ADP technique to obtain iterative control laws that minimize the iterative value functions. The main contribution of this chapter is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative value function converges nonincreasingly to the optimal solution of the Bellman equation. It is also proven that each of the iterative control laws stabilizes the nonlinear system. To facilitate the implementation of the iterative ADP algorithm, neural networks are used to approximate the iterative value functions and the iterative control laws, respectively, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
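The policy iteration scheme described above alternates two steps: policy evaluation (solve for the value function of the current control law) and policy improvement (minimize the right-hand side of the Bellman equation to obtain the next law). As a minimal illustration, not the chapter's neural-network implementation, the sketch below applies the same iteration to the linear-quadratic special case, where policy evaluation reduces to a discrete Lyapunov equation and policy improvement to a gain update (Hewer's algorithm). The matrices `A`, `B`, `Q`, `R` and the initial stabilizing gain `K` are illustrative assumptions; the nonincreasing convergence of the value (here, `trace(P)`) mirrors the chapter's convergence result.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Illustrative discrete-time linear system x_{k+1} = A x_k + B u_k
# with quadratic utility x' Q x + u' R u (assumed values).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Initial admissible (stabilizing) control law u = -K x, chosen by hand.
K = np.array([[10.0, 10.0]])

traces = []
for _ in range(50):
    A_cl = A - B @ K
    # Policy evaluation: P solves P = A_cl' P A_cl + Q + K' R K,
    # i.e. the value of the current policy.
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    traces.append(np.trace(P))
    # Policy improvement: minimize the Bellman right-hand side.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The iterates converge to the optimal value, the solution of the
# discrete-time algebraic Riccati equation.
P_star = solve_discrete_are(A, B, Q, R)
print(np.allclose(P, P_star, atol=1e-8))
```

Starting from an admissible initial policy, the value matrices satisfy P_0 ≥ P_1 ≥ … ≥ P*, the linear analogue of the nonincreasing convergence proven in this chapter for the nonlinear case.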
Cite this chapter
Liu, D., Wei, Q., Wang, D., Yang, X., Li, H. (2017). Policy Iteration for Optimal Control of Discrete-Time Nonlinear Systems. In: Adaptive Dynamic Programming with Applications in Optimal Control. Advances in Industrial Control. Springer, Cham. https://doi.org/10.1007/978-3-319-50815-3_4
Print ISBN: 978-3-319-50813-9
Online ISBN: 978-3-319-50815-3