Abstract
This chapter is concerned with discrete-time policy iteration adaptive dynamic programming (ADP) methods for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use the policy iteration ADP technique to obtain iterative control laws that minimize the iterative value functions. The main contribution of this chapter is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative value function converges nonincreasingly to the optimal solution of the Bellman equation. It is also proven that each of the iterative control laws stabilizes the nonlinear system. To facilitate the implementation of the iterative ADP algorithm, neural networks are used to approximate the iterative value functions and the iterative control laws, respectively, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
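The policy iteration scheme described above alternates two steps: policy evaluation (solve for the value function of the current control law) and policy improvement (minimize the right-hand side of the Bellman equation to obtain the next law). As a minimal illustration, not the chapter's neural-network implementation, the sketch below applies the same iteration to the linear-quadratic special case, where policy evaluation reduces to a discrete Lyapunov equation and policy improvement to a gain update (Hewer's algorithm). The matrices `A`, `B`, `Q`, `R` and the initial stabilizing gain `K` are illustrative assumptions; the nonincreasing convergence of the value (here, `trace(P)`) mirrors the chapter's convergence result.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Illustrative discrete-time linear system x_{k+1} = A x_k + B u_k
# with quadratic utility x' Q x + u' R u (assumed values).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Initial admissible (stabilizing) control law u = -K x, chosen by hand.
K = np.array([[10.0, 10.0]])

traces = []
for _ in range(50):
    A_cl = A - B @ K
    # Policy evaluation: P solves P = A_cl' P A_cl + Q + K' R K,
    # i.e. the value of the current policy.
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    traces.append(np.trace(P))
    # Policy improvement: minimize the Bellman right-hand side.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The iterates converge to the optimal value, the solution of the
# discrete-time algebraic Riccati equation.
P_star = solve_discrete_are(A, B, Q, R)
print(np.allclose(P, P_star, atol=1e-8))
```

Starting from an admissible initial policy, the value matrices satisfy P_0 ≥ P_1 ≥ … ≥ P*, the linear analogue of the nonincreasing convergence proven in this chapter for the nonlinear case.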
Cite this chapter
Liu, D., Wei, Q., Wang, D., Yang, X., Li, H. (2017). Policy Iteration for Optimal Control of Discrete-Time Nonlinear Systems. In: Adaptive Dynamic Programming with Applications in Optimal Control. Advances in Industrial Control. Springer, Cham. https://doi.org/10.1007/978-3-319-50815-3_4
Print ISBN: 978-3-319-50813-9
Online ISBN: 978-3-319-50815-3