
Policy Iteration for Optimal Control of Discrete-Time Nonlinear Systems

Adaptive Dynamic Programming with Applications in Optimal Control

Part of the book series: Advances in Industrial Control (AIC)

Abstract

This chapter is concerned with discrete-time policy iteration adaptive dynamic programming (ADP) methods for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use a policy iteration ADP technique to obtain iterative control laws that minimize the iterative value functions. The main contribution of this chapter is an analysis of the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative value function converges nonincreasingly to the optimal solution of the Bellman equation. It is also proven that each of the iterative control laws stabilizes the nonlinear system. To facilitate implementation of the iterative ADP algorithm, neural networks are used to approximate the iterative value functions and to compute the iterative control laws, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the present method.
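The iteration the abstract describes, evaluation of the current control law followed by a greedy improvement step, can be sketched in tabular form on a toy plant. Everything below is an illustrative assumption rather than the chapter's method: the dynamics `F`, the stage cost `U`, the grids, and the discount factor `gamma` are hypothetical choices, and the chapter itself treats the undiscounted case and uses neural networks in place of a state grid.

```python
import numpy as np

# Hypothetical 1-D nonlinear plant and quadratic stage cost (not the
# chapter's examples; chosen only to make the sketch self-contained).
F = lambda x, u: 0.8 * np.sin(x) + u          # assumed dynamics x_{k+1} = F(x_k, u_k)
U = lambda x, u: x**2 + u**2                  # assumed stage cost

xs = np.linspace(-2.0, 2.0, 41)               # discretized state space
us = np.linspace(-1.5, 1.5, 31)               # discretized control set
gamma = 0.95                                  # discount factor, a simplification of
                                              # the chapter's undiscounted setting

# next_idx[i, j]: grid index nearest to F(xs[i], us[j])
next_idx = np.abs(F(xs[:, None], us[None, :])[..., None] - xs).argmin(-1)
cost = U(xs[:, None], us[None, :])            # tabulated stage costs
rows = np.arange(len(xs))

pol = np.full(len(xs), len(us) // 2)          # initial control law: u(x) = 0
V = np.zeros(len(xs))
for _ in range(30):
    # Policy evaluation: iterate V(x) = U(x, u(x)) + gamma * V(F(x, u(x)))
    # to (near) convergence for the current control law.
    for _ in range(200):
        V = cost[rows, pol] + gamma * V[next_idx[rows, pol]]
    # Policy improvement: u(x) = argmin_u [ U(x, u) + gamma * V(F(x, u)) ]
    new_pol = (cost + gamma * V[next_idx]).argmin(axis=1)
    if np.array_equal(new_pol, pol):          # control law has converged
        break
    pol = new_pol
```

On a grid this small the control law typically stops changing after a few improvement steps, mirroring the nonincreasing convergence of the iterative value functions described above.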

Author information

Correspondence to Derong Liu.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Liu, D., Wei, Q., Wang, D., Yang, X., Li, H. (2017). Policy Iteration for Optimal Control of Discrete-Time Nonlinear Systems. In: Adaptive Dynamic Programming with Applications in Optimal Control. Advances in Industrial Control. Springer, Cham. https://doi.org/10.1007/978-3-319-50815-3_4

  • DOI: https://doi.org/10.1007/978-3-319-50815-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50813-9

  • Online ISBN: 978-3-319-50815-3

  • eBook Packages: Engineering, Engineering (R0)
