
Generalized Policy Iteration ADP for Discrete-Time Nonlinear Systems


Part of the book series: Advances in Industrial Control ((AIC))

Abstract

In this chapter, generalized policy iteration (GPI) algorithms are developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. GPI algorithms combine the ideas of the policy iteration and value iteration algorithms of adaptive dynamic programming (ADP): they permit an arbitrary positive semidefinite function to initialize the algorithm, and two revolving iterations are used for policy evaluation and policy improvement, respectively. The monotonicity, convergence, admissibility, and optimality properties of the developed GPI algorithms for discrete-time nonlinear systems are then analyzed. To implement the GPI algorithms, neural networks are employed to approximate the iterative value functions and to compute the iterative control laws, so as to obtain the approximate optimal control law. Simulation examples are included to verify the effectiveness of the developed algorithms.
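The iteration structure the abstract describes can be sketched concretely. Below is a minimal Python sketch of GPI on a discretized scalar system: an arbitrary positive semidefinite function initializes the value function, and each outer iteration alternates one policy-improvement minimization with N_i policy-evaluation sweeps under the fixed policy. The plant F, the utility U, the grids, and the schedule N_i = 3 are illustrative assumptions for this sketch, not the chapter's examples or its neural-network implementation.

    import numpy as np

    # Illustrative discrete-time nonlinear plant x_{k+1} = F(x_k, u_k)
    # and quadratic utility U(x, u); both are assumptions for this sketch.
    F = lambda x, u: 0.8 * np.sin(x) + u
    U = lambda x, u: x**2 + u**2

    xs = np.linspace(-2.0, 2.0, 201)   # discretized state grid
    us = np.linspace(-1.0, 1.0, 41)    # discretized control set

    def interp_V(V, x):
        # Evaluate the tabulated value function at arbitrary (clipped) states.
        return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

    # Arbitrary positive semidefinite initialization: V_0(x) >= 0, V_0(0) = 0.
    V = 5.0 * xs**2

    for i in range(30):                # outer GPI iterations
        # Policy improvement: v_i(x) = argmin_u { U(x, u) + V_i(F(x, u)) }.
        Q_xu = U(xs[:, None], us[None, :]) \
             + interp_V(V, F(xs[:, None], us[None, :]))
        policy = us[np.argmin(Q_xu, axis=1)]

        # Policy evaluation: N_i value-update sweeps under the fixed policy.
        # N_i = 1 recovers value iteration; N_i -> infinity recovers
        # policy iteration.
        for _ in range(3):             # N_i = 3 (assumed schedule)
            V = U(xs, policy) + interp_V(V, F(xs, policy))

    print("approx V*(0):", interp_V(V, 0.0))

The evaluation budget N_i is the design knob GPI exposes: a small budget keeps each outer iteration cheap, while a large budget evaluates each policy more accurately, which is why the family interpolates between the two classical ADP algorithms.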

Author information

Corresponding author

Correspondence to Derong Liu.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Liu, D., Wei, Q., Wang, D., Yang, X., Li, H. (2017). Generalized Policy Iteration ADP for Discrete-Time Nonlinear Systems. In: Adaptive Dynamic Programming with Applications in Optimal Control. Advances in Industrial Control. Springer, Cham. https://doi.org/10.1007/978-3-319-50815-3_5

  • DOI: https://doi.org/10.1007/978-3-319-50815-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50813-9

  • Online ISBN: 978-3-319-50815-3

  • eBook Packages: Engineering, Engineering (R0)
