
Soft Computing, Volume 23, Issue 12, pp 4131–4144

Adaptive cruise control via adaptive dynamic programming with experience replay

  • Bin Wang (corresponding author)
  • Dongbin Zhao
  • Jin Cheng
Methodologies and Application

Abstract

The adaptive cruise control (ACC) problem can be transformed into an optimal tracking control problem for complex nonlinear systems. In this paper, a novel, highly efficient model-free adaptive dynamic programming (ADP) approach with experience replay is proposed to design the ACC controller. Experience replay increases data efficiency by recording the available driving data and repeatedly presenting them to the learning procedure of the acceleration controller in the ACC system. The learning framework that combines ADP with experience replay is described in detail. The distinguishing feature of the algorithm is that, when the parameters of the critic network and the actor network are estimated with gradient rules, the gradients of historical data and current data are used concurrently to update the parameters. It is proved with Lyapunov theory that the weight estimation errors of the actor network and the critic network are uniformly ultimately bounded under the novel weight update rules. Simulation results clearly demonstrate that experience replay significantly increases the data efficiency of the learned ACC controller, and the approximate optimality and adaptability of the learned control policy are verified in typical driving scenarios.
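To make the core idea concrete, the following is a minimal sketch of an actor-critic update with experience replay, in the spirit of the abstract: the critic's weight correction combines the gradient of the current sample with gradients of replayed historical samples. The network structure (single hidden layer with fixed random input weights), the TD-style critic error, the actor's update surrogate, and all names and learning rates here are illustrative assumptions of this sketch, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 2            # e.g., spacing error and relative speed in ACC (assumed)
N_FEATURES = 8           # hidden features of the single-layer networks (assumed)
GAMMA = 0.95             # discount factor (assumed)
LR_CRITIC, LR_ACTOR = 0.05, 0.02   # illustrative learning rates
REPLAY_BATCH = 16        # historical samples replayed per update (assumed)

W_in = rng.normal(size=(N_FEATURES, STATE_DIM))  # frozen input weights
w_c = np.zeros(N_FEATURES)                       # critic output weights
w_a = np.zeros(N_FEATURES)                       # actor output weights
buffer = []                                      # experience replay memory

def phi(x):
    """Hidden-layer features; only the output weights are learned."""
    return np.tanh(W_in @ x)

def critic(x):
    return w_c @ phi(x)      # value estimate V(x)

def actor(x):
    return w_a @ phi(x)      # acceleration command u(x)

def td_step(x, r, x_next):
    """TD error and its semi-gradient w.r.t. the critic weights
    (the next-state term is treated as constant, as in TD(0))."""
    delta = r + GAMMA * critic(x_next) - critic(x)
    return delta, delta * phi(x)

def update(x, r, x_next):
    """One learning step: the current sample's gradient and the
    gradients of replayed historical samples are applied concurrently."""
    global w_c, w_a
    buffer.append((x, r, x_next))
    idx = rng.choice(len(buffer), min(REPLAY_BATCH, len(buffer)),
                     replace=False)
    samples = [(x, r, x_next)] + [buffer[i] for i in idx]
    grad_c = np.zeros(N_FEATURES)
    for xs, rs, xn in samples:
        _, g = td_step(xs, rs, xn)
        grad_c += g
    w_c += LR_CRITIC * grad_c / len(samples)
    # Actor correction driven by the current TD error; this surrogate
    # for the actor's gradient rule is an assumption of the sketch.
    delta, _ = td_step(x, r, x_next)
    w_a -= LR_ACTOR * delta * phi(x)
```

In use, `update` would be called once per control interval with the measured tracking state, the instantaneous cost-based reward, and the successor state; because stored transitions are replayed, each piece of driving data contributes to many weight updates, which is the data-efficiency gain the abstract describes.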

Keywords

Adaptive cruise control · Adaptive dynamic programming · Experience replay · Reinforcement learning · Neural networks

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 61603150, 61273136, 61573353 and 61533017), the National Key Research and Development Plan (No. 2016YFB0101000), and the Doctoral Foundation of the University of Jinan (No. XBS1605).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Electrical Engineering, University of Jinan, Jinan, China
  2. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
