Safe Deep Reinforcement Learning Hybrid Electric Vehicle Energy Management

  • Roman Liessner
  • Ansgar Malte Dietermann
  • Bernard Bäker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11352)


The optimality-based design of the energy management of a hybrid electric vehicle is a challenging task, owing to the extensive and complex nonlinear interactions in the system as well as the unknown vehicle use in real traffic. The optimization has to consider multiple continuous sensor and control variables and has to handle uncertain knowledge. The resulting decision-making agent directly influences objectives such as fuel consumption. This contribution presents a concept that solves the energy management with a Deep Reinforcement Learning algorithm while simultaneously preventing inadmissible actions during the learning process. In addition, the approach can include further state variables, such as the battery temperature, that classic energy management approaches do not consider. The contribution focuses on the environment used and its interaction with the Deep Reinforcement Learning algorithm.


Keywords: Energy management · Deep learning · Safe reinforcement learning · Hybrid electric vehicle
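The safety mechanism the abstract describes — keeping the learning agent from ever executing inadmissible actions — can be pictured as a state-dependent projection of the agent's proposed continuous action onto the admissible set before it reaches the powertrain. The following is a minimal sketch of that idea only; the function names, thresholds, and derating rules are illustrative assumptions, not values from the paper.

```python
def admissible_bounds(soc, battery_temp):
    """Hypothetical state-dependent limits on the electric torque-split action.

    Returns (low, high) bounds on the fraction of the demanded torque that
    may be supplied electrically. The thresholds below are made up for
    illustration: a nearly empty battery forbids electric propulsion, a
    nearly full one forces a minimum electric share, and high battery
    temperature derates the electric path.
    """
    low, high = 0.0, 1.0
    if soc <= 0.1:           # battery nearly empty: no electric propulsion
        high = 0.0
    if soc >= 0.9:           # battery nearly full: force a minimum electric share
        low = 0.2
    if battery_temp > 45.0:  # thermal derating of the electric machine (deg C)
        high = min(high, 0.5)
    return low, min(high, 1.0)


def safe_action(raw_action, soc, battery_temp):
    """Clip the agent's proposed action into the admissible interval,
    so the environment never executes an inadmissible action."""
    low, high = admissible_bounds(soc, battery_temp)
    return max(low, min(high, raw_action))
```

During training, the clipped value would be the action actually applied to the vehicle model, while the raw action can still drive exploration — one common way to combine unrestricted exploration with safe execution.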



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Roman Liessner (1)
  • Ansgar Malte Dietermann (1)
  • Bernard Bäker (1)

  1. Dresden Institute of Automobile Engineering, Technische Universität Dresden, Dresden, Germany
