Optimal Constrained Neuro-Dynamic Programming Based Self-learning Battery Management in Microgrids

  • Qinglai Wei
  • Derong Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9949)


In this paper, a novel optimal self-learning sequential battery control scheme is investigated for smart home energy systems. Using the iterative adaptive dynamic programming (ADP) technique, the optimal battery control law is obtained iteratively. To account for the power constraints of the battery, a new non-quadratic performance index function is established; it guarantees that the iterative control law never exceeds the maximum charging/discharging power of the battery, which helps extend the battery's service life. Simulation results are given to illustrate the performance of the presented method.
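The abstract states that the non-quadratic performance index keeps the iterative control law within the battery's maximum charging/discharging power. The paper's exact index is not reproduced on this page, so the sketch below assumes the tanh-type penalty commonly used for constrained-input ADP (Abu-Khalaf and Lewis), \(W(u) = 2r\int_0^u \bar{u}\tanh^{-1}(s/\bar{u})\,ds\), for a scalar battery power u with limit \(\bar{u}\). The names `u_max`, `r`, and `lam` (a stand-in for the value-function term that one ADP iteration would supply) are hypothetical. The point it illustrates: minimizing \(W(u) + \lambda u\) yields a tanh-saturated control law that cannot leave the power limit.

```python
import math


def nonquadratic_utility(u: float, u_max: float, r: float) -> float:
    """Closed form of W(u) = 2 r * integral_0^u u_max * artanh(s / u_max) ds.

    Assumed tanh-type penalty from the constrained-input ADP literature;
    defined only for |u| < u_max (u_max: hypothetical power limit, r > 0).
    """
    if abs(u) >= u_max:
        raise ValueError("u must lie strictly inside (-u_max, u_max)")
    x = u / u_max
    return r * u_max ** 2 * (2.0 * x * math.atanh(x) + math.log(1.0 - x * x))


def saturated_control(lam: float, u_max: float, r: float) -> float:
    """Minimizer of the convex function W(u) + lam * u.

    Setting dW/du = 2 r u_max artanh(u / u_max) equal to -lam gives
    u* = -u_max * tanh(lam / (2 r u_max)), so |u*| <= u_max for every lam.
    lam is a hypothetical stand-in for the value-function sensitivity
    term produced by one ADP iteration.
    """
    return -u_max * math.tanh(lam / (2.0 * r * u_max))


def grid_minimizer(lam: float, u_max: float, r: float, n: int = 20001) -> float:
    """Brute-force check: minimize W(u) + lam * u over interior grid points."""
    best_u, best_cost = 0.0, math.inf
    for i in range(1, n - 1):  # skip the endpoints +/-u_max, which W rejects
        u = -u_max + 2.0 * u_max * i / (n - 1)
        cost = nonquadratic_utility(u, u_max, r) + lam * u
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u
```

For example, with a hypothetical limit `u_max = 2.0` and `r = 1.0`, `saturated_control(3.0, 2.0, 1.0)` is roughly -1.27, and no value of `lam`, however large, pushes the result outside [-2, 2]. `grid_minimizer` is only a brute-force check that the closed-form tanh law matches a direct minimization; a full iterative ADP loop would also need the battery dynamics and a value-function approximator, which are not sketched here.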


Keywords: Adaptive dynamic programming; Approximate dynamic programming; Energy management system; Smart home; Optimal control



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Automation and Electrical Engineering, University of Science and Technology, Beijing, China
