Reinforcement Learning Policy with Proportional-Integral Control

  • Conference paper
Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)

Abstract

Deep reinforcement learning has recently made impressive advances in sequential decision-making problems. Numerous reinforcement learning (RL) algorithms have been proposed that focus on the policy optimization process, while the network architecture of the policy itself has not been fully explored. MLPs, LSTMs, and linear layers are complementary in their control capabilities: MLPs are appropriate for global control, LSTMs can exploit history information, and linear layers are good at stabilizing system dynamics. In this paper, we propose a "Proportional-Integral" (PI) neural network architecture that can easily be combined with popular optimization algorithms. This PI-patterned policy network combines the advantages of the integral control and linear control widely applied in classic control systems, improving sample efficiency and training performance on most RL tasks. Experimental results on public RL simulation platforms demonstrate that the proposed architecture achieves better performance than the commonly used MLP and other existing models.
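
To make the architectural idea concrete, below is a minimal, hypothetical sketch of a PI-patterned policy in PyTorch: a proportional branch applies a linear layer to the current observation, and an integral branch uses an LSTM as a learned integrator over the observation history. The class name PIPolicy, the layer sizes, and the additive combination of the two branches are illustrative assumptions, not the exact architecture described in the paper.

    # Hypothetical sketch of a PI-patterned policy network (assumptions noted above).
    import torch
    import torch.nn as nn

    class PIPolicy(nn.Module):
        """Sums a proportional (linear) term on the current observation with an
        integral term accumulated over the observation history, echoing the
        u = Kp*e + Ki*integral(e) structure of a classic PI controller."""

        def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.proportional = nn.Linear(obs_dim, act_dim)                   # P term: linear control
            self.integrator = nn.LSTM(obs_dim, hidden_dim, batch_first=True)  # I term: history
            self.integral_head = nn.Linear(hidden_dim, act_dim)

        def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
            # obs_seq: (batch, time, obs_dim); the action depends on the latest
            # observation (P) and on the accumulated history (I).
            p_term = self.proportional(obs_seq[:, -1])
            history, _ = self.integrator(obs_seq)
            i_term = self.integral_head(history[:, -1])
            return p_term + i_term

    if __name__ == "__main__":
        policy = PIPolicy(obs_dim=17, act_dim=6)   # sizes chosen arbitrarily
        obs = torch.randn(1, 10, 17)               # a 10-step observation history
        print(policy(obs).shape)                   # -> torch.Size([1, 6])

Because the combination is a plain sum of branch outputs, such a policy plugs into standard policy-gradient training loops unchanged; only the network class differs from a baseline MLP policy.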


Author information

Corresponding author

Correspondence to Chaochen Gu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, Y., Gu, C., Wu, K., Guan, X. (2018). Reinforcement Learning Policy with Proportional-Integral Control. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_23

  • DOI: https://doi.org/10.1007/978-3-030-04182-3_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04181-6

  • Online ISBN: 978-3-030-04182-3

  • eBook Packages: Computer Science, Computer Science (R0)
