Abstract
Deep reinforcement learning has recently made impressive advances in sequential decision-making problems. Various reinforcement learning (RL) algorithms have been proposed that focus on the policy optimization process, while the choice of network architecture for the policy itself remains comparatively underexplored. MLPs, LSTMs, and linear layers are complementary in their control capabilities: MLPs are well suited to global nonlinear control, LSTMs can exploit history information, and linear layers are good at stabilizing system dynamics. In this paper, we propose a “Proportional-Integral” (PI) neural network architecture that can easily be combined with popular policy optimization algorithms. This PI-patterned policy network inherits the advantages of the linear (proportional) and integral control widely applied in classic control systems, improving sample efficiency and training performance on most RL tasks. Experimental results on public RL simulation platforms demonstrate that the proposed architecture achieves better performance than the commonly used MLP and other existing models.
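The architectural idea described above — summing a linear proportional path on the current state with an integral path over accumulated state history, plus a nonlinear MLP path — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact model: the class name `PIPolicy`, the layer sizes, and the use of a discrete-time sum for the integral term are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn


class PIPolicy(nn.Module):
    """Hypothetical sketch of a PI-patterned policy network.

    Action = P-term (linear map of the current state)
           + I-term (linear map of the accumulated state history)
           + nonlinear MLP correction on the current state.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.p_term = nn.Linear(obs_dim, act_dim)  # proportional: stabilizing linear control
        self.i_term = nn.Linear(obs_dim, act_dim)  # integral: acts on summed past states
        self.mlp = nn.Sequential(                  # global nonlinear control path
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (T, obs_dim) sequence of observations up to the current step
        s_t = obs_history[-1]           # current state
        s_int = obs_history.sum(dim=0)  # crude discrete-time integral of the state
        return self.p_term(s_t) + self.i_term(s_int) + self.mlp(s_t)


# Usage on a dummy 10-step trajectory of 8-dimensional observations
policy = PIPolicy(obs_dim=8, act_dim=2)
action = policy(torch.randn(10, 8))  # -> action tensor of shape (2,)
```

Because the output is a simple sum of the three paths, such a network can be dropped into standard policy-gradient pipelines (e.g., PPO or TRPO) wherever a plain MLP policy would be used.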
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Y., Gu, C., Wu, K., Guan, X. (2018). Reinforcement Learning Policy with Proportional-Integral Control. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3