State Representation Learning for Multi-agent Deep Deterministic Policy Gradient

  • Zhipeng Li
  • Xuesong Jiang
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 891)


Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is an effective algorithm for multi-agent domains. Analyzing the algorithm, we observe that MADDPG uses deep neural networks (DNNs) to model the Q function. One advantage of DNNs is that they can represent highly complex functions and can therefore handle high-dimensional input. The drawback of this end-to-end learning, however, is that it usually requires large amounts of data, which are not always available in real-world control applications. In this paper we propose a new algorithm, State Representation Learning Multi-Agent Deep Deterministic Policy Gradient (SRL-MADDPG), which combines MADDPG with state representation learning and uses DNNs for function fitting: a model learning network pre-trains the first layer of the actor and critic networks, after which the actor and critic learn from the state representation instead of the raw observations. Simulation results show that SRL-MADDPG achieves better final performance than end-to-end learning.
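The pre-training idea summarized above can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the toy environment, the layer sizes, the choice of predicting the next observation and the reward as the model-learning target, and all function names (`sample_transitions`, `pretrain_encoder`, `init_actor_critic`, `actor_forward`, `critic_forward`) are hypothetical. The abstract does not say whether the pre-trained first layer is frozen or fine-tuned afterwards, so the sketch simply copies its weights into the actor and the critic. For brevity it covers a single agent, whereas in MADDPG each agent's critic is centralized and receives the observations and actions of all agents.

```python
import numpy as np

# Hypothetical sizes for one agent; not taken from the paper.
OBS_DIM, STATE_DIM, ACT_DIM = 8, 4, 2
rng = np.random.default_rng(0)

# Toy data (assumption): an 8-D observation generated from a 4-D latent state.
M = 0.5 * rng.normal(size=(OBS_DIM, STATE_DIM))   # latent -> observation
B = rng.normal(size=(STATE_DIM, ACT_DIM))         # effect of the action on the latent


def sample_transitions(n):
    """Return (o, a, o', r) batches from the hypothetical toy dynamics."""
    z = rng.normal(size=(n, STATE_DIM))
    a = rng.uniform(-1.0, 1.0, size=(n, ACT_DIM))
    z_next = z + 0.1 * a @ B.T
    reward = z[:, :1] + 0.5 * a[:, :1]            # shape (n, 1)
    return z @ M.T, a, z_next @ M.T, reward


def pretrain_encoder(obs, act, obs_next, reward, steps=1000, lr=0.05):
    """Model learning: fit s = tanh(W o + b) so a linear head predicts (o', r) from (s, a)."""
    W_e = 0.1 * rng.normal(size=(STATE_DIM, OBS_DIM))
    b_e = np.zeros(STATE_DIM)
    target = np.hstack([obs_next, reward])        # (n, OBS_DIM + 1)
    W_m = 0.1 * rng.normal(size=(target.shape[1], STATE_DIM + ACT_DIM))
    b_m = np.zeros(target.shape[1])
    n, losses = len(obs), []
    for _ in range(steps):
        S = np.tanh(obs @ W_e.T + b_e)            # state representation
        X = np.hstack([S, act])
        err = X @ W_m.T + b_m - target
        losses.append(0.5 * np.sum(err ** 2) / n)
        # Backpropagate the (halved) mean squared prediction error.
        d_pred = err / n
        d_H = (d_pred @ W_m)[:, :STATE_DIM] * (1.0 - S ** 2)
        W_m -= lr * d_pred.T @ X
        b_m -= lr * d_pred.sum(axis=0)
        W_e -= lr * d_H.T @ obs
        b_e -= lr * d_H.sum(axis=0)
    return {"W": W_e, "b": b_e}, losses


def init_actor_critic(encoder):
    """Copy the pre-trained encoder into the first layer of the actor and the critic."""
    actor = {"W1": encoder["W"].copy(), "b1": encoder["b"].copy(),
             "W2": 0.1 * rng.normal(size=(ACT_DIM, STATE_DIM))}
    critic = {"W1": encoder["W"].copy(), "b1": encoder["b"].copy(),
              "w2": 0.1 * rng.normal(size=STATE_DIM + ACT_DIM)}
    return actor, critic


def actor_forward(actor, obs):
    s = np.tanh(obs @ actor["W1"].T + actor["b1"])   # acts on s, not on raw o
    return np.tanh(s @ actor["W2"].T)


def critic_forward(critic, obs, act):
    s = np.tanh(obs @ critic["W1"].T + critic["b1"])
    return np.hstack([s, act]) @ critic["w2"]        # Q(o, a) for each row


obs, act, obs_next, reward = sample_transitions(256)
encoder, losses = pretrain_encoder(obs, act, obs_next, reward)
actor, critic = init_actor_critic(encoder)
print(f"model loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(actor_forward(actor, obs[:3]).shape, critic_forward(critic, obs[:3], act[:3]).shape)
```

Predicting the reward alongside the next observation is intended to steer the representation toward task-relevant features; other model-learning targets are equally plausible readings of the abstract.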


Keywords: MADDPG, DNNs, State representation learning, SRL-MADDPG



This work was supported by the Key Research and Development Plan Project of Shandong Province, China (No. 2017CXGC0614).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
