Multi-robot Cooperation Strategy in a Partially Observable Markov Game Using Enhanced Deep Deterministic Policy Gradient

Tang, Qirong; Zhang, Jingtao; Yu, Fangchao; Xu, Pengjie; Zhang, Zhongqun

doi:10.1007/978-3-030-26354-6_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11656)

Included in the following conference series:

International Conference on Swarm Intelligence

Abstract

Deep reinforcement learning (DRL) has been applied to solve challenging problems in robotic domains. However, because of the non-stationarity of the environment and the difficulty of long-term interaction between robots, traditional DRL is poorly suited to multi-robot systems. Thus, an enhanced deep deterministic policy gradient algorithm is proposed in this study to explore the application of DRL in multi-robot domains. The algorithm ensures a cooperation strategy for multiple robots in which each robot uses only its own partially observed state, a setting known as a partially observable Markov game, and realizes global optimality during execution. This is achieved by eliminating the non-stationarity of the environment during training and by using a centralized critic for the decentralized robots. Simulations in increasingly complex environments are performed to validate the effectiveness of the proposed algorithm.
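The idea of a centralized critic paired with decentralized actors can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `Actor` and `CentralCritic`, the dimensions, and the use of untrained linear maps in place of neural networks are all assumptions made for illustration. The sketch only shows the information flow: each actor sees its own observation, while the critic scores the joint observations and actions of all robots.

```python
import random


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Multiply the weight matrix by the input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class Actor:
    """Hypothetical decentralized policy: maps one robot's own observation to an action."""

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = make_linear(obs_dim, act_dim, rng)

    def act(self, obs):
        return apply_linear(self.weights, obs)


class CentralCritic:
    """Hypothetical centralized critic: scores the joint observation-action vector."""

    def __init__(self, n_robots, obs_dim, act_dim, rng):
        self.input_dim = n_robots * (obs_dim + act_dim)
        self.weights = make_linear(self.input_dim, 1, rng)

    def q_value(self, all_obs, all_acts):
        # Concatenate every robot's observation, then every robot's action.
        joint = [v for o in all_obs for v in o] + [v for a in all_acts for v in a]
        assert len(joint) == self.input_dim
        return apply_linear(self.weights, joint)[0]


# Illustrative sizes only (assumed, not taken from the paper).
N_ROBOTS, OBS_DIM, ACT_DIM = 3, 4, 2
rng = random.Random(0)

actors = [Actor(OBS_DIM, ACT_DIM, rng) for _ in range(N_ROBOTS)]
critic = CentralCritic(N_ROBOTS, OBS_DIM, ACT_DIM, rng)

# Each robot acts on its own partial observation only.
observations = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(N_ROBOTS)]
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]

# The critic, used during training only, sees everything at once.
q = critic.q_value(observations, actions)
```

Because the critic conditions on all robots' actions, its learning target does not shift when the other robots' policies change, which is one common way to address non-stationarity during training. At execution time only the actors are needed.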

Acknowledgements

This work is supported by the projects of the National Natural Science Foundation of China (No. 61603277, No. 61873192), the Key Pre-Research Project of the 13th-Five-Year-Plan on Common Technology (No. 41412050101), and the Field Fund (No. 61403120407). This work is also partially supported by the Fundamental Research Funds for the Central Universities and the Youth 1000 program project. It is also partially sponsored by the Key Basic Research Project of Shanghai Science and Technology Innovation Plan (No. 15JC1403300), as well as the projects supported by China Academy of Space Technology, and Launch Vehicle Technology. All this support is highly appreciated.

Author information

Authors and Affiliations

Laboratory of Robotics and Multibody System, School of Mechanical Engineering, Tongji University, No. 4800, Cao An Rd., Shanghai, 201804, People’s Republic of China
Qirong Tang, Jingtao Zhang, Fangchao Yu, Pengjie Xu & Zhongqun Zhang

Corresponding author

Correspondence to Qirong Tang.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Technology, Shenzhen, China
Yuhui Shi
Shenzhen University, Shenzhen, China
Ben Niu

About this paper

Cite this paper

Tang, Q., Zhang, J., Yu, F., Xu, P., Zhang, Z. (2019). Multi-robot Cooperation Strategy in a Partially Observable Markov Game Using Enhanced Deep Deterministic Policy Gradient. In: Tan, Y., Shi, Y., Niu, B. (eds) Advances in Swarm Intelligence. ICSI 2019. Lecture Notes in Computer Science, vol 11656. Springer, Cham. https://doi.org/10.1007/978-3-030-26354-6_1

DOI: https://doi.org/10.1007/978-3-030-26354-6_1
Published: 19 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26353-9
Online ISBN: 978-3-030-26354-6
eBook Packages: Computer Science, Computer Science (R0)