
Cooperative Multi-Agent Reinforcement Learning for Multi-Component Robotic Systems: guidelines for future research

  • Review Article
Paladyn

Abstract

Reinforcement Learning (RL) is a paradigm for developing algorithms that train an agent to achieve a goal optimally, using only minimal feedback about a desired behavior that is not precisely specified. The agent receives scalar rewards in response to its actions, endorsing or opposing them. RL algorithms have been successfully applied to robot control design. Extending the RL paradigm to the design of control systems for Multi-Component Robotic Systems (MCRS) poses new challenges, mainly related to the scaling up of complexity caused by exponential state-space growth, to coordination issues, and to the propagation of rewards among agents. In this paper, we identify the main issues that offer opportunities for innovative solutions towards fully scalable cooperative multi-agent systems.
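
To make the scalar-reward feedback loop concrete, the sketch below shows tabular Q-learning on a toy single-agent corridor task. It is purely illustrative and not taken from the paper: the environment, the parameter values, and all names are assumptions made for this example.

    import random
    from collections import defaultdict

    # Hypothetical corridor: the agent starts in cell 0 and must reach cell N-1.
    # The only feedback is a scalar reward: +1 on reaching the goal, 0 otherwise.
    N = 10
    ACTIONS = (-1, +1)  # move left, move right

    def step(state, action):
        next_state = min(max(state + action, 0), N - 1)
        done = next_state == N - 1
        reward = 1.0 if done else 0.0
        return next_state, reward, done

    # Tabular Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q = defaultdict(float)
    alpha, gamma, epsilon = 0.1, 0.95, 0.1  # illustrative values

    for episode in range(500):
        state = 0
        for t in range(200):  # cap episode length
            # epsilon-greedy action selection from the scalar-valued Q-table
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break

A single learner over the joint space of several such agents illustrates the scaling problem mentioned above: M agents with S local states and A local actions yield S^M joint states and A^M joint actions, which is why decentralized or factored representations become attractive.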

About this article

Cite this article

Graña, M., Fernandez-Gauna, B. & Lopez-Guede, J.M. Cooperative Multi-Agent Reinforcement Learning for Multi-Component Robotic Systems: guidelines for future research. Paladyn 2, 71–81 (2011). https://doi.org/10.2478/s13230-011-0017-5
