Multi-agent Reinforcement Learning for Control Systems: Challenges and Proposals

  • Manuel Graña
  • Borja Fernandez-Gauna
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9375)


Multi-agent Reinforcement Learning (MARL) methods offer a promising alternative to traditional analytical approaches for the design of control systems. We review the most important MARL algorithms from a control perspective, focusing on on-line and model-free methods. We then review some of the sophisticated developments in the state of the art of single-agent Reinforcement Learning that may be transferred to MARL, and list the most important remaining challenges. Finally, we propose some ideas for future research aimed at overcoming some of these challenges.



This research has been partially funded by grant TIN2011-23823 of the Ministerio de Ciencia e Innovación of the Spanish Government (MINECO), and the Basque Government grant IT874-13 for the research group. Manuel Graña was supported by EC under FP7, Coordination and Support Action, Grant Agreement Number 316097, ENGINE European Research Centre of Network Intelligence for Innovation Enhancement.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Grupo de Inteligencia Computacional (GIC), Universidad del País Vasco (UPV/EHU), San Sebastián, Spain
  2. ENGINE Centre, Wrocław University of Technology, Wrocław, Poland
