Developmental Learning of Cooperative Robot Skills: A Hierarchical Multi-Agent Architecture

  • John N. Karigiannis
  • Theodoros Rekatsinas
  • Costas S. Tzafestas
Part of the Springer Series in Cognitive and Neural Systems book series (SSCNS)


Research on new methodologies, architectures, and frameworks for designing intelligent robots attracts significant attention from the research community. Self-organization problems, intrinsic behaviors, effective learning, and skill-transfer processes in robotic systems have all been investigated extensively. This chapter presents a new framework for developmental skill learning based on a hierarchical multi-agent architecture. More specifically, the proposed methodology applies reinforcement learning (RL) techniques in a fuzzified state space, leading to a collaborative control scheme among agents operating in a continuous space. This scheme enables the multi-agent system to learn, over time, how to perform sequences of continuous actions cooperatively without any prior task model. Organizing the agents in a nested architecture, as proposed in this work, yields a type of problem-specific recursive knowledge acquisition process. The agents may correspond to independent degrees of freedom of the system, and they gain experience over the task they collaboratively perform by continuously exploring and exploiting their state-to-action mapping space.
Two numerical experiments are presented in two distinct problem settings: one on dexterous manipulation and one simulated experiment on cooperative mobile robots. The first setting concerns redundant and dexterous robot manipulation tasks, in which the problem of autonomously developing control skills is considered. Initially, a simulated redundant planar kinematic chain with four degrees of freedom (DoF) tries to develop the skill of accurately reaching a specified target position. In the same setting, a simulated three-finger manipulation example is then presented, where each finger has 4 DoF and performs a quasi-static grasp. In the second setting, the same theoretical framework is adapted to two mobile robots performing a collaborative box-pushing task, in which the robots actively cooperate to jointly push an object on a plane to a specified goal location. Here, the actuated wheels of the mobile robots are treated as the independent agents that must build up cooperative skills over time for each robot to demonstrate intelligent behavior. The goal of this experimental study is to evaluate both the proposed hierarchical multi-agent architecture and the methodological control framework. Such a hierarchical multi-agent approach is envisioned to be highly scalable to kinematically more complex robotic systems that comprise multiple DoF and redundancies in open or closed kinematic chains, particularly dexterous robot manipulators and complex biologically inspired robot locomotion systems.
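
As an illustration of the reaching experiment described above, the following is a minimal sketch and not the chapter's implementation. It assumes one independent Q-learning agent per joint of a 4-DoF planar chain, a shared state given by the distance to the target, triangular fuzzy sets over that distance, and a reward equal to the reduction in distance. All names (`JointAgent`, `triangular_memberships`, `train`), the action set, the link lengths, the target, and the learning parameters are hypothetical choices made for illustration only.

```python
import math
import random


def forward_kinematics(angles, link_lengths):
    """End-effector (x, y) of a planar serial chain with relative joint angles."""
    x = y = total = 0.0
    for theta, length in zip(angles, link_lengths):
        total += theta
        x += length * math.cos(total)
        y += length * math.sin(total)
    return x, y


def triangular_memberships(value, centers):
    """Membership degrees of `value` in triangular fuzzy sets (they sum to 1)."""
    if value <= centers[0]:
        return [1.0] + [0.0] * (len(centers) - 1)
    if value >= centers[-1]:
        return [0.0] * (len(centers) - 1) + [1.0]
    weights = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= value <= hi:
            w = (value - lo) / (hi - lo)
            weights[i], weights[i + 1] = 1.0 - w, w
            break
    return weights


class JointAgent:
    """One learner per degree of freedom; Q-values are stored per fuzzy set."""

    ACTIONS = (-0.05, 0.0, 0.05)  # assumed joint-angle increments in radians

    def __init__(self, centers, alpha=0.2, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = [[0.0] * len(self.ACTIONS) for _ in centers]

    def q_values(self, weights):
        # Q(s, a) is the membership-weighted sum over the fuzzy sets.
        return [sum(w * row[a] for w, row in zip(weights, self.q))
                for a in range(len(self.ACTIONS))]

    def act(self, weights, rng):
        # Epsilon-greedy choice between exploring and exploiting.
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.ACTIONS))
        values = self.q_values(weights)
        return values.index(max(values))

    def update(self, weights, action, reward, next_weights):
        # Q-learning update, distributed over the active fuzzy sets.
        target = reward + self.gamma * max(self.q_values(next_weights))
        error = target - self.q_values(weights)[action]
        for i, w in enumerate(weights):
            self.q[i][action] += self.alpha * w * error


def train(episodes=200, steps=50, seed=0):
    """Run the sketch and return the final distance to target per episode."""
    rng = random.Random(seed)
    links = [1.0, 1.0, 1.0, 1.0]          # assumed link lengths
    target = (2.0, 2.0)                   # assumed target position
    centers = [0.0, 1.0, 2.0, 3.0, 4.0]   # fuzzy set centers over distance
    agents = [JointAgent(centers) for _ in links]
    final_distances = []
    for _ in range(episodes):
        angles = [0.0] * len(links)
        dist = math.dist(forward_kinematics(angles, links), target)
        for _ in range(steps):
            weights = triangular_memberships(dist, centers)
            choices = [agent.act(weights, rng) for agent in agents]
            angles = [a + JointAgent.ACTIONS[c] for a, c in zip(angles, choices)]
            new_dist = math.dist(forward_kinematics(angles, links), target)
            reward = dist - new_dist      # shared reward: progress toward target
            next_weights = triangular_memberships(new_dist, centers)
            for agent, c in zip(agents, choices):
                agent.update(weights, c, reward, next_weights)
            dist = new_dist
        final_distances.append(dist)
    return final_distances
```

Calling `train()` returns one final distance per episode. The state used here (distance only) is deliberately crude, so the sketch shows the structure of per-DoF agents sharing a reward rather than a controller that is expected to reach the target reliably.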


Mobile Robot; Reinforcement Learning; Joint Action; Action Selection; Kinematic Chain



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. School of Electrical and Computer Engineering, Division of Signals, Control and Robotics, National Technical University of Athens, Athens, Greece
