Reinforcement learning in a continuum of agents
We present a decision-making framework for modeling the collective behavior of large groups of cooperatively interacting agents based on a continuum description of the agents’ joint state. The continuum model is derived from an agent-based system of locally coupled stochastic differential equations, taking into account that each agent in the group is only partially informed about the global system state. The usefulness of the proposed framework is twofold: (i) for multi-agent scenarios, it provides a computational approach to handling large-scale distributed decision-making problems and learning decentralized control policies. (ii) For single-agent systems, it offers an alternative approximation scheme for evaluating expectations of state distributions. We demonstrate our framework on a variant of the Kuramoto model using a variety of distributed control tasks, such as positioning and aggregation. As part of our experiments, we compare the effectiveness of the controllers learned by the continuum model and agent-based systems of different sizes, and we analyze how the degree of observability in the system affects the learning process.
KeywordsReinforcement learning Multi-agent systems Decentralized control Collective behavior Swarm intelligence Active particles Continuum mechanics
H. Koeppl gratefully acknowledges support from the German Research Foundation (DFG) within the Collaborative Research Center (CRC) 1053 - MAKI.
- Beal, J. (2005). Programming an amorphous computational medium. In J. P Banâtre, P. Fradet, J. L. Giavitto, & O. Michel (Eds.), Unconventional programming paradigms (pp. 121–136). Berlin: Springer.Google Scholar
- Correll, N., & Martinoli, A. (2006). System identification of self-organizing robotic swarms. In M. Gini & R. Voyles (Eds.) Distributed autonomous robotic systems 7 (pp. 31–40). Tokyo: Springer Japan.Google Scholar
- Deisenroth, M. P., Neumann, G., & Peters, J. (2013). A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1–2), 1–142.Google Scholar
- Freitas, R. A. (2005). Current status of nanomedicine and medical nanorobotics. Journal of Computational and Theoretical Nanoscience, 2(1), 1–25.Google Scholar
- Hamann, H. (2014). Evolution of collective behaviors by minimizing surprise. In Proceedings of the 14th international conference on the synthesis and simulation of living systems (pp. 344–351). MIT Press.Google Scholar
- Hayes, A. T. (2002). How many robots? Group size and efficiency in collective search tasks. In H. Asama, T. Arai, T. Fukuda, & T. Hasegawa (Eds.), Distributed autonomous robotic systems 5 (pp. 289–298). Tokyo: Springer Japan.Google Scholar
- Hüttenrauch, M., Šošić, A., & Neumann, G. (2017). Guided deep reinforcement learning for swarm systems. In AAMAS workshop on autonomous robots and multirobot systems. arXiv:1709.06011.
- Krylov, N. V. (2008). Controlled diffusion processes. Berlin: Springer Science & Business Media.Google Scholar
- Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear oscillators. In International symposium on mathematical problems in theoretical physics (pp. 420–422). Springer.Google Scholar
- Lerman, K., Martinoli, A., & Galstyan, A. (2005). A review of probabilistic macroscopic models for swarm robotic systems. In Swarm robotics: SAB 2004 international workshop (pp. 143–152). Berlin: Springer.Google Scholar
- MacLennan, B. J. (1990). Continuous spatial automata. Technical report, University of Tennessee, Computer Science Department.Google Scholar
- Michini, B., & How, J. P. (2012). Bayesian nonparametric inverse reinforcement learning. In P. A. Flach, T. De Bie, & N. Cristianini (Eds.), Machine learning and knowledge discovery in databases (pp. 148–163). Berlin: Springer.Google Scholar
- Risken, H. (1996). Fokker–Planck equation. In H. Haken (Ed.) The Fokker–Planck equation (pp. 63–95). Berlin, Heidelberg: Springer.Google Scholar
- Šošić, A., KhudaBukhsh, W. R., Zoubir, A. M., Koeppl, H. (2017). Inverse reinforcement learning in swarm systems. In Proceedings of the 16th international conference on autonomous agents and multiagent systems (pp. 1413–1421). International Foundation for Autonomous Agents and Multiagent Systems.Google Scholar
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.Google Scholar