Swarm Intelligence, Volume 12, Issue 1, pp 23–51

Reinforcement learning in a continuum of agents

  • Adrian Šošić
  • Abdelhak M. Zoubir
  • Heinz Koeppl

Abstract

We present a decision-making framework for modeling the collective behavior of large groups of cooperatively interacting agents, based on a continuum description of the agents’ joint state. The continuum model is derived from an agent-based system of locally coupled stochastic differential equations, taking into account that each agent in the group is only partially informed about the global system state. The usefulness of the proposed framework is twofold: (i) for multi-agent scenarios, it provides a computational approach to handling large-scale distributed decision-making problems and learning decentralized control policies; (ii) for single-agent systems, it offers an alternative approximation scheme for evaluating expectations of state distributions. We demonstrate our framework on a variant of the Kuramoto model using a variety of distributed control tasks, such as positioning and aggregation. As part of our experiments, we compare the effectiveness of the controllers learned by the continuum model and by agent-based systems of different sizes, and we analyze how the degree of observability in the system affects the learning process.
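To make the agent-based starting point concrete, the following is a minimal, illustrative sketch of a system of locally coupled stochastic oscillators in the spirit of the (uncontrolled) Kuramoto dynamics mentioned above, simulated with the Euler–Maruyama scheme. All parameter values and function names here are our own choices for illustration; they are not taken from the paper, and no learned control policy is included.

```python
import numpy as np

def simulate_kuramoto(n=50, coupling=2.0, noise=0.1, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama simulation of n noisy Kuramoto oscillators:
    dtheta_i = (omega_i + (K/n) * sum_j sin(theta_j - theta_i)) dt + sigma dW_i.
    Returns the final phase vector."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)   # initial phases
    omega = rng.normal(0.0, 0.5, n)            # heterogeneous natural frequencies
    for _ in range(steps):
        # pairwise coupling: entry [i, j] of the broadcasted array is theta_j - theta_i
        drift = omega + (coupling / n) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta = theta + drift * dt + noise * np.sqrt(dt) * rng.standard_normal(n)
    return theta

def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r close to 1 indicates synchronization."""
    return np.abs(np.exp(1j * theta).mean())
```

For strong coupling the phases synchronize and the order parameter approaches one, whereas for vanishing coupling the oscillators drift independently; comparing the two regimes is a quick sanity check of the dynamics. The continuum description studied in the paper replaces the finite sum over agents by an integral against the agents' state distribution.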

Keywords

Reinforcement learning · Multi-agent systems · Decentralized control · Collective behavior · Swarm intelligence · Active particles · Continuum mechanics

Acknowledgements

H. Koeppl gratefully acknowledges support from the German Research Foundation (DFG) within the Collaborative Research Center (CRC) 1053 - MAKI.


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany
