Embodied Cognition and Multi-Agent Behavioral Emergence

  • Paul E. Silvey
  • Michael D. Norman
Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)


Autonomous systems embedded in our physical world need real-world interaction in order to function, but they also depend on it as a means to learn. This is the essence of artificial Embodied Cognition, in which machine intelligence is tightly coupled to sensors and effectors, and in which learning happens from continually experiencing the dynamic world as time-series data, received and processed from a situated and contextually relative perspective. From this stream, our engineered agents must perceptually discriminate, deal with noise and uncertainty, recognize the causal influence of their actions (sometimes with significant and variable temporal lag), pursue multiple and changing goals that are often incompatible with each other, and make decisions under time pressure. To further complicate matters, unpredictability caused by the actions of other adaptive agents makes this experiential data stochastic and statistically non-stationary. Reinforcement Learning approaches to these problems often oversimplify many of these aspects, e.g., by assuming stationarity, collapsing multiple goals into a single reward signal, using repetitive discrete training episodes, or removing real-time requirements. Because we are interested in developing dependable and trustworthy autonomy, we have been studying these problems by retaining all these inherent complexities and only simplifying the agent’s environmental bandwidth requirements. The Multi-Agent Research Basic Learning Environment (MARBLE) is a computational framework for studying the nuances of cooperative, competitive, and adversarial learning, where emergent behaviors can be better understood through carefully controlled experiments. In particular, we are using it to evaluate a novel reinforcement learning long-term memory data structure based on probabilistic suffix trees. Here, we describe this research methodology and report the results of some early experiments.
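The abstract does not spell out the memory data structure, so the following is only a minimal sketch of the general idea behind a probabilistic suffix tree, not the authors' design. It assumes a stream of discrete symbols and a fixed maximum context depth; the names `SuffixNode`, `ProbabilisticSuffixTree`, `update`, `predict`, and `train` are hypothetical. Each node stores next-symbol counts for one recent-history context, and prediction falls back to the longest context that has been seen before.

```python
from collections import defaultdict


class SuffixNode:
    """One context; children extend it one symbol further into the past."""

    def __init__(self):
        self.counts = defaultdict(int)  # next-symbol counts seen after this context
        self.children = {}              # older symbol -> longer context node


class ProbabilisticSuffixTree:
    """Illustrative variable-order predictor (hypothetical, not from the paper)."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.root = SuffixNode()

    def _recent(self, history):
        # Keep at most max_depth of the most recent symbols.
        return history[max(0, len(history) - self.max_depth):]

    def update(self, history, symbol):
        """Count `symbol` under every suffix of `history` up to max_depth."""
        node = self.root
        node.counts[symbol] += 1
        for past in reversed(self._recent(history)):
            node = node.children.setdefault(past, SuffixNode())
            node.counts[symbol] += 1

    def predict(self, history):
        """Next-symbol probabilities from the longest previously seen suffix."""
        node = self.root
        for past in reversed(self._recent(history)):
            if past not in node.children:
                break  # fall back to the shorter context reached so far
            node = node.children[past]
        total = sum(node.counts.values())
        return {s: c / total for s, c in node.counts.items()} if total else {}


def train(tree, sequence):
    """Feed a whole sequence, one symbol at a time, with its preceding history."""
    for i, symbol in enumerate(sequence):
        tree.update(sequence[:i], symbol)
```

As a usage example, after `train(tree, "abababab")` with `max_depth=2`, calling `tree.predict("a")` puts all probability on `"b"`, while an unseen context such as `"zz"` falls back to the root's overall symbol frequencies. A non-stationary setting like the one described above would likely also need some form of count decay or pruning, which this sketch omits.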


Embodied cognition; Reinforcement learning; Agent-based modeling; Multi-agent systems; Emergence


Acknowledgements and Disclaimer

The authors wish to thank Jason F. Kutarnia and Brittany A. Tracy for their assistance with this research. Approved for Public Release; Distribution Unlimited. Case Number 18-1473.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. The MITRE Corporation, Bedford, USA
