Autonomous Agents and Multi-Agent Systems

, Volume 33, Issue 1–2, pp 216–274 | Cite as

A probabilistic argumentation framework for reinforcement learning agents

Towards a mentalistic approach to agent profiles
  • Régis RiveretEmail author
  • Yang Gao
  • Guido Governatori
  • Antonino Rotolo
  • Jeremy Pitt
  • Giovanni Sartor


A bounded-reasoning agent may face two dimensions of uncertainty: firstly, the uncertainty arising from partial information and conflicting reasons, and secondly, the uncertainty arising from the stochastic nature of its actions and the environment. This paper attempts to address both dimensions within a single unified framework, by bringing together probabilistic argumentation and reinforcement learning. We show how a probabilistic rule-based argumentation framework can capture Markov decision processes and reinforcement learning agents; and how the framework allows us to characterise agents and their argument-based motivations from both a logic-based perspective and a probabilistic perspective. We advocate and illustrate the use of our approach to capture models of agency and norms, and argue that, in addition to providing a novel method for investigating agent types, the unified framework offers a sound basis for taking a mentalistic approach to agent profiles.


Probabilistic argumentation Markov decision process Reinforcement learning Norms 



We would like to thank Pietro Baroni for his insights in argumentation. This work was supported by the Marie Curie Intra-European Fellowship PIEFGA-2012-331472.


  1. 1.
    Alexy, R. (1989). A theory of legal argumentation: The theory of rational discourse as theory of legal justification. Oxford: Clarendon.Google Scholar
  2. 2.
    Amgoud, L. (2009). Argumentation for decision making. In Argumentation in artificial intelligence (pp. 301–320). Springer.Google Scholar
  3. 3.
    Artikis, A., Sergot, M., & Pitt, J. (2009). Specifying norm-governed computational societies. ACM Transactions on Computational Logic, 10(1), 1:1–1:42.MathSciNetCrossRefzbMATHGoogle Scholar
  4. 4.
    Artikis, A., Sergot, M., Pitt, J., Busquets, D., & Riveret, R. (2016). Specifying and executing open multi-agent systems. In Social coordination frameworks for social technical systems (pp. 197–212). Springer.Google Scholar
  5. 5.
    Atkinson, K., Baroni, P., Giacomin, M., Hunter, A., Prakken, H., Reed, C., et al. (2017). Towards artificial argumentation. AI Magazine, 38(3), 25–36.CrossRefGoogle Scholar
  6. 6.
    Atkinson, K., & Bench-Capon, T. J. M. (2007). Practical reasoning as presumptive argumentation using action based alternating transition systems. Artificial Intellignence, 171(10–15), 855–874.MathSciNetCrossRefzbMATHGoogle Scholar
  7. 7.
    Baroni, P., Caminada, M., & Giacomin, M. (2011). An introduction to argumentation semantics. The Knowledge Engineering Review, 26(4), 365–410.CrossRefGoogle Scholar
  8. 8.
    Baroni, P., Governatori, G., & Riveret, R. (2016). On labelling statements in multi-labelling argumentation. In Proceedings of the 22nd European conference on artificial intelligence (Vol. 285, pp. 489–497). IOS Press.Google Scholar
  9. 9.
    Bellman, R. (1956). Dynamic programming and Lagrange multipliers. Proceedings of the National Academy of Sciences of the United States of America, 42(10), 767.MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Bench-Capon, T. J. M., & Atkinson, K. (2009). Abstract argumentation and values. In L. Rahwan & G. Simari (eds.) Argumentation in artificial intelligence. Springer.Google Scholar
  11. 11.
    Bertsekas, D. P. (1995). Dynamic programming and optimal control (Vol. 1). Belmont, MA: Athena Scientific.zbMATHGoogle Scholar
  12. 12.
    Besnard, P., García, A. J., Hunter, A., Modgil, S., Prakken, H., Simari, G. R., et al. (2014). Introduction to structured argumentation. Argument & Computation, 5(1), 1–4.CrossRefGoogle Scholar
  13. 13.
    Broersen, J., Dastani, M., Hulstijn, J., & van der Torre, L. (2002). Goal generation in the BOID architecture. Cognitive Science Quarterly, 2(3–4), 428–447.Google Scholar
  14. 14.
    Chen, S. H., & Huang, Y. C. (2005). Risk preference and survival dynamics. In: Agent-based simulation: From modeling methodologies to real-world applications, Agent-based social systems (Vol. 1, pp. 135–143). Tokyo: Springer.Google Scholar
  15. 15.
    Conte, R., & Castelfranchi, C. (1995). Cognitive and social action. London: University College of London Press.Google Scholar
  16. 16.
    Conte, R., & Castelfranchi, C. (2006). The mental path of norms. Ratio Juris, 19, 501–517.CrossRefGoogle Scholar
  17. 17.
    Conte, R., Falcone, R., & Sartor, G. (1999). Introduction: Agents and norms: How to fill the gap? Artificial Intelligence and Law, 7(1), 1–15.CrossRefGoogle Scholar
  18. 18.
    Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C., et al. (2001). Introduction to algorithms (Vol. 2). Cambridge: MIT press.zbMATHGoogle Scholar
  19. 19.
    Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2), 321–358.MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Edmonds, B. (2004). How formal logic can fail to be useful for modelling or designing mas. In Regulated agent-based social systems, Lecture Notes in Computer Science (Vol. 2934, pp. 1–15). Springer.Google Scholar
  21. 21.
    Fasli, M. (2004). Formal systems and agent-based social simulation equals null? Journal of Artificial Societies and Social Simulation, 7(4), 1–7.Google Scholar
  22. 22.
    Fornara, N., & Colombetti, M. (2009). Specifying and enforcing norms in artificial institutions. In Declarative agent languages and technologies VI, Lecture Notes in Computer Science (Vol. 5397, pp. 1–17). Springer.Google Scholar
  23. 23.
    Fox, J., & Parsons, S. (1997). On using arguments for reasoning about actions and values. In Proceedings of the AAAI spring symposium on qualitative preferences in deliberation and practical reasoning.Google Scholar
  24. 24.
    Gao, Y., & Toni, F. (2014). Argumentation accelerated reinforcement learning for cooperativeulti-agent systems. In Proceedings of 21st European conference on artificial intelligence (pp. 333–338). IOS Press.Google Scholar
  25. 25.
    Gao, Y., Toni, F., & Craven, R. (2012). Argumentation-based reinforcement learning for robocup soccer keepaway. In Proceedings of 20th European conference on artificial intelligence (pp. 342–347). IOS Press.Google Scholar
  26. 26.
    Gaudou, B., Lorini, E., & Mayor, E. (2013). Moral guilt: An agent-based model analysis. In Advances in social simulation—Proceedings of the 9th conference of the european social simulation association (pp. 95–106).Google Scholar
  27. 27.
    Governatori, G., & Rotolo, A. (2008). BIO logical agents: Norms, beliefs, intentions in defeasible logic. Autonomous Agents and Multi-Agent Systems, 17(1), 36–69.CrossRefGoogle Scholar
  28. 28.
    Hunter, A., & Thimm, M. (2017). Probabilistic reasoning with abstract argumentation frameworks. Journal of Artificial Intelligence Research, 59, 565–611.MathSciNetCrossRefzbMATHGoogle Scholar
  29. 29.
    Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques—Adaptive computation and machine learning. Cambridge: The MIT Press.Google Scholar
  30. 30.
    Kostrikin, A. I., Manin, Y. I., & Alferieff, M. E. (1997). Linear algebra and geometry. Washington, DC: Gordon and Breach Science Publishers.Google Scholar
  31. 31.
    Modgil, S., & Caminada, M. (2009). Proof theories and algorithms for abstract argumentation frameworks. In Argumentation in artificial intelligence (pp. 105–129). Springer.Google Scholar
  32. 32.
    Muller, J., & Hunter, A. (2012). An argumentation-based approach for decision making. In 24th international conference on tools with artificial intelligence (Vol. 1, pp. 564–571). IEEE.Google Scholar
  33. 33.
    Ng, A., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of 16th international conference on machine learning (pp. 278–287).Google Scholar
  34. 34.
    Ng, A. Y., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E., & Liang, E. (2006). Autonomous inverted helicopter flight via reinforcement learning. In Experimental robotics IX (pp. 363–372). Springer.Google Scholar
  35. 35.
    Oren, N. (2014). Argument schemes for normative practical reasoning (pp. 63–78). Berlin: Springer.zbMATHGoogle Scholar
  36. 36.
    Parsons, S., & Fox, J. (1996). Argumentation and decision making: A position paper. In Practical reasoning (pp. 705–709). Springer.Google Scholar
  37. 37.
    Pattaro, E. (2005). The law and the right. In E. Pattaro (Ed.), Treatise of legal philosophy and general jurisprudence (Vol. 1). Berlin: Springer.Google Scholar
  38. 38.
    Pollock, J. L. (1995). Cognitive carpentry: A blueprint for how to build a person. Cambridge, MA: MIT Press.Google Scholar
  39. 39.
    Prakken, H. (2006). Combining sceptical epistemic reasoning with credulous practical reasoning. In Proceedings of the 1st conference on computational models of argument (pp. 311–322). IOS Press.Google Scholar
  40. 40.
    Prakken, H. (2011). An abstract framework for argumentation with structured arguments. Argument and Computation, 1(2), 93–124.CrossRefGoogle Scholar
  41. 41.
    Prakken, H., & Sartor, G. (1997). Argument-based extended logic programming with defeasible priorities. Journal of Applied Non-Classical Logics, 7(1–2), 25–75.MathSciNetCrossRefzbMATHGoogle Scholar
  42. 42.
    Prakken, H., & Sartor, G. (2015). Law and logic: A review from an argumentation perspective. Artificial Intelligence, 227, 214–245.MathSciNetCrossRefzbMATHGoogle Scholar
  43. 43.
    Rahwan, I., & Simari, G. R. (Eds.). (2009). Argumentation in artificial Intelligence. Berlin: Springer.Google Scholar
  44. 44.
    Riveret, R., Baroni, P., Gao, Y., Governatori, G., Rotolo, A., & Sartor, G. (2018). A labelling framework for probabilistic argumentation. Annals of Mathamatics and Artificial Intelligence, 83(1), 21–71.MathSciNetCrossRefzbMATHGoogle Scholar
  45. 45.
    Riveret, R., Korkinof, D., Draief, M., & Pitt, J. V. (2015). Probabilistic abstract argumentation: An investigation with boltzmann machines. Argumentation & Computation, 6(2), 178–218.CrossRefGoogle Scholar
  46. 46.
    Riveret, R., Pitt, J. V., Korkinof, D., & Draief, M. (2015). Neuro-symbolic agents: Boltzmann machines and probabilistic abstract argumentation with sub-arguments. In Proceedings of the 14th international conference on autonomous agents and multiagent systems (pp. 1481–1489). ACM.Google Scholar
  47. 47.
    Riveret, R., Rotolo, A., & Sartor, G. (2012). Probabilistic rule-based argumentation for norm-governed learning agents. Artificial Intelligence and Law, 20(4), 383–420.CrossRefzbMATHGoogle Scholar
  48. 48.
    Ross, A. (1958). On law and justice. London: Stevens.Google Scholar
  49. 49.
    Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical report. University of Cambridge.Google Scholar
  50. 50.
    Sartor, G. (2005). Legal reasoning: A cognitive approach to the law. Berlin: Springer.Google Scholar
  51. 51.
    Shams, Z., Vos, M. D., Oren, N., Padget, J., & Satoh, K. (2015). Argumentation-based normative practical reasoning. In Proceedings of the 3rd international workshop on theory and applications of formal argumentation, revised selected papers (pp. 226–242). Springer.Google Scholar
  52. 52.
    Simari, G. I., Shakarian, P., & Falappa, M. A. (2016). A quantitative approach to belief revision in structured probabilistic argumentation. Annals of Mathematics and Artificial Intelligence, 76(3), 375–408.MathSciNetCrossRefzbMATHGoogle Scholar
  53. 53.
    Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Reinforcement learning for robocup soccer keepaway. Adaptive Behavior, 13, 165–188.CrossRefGoogle Scholar
  54. 54.
    Sutton, R. S., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.zbMATHGoogle Scholar
  55. 55.
    Tadepalli, P., Givan, R., & Driessens, K. (2004). Relational reinforcement learning: An overview. In Proceedings of the ICML04 workshop on relational reinforcement learning.Google Scholar
  56. 56.
    van der Hoek, W., Roberts, M., & Wooldridge, M. (2007). Social laws in alternating time: Effectiveness, feasibility, and synthesis. Synthese, 156(1), 1–19.MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Data61 - CSIROBrisbaneAustralia
  2. 2.Technische Universität DarmstadtDarmstadtGermany
  3. 3.University of BolognaBolognaItaly
  4. 4.Imperial College LondonLondonUK
  5. 5.European University InstituteFiesoleItaly

Personalised recommendations