Multi-agent Case-Based Reasoning for Cooperative Reinforcement Learners

  • Thomas Gabel
  • Martin Riedmiller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4106)


In both research fields, Case-Based Reasoning and Reinforcement Learning, the system under consideration gains its expertise from experience. Utilizing this fundamental common ground as well as further characteristics and results of these two disciplines, in this paper we develop an approach that facilitates the distributed learning of behaviour policies in cooperative multi-agent domains without communication between the learning agents. We evaluate our algorithms in a case study in reactive production scheduling.


Keywords: Reinforcement Learning; Elementary Action; Multiagent System; Markov Decision Process; Independent Learner
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Thomas Gabel (1)
  • Martin Riedmiller (1)
  1. Neuroinformatics Group, Department of Mathematics and Computer Science, Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
