Exploration Strategies for Learning in Multi-agent Foraging

  • Yogeswaran Mohan
  • Ponnambalam S.G.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7077)


During the learning process, every action an agent takes affects its interaction with the environment, based on the agent's current and future knowledge. The agent must therefore choose between exploiting its current knowledge and exploring other alternatives that may improve its knowledge and lead to better decisions in the future. This paper presents a critical analysis of a number of exploration strategies reported in the open literature. Seven exploration strategies, namely random search, greedy, ε-greedy, Boltzmann Distribution (BD), Simulated Annealing (SA), Probability Matching (PM) and Optimistic Initial Values (OIV), are implemented and their performance is studied on a modeled multi-agent foraging task.
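
To make the exploration-exploitation trade-off concrete, the following is a minimal Python sketch of two of the strategies named above, ε-greedy and Boltzmann (softmax) action selection. It uses the standard textbook formulations and is not the authors' implementation; the function names and the parameters `q_values`, `epsilon` and `temperature` are assumptions introduced only for illustration.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probabilities(q_values, temperature):
    """Softmax over action values; a higher temperature flattens the distribution."""
    highest = max(q_values)  # subtracted for numerical stability
    weights = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With `epsilon = 0` the first function reduces to the purely greedy strategy, and with `epsilon = 1` to random search. Lowering `temperature` over time in the second function is one common way to shift gradually from exploration to exploitation.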


Foraging task, reinforcement learning, exploration strategies, learning policies, \(\mathcal{Q}\)-learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yogeswaran Mohan (1)
  • Ponnambalam S.G. (1)
  1. School of Engineering, Monash University, Petaling Jaya, Malaysia
