Abstract
The trade-off between exploitation and exploration in multi-agent learning has been a crucial area of research for the past few decades. A proper learning policy is needed to manage this trade-off so that agents can react rapidly and adapt in a dynamic environment. A family of core learning policies suitable for the non-stationary multi-agent foraging task modeled in this paper was identified in the open literature. The model is used to compare and contrast the identified learning policies, namely greedy, ε-greedy and Boltzmann distribution. A simple random search is also included to justify the convergence of Q-learning. A number of simulation-based experiments were conducted, and the performance of each learning policy is discussed based on the numerical results obtained.
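The abstract compares greedy, ε-greedy and Boltzmann action selection, plus a random-search baseline, on top of Q-learning. The Python sketch below shows these policies and the standard one-step tabular Q-learning update. It is not the authors' implementation. The foraging environment is not modeled. Several things are illustrative assumptions: the Q-table layout (`QTable`, a dict from state id to per-action Q-values), the function names, and the default values of ε (`epsilon`), the temperature T, the learning rate α (`alpha`) and the discount γ (`gamma`). "Random search" is read here as uniform random action selection, which may differ from the paper's baseline.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical Q-table layout (assumption): state id -> list of Q-values, one per action.
QTable = Dict[int, List[float]]


def greedy(q_values: Sequence[float], rng: random.Random) -> int:
    """Pure exploitation: pick an action with the highest Q-value.

    Ties are broken uniformly at random so equal estimates are not biased
    toward the lowest action index.
    """
    best = max(q_values)
    candidates = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)


def epsilon_greedy(q_values: Sequence[float], rng: random.Random,
                   epsilon: float = 0.1) -> int:
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values, rng)


def boltzmann_probs(q_values: Sequence[float],
                    temperature: float = 1.0) -> List[float]:
    """Boltzmann (softmax) probabilities: P(a) is proportional to exp(Q(a) / T).

    A high temperature T flattens the distribution (more exploration);
    a low T approaches greedy selection.
    """
    m = max(q_values)  # subtracting the max keeps every exponent <= 0, avoiding overflow
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values: Sequence[float], rng: random.Random,
              temperature: float = 1.0) -> int:
    """Sample one action from the Boltzmann distribution over Q-values."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def random_search(q_values: Sequence[float], rng: random.Random) -> int:
    """Baseline (assumed form): ignore the Q-values and pick any action uniformly."""
    return rng.randrange(len(q_values))


def q_update(q: QTable, state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One-step tabular Q-learning update, applied in place:

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

For example, with `q = {0: [0.0, 0.0], 1: [1.0, 2.0]}`, calling `q_update(q, 0, 1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)` moves `q[0][1]` to 0.5 × (1.0 + 0.9 × 2.0) = 1.4, up to floating-point rounding. Only the action-selection function changes between policies, and the update rule stays the same. This mirrors how the abstract treats the policies as interchangeable components of one Q-learning agent.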
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
M., Y., S.G., P. (2010). Q-Learning Policies for Multi-Agent Foraging Task. In: Vadakkepat, P., et al. Trends in Intelligent Robotics. FIRA 2010. Communications in Computer and Information Science, vol 103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15810-0_25
DOI: https://doi.org/10.1007/978-3-642-15810-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15809-4
Online ISBN: 978-3-642-15810-0
eBook Packages: Computer Science, Computer Science (R0)