Abstract
In this article, we apply policy gradient-based reinforcement learning to enable multiple agents to perform cooperative actions in a partially observable environment. We introduce an auxiliary state variable, an internal state, whose stochastic process is Markov, to extract important features of the multi-agent system's dynamics. Computer simulations show that every agent can identify an appropriate internal-state model and acquire a good policy; this approach proves more effective than a traditional memory-based method.
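The abstract does not spell out the architecture or learning rule, so the following is a minimal sketch of the general idea: a policy-gradient agent that carries a finite internal state with stochastically parameterized Markov transitions, trained with a REINFORCE-style update. The agent class, the softmax parameterization, the toy memory task, and all names and sizes below are illustrative assumptions, not the authors' method.

```python
import numpy as np

class InternalStatePGAgent:
    """Sketch of a policy-gradient agent with a Markov internal state.

    The internal state g is sampled from a learned transition model
    p(g' | g, o); the action is sampled from a policy p(a | g', o).
    Both distributions are softmax functions of tabular parameters.
    """

    def __init__(self, n_obs, n_actions, n_internal, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_internal = n_internal
        self.lr = lr
        # Parameters of the internal-state transition model p(g' | g, o).
        self.theta_g = np.zeros((n_internal, n_obs, n_internal))
        # Parameters of the action policy p(a | g', o).
        self.theta_a = np.zeros((n_internal, n_obs, n_actions))

    @staticmethod
    def _softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def act(self, g, o):
        """Sample the next internal state and an action; also return the
        score functions (gradients of the log-probabilities w.r.t. logits)."""
        pg = self._softmax(self.theta_g[g, o])
        g_next = self.rng.choice(self.n_internal, p=pg)
        pa = self._softmax(self.theta_a[g_next, o])
        a = self.rng.choice(len(pa), p=pa)
        # d/dtheta log softmax = one-hot(choice) - probabilities.
        grad_g = -pg; grad_g[g_next] += 1.0
        grad_a = -pa; grad_a[a] += 1.0
        return g_next, a, (g, o, grad_g), (g_next, o, grad_a)

    def update(self, trajectory, episode_return):
        """REINFORCE: scale each step's score function by the episode return."""
        for (g, o, grad_g), (g2, o2, grad_a) in trajectory:
            self.theta_g[g, o] += self.lr * episode_return * grad_g
            self.theta_a[g2, o2] += self.lr * episode_return * grad_a
```

A hypothetical usage example on a two-step memory task, where the reward depends on an observation the agent can no longer see, shows why the internal state matters: a purely reactive policy cannot solve it, while the learned internal-state dynamics can carry the cue forward.

```python
# Toy memory task (illustrative): observe a cue (o = 0 or 1), then an
# uninformative corridor observation (o = 2); reward 1 for recalling the cue.
agent = InternalStatePGAgent(n_obs=3, n_actions=2, n_internal=2)
for episode in range(3000):
    cue = int(agent.rng.integers(2))
    g, trajectory = 0, []
    for o in (cue, 2):                      # cue step, then corridor step
        g, a, step_g, step_a = agent.act(g, o)
        trajectory.append((step_g, step_a))
    reward = 1.0 if a == cue else 0.0       # reward for the final action only
    agent.update(trajectory, reward)
```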