
Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State Values in Policies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5351)

Abstract

Policy gradient methods are a useful class of approaches in reinforcement learning. In our policy gradient approach to agent behavior learning, we formulate an agent's decision at each time step as the minimization of an objective function. In this paper, we give an objective function that consists of two types of parameters, one representing the environmental dynamics and the other state-value functions, and we derive separate learning rules for the two types so that the two parameter sets can be learned independently. Separating these two types of parameters makes it possible to reuse learned state-value functions for agents under different environmental dynamics, even when those dynamics are stochastic. Simulation experiments on learning hunter-agent policies in pursuit problems show the effectiveness of our method.
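To make the abstract's construction concrete, here is a minimal, hypothetical sketch in Python/NumPy, not the authors' algorithm from the paper: a Boltzmann policy whose objective function E(s, a) is the expected next-state value under a learned model of the environmental dynamics, with REINFORCE-style updates that keep the dynamics parameters W and the state-value parameters V separate. The tabular setting, the toy environment, and all names (W, V, TEMP, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 3
TEMP = 1.0      # Boltzmann temperature of the action-selection policy
ALPHA_W = 0.05  # learning rate for the dynamics parameters
ALPHA_V = 0.05  # learning rate for the state-value parameters

# Two separate parameter sets, mirroring the abstract:
#   W -> logits of a model of the environmental dynamics P(s'|s,a)
#   V -> a table of state values
W = np.zeros((N_STATES, N_ACTIONS, N_STATES))
V = np.zeros(N_STATES)

def dynamics(s, a):
    """Modelled P(.|s,a): a softmax over the dynamics logits W[s, a]."""
    z = np.exp(W[s, a] - W[s, a].max())
    return z / z.sum()

def objective(s):
    """E(s,a): expected next-state value under the dynamics model."""
    return np.array([dynamics(s, a) @ V for a in range(N_ACTIONS)])

def policy(s):
    """Boltzmann policy over the objective function E(s, .)."""
    e = objective(s) / TEMP
    z = np.exp(e - e.max())
    return z / z.sum()

def grad_log_policy(s, a):
    """Eligibilities d(log pi)/dW and d(log pi)/dV, computed separately
    so that each parameter set gets its own learning rule."""
    pi = policy(s)
    coeff = -pi.copy()
    coeff[a] += 1.0                                # e_a - pi
    probs = np.stack([dynamics(s, b) for b in range(N_ACTIONS)])
    g_V = coeff @ probs / TEMP                     # dE(s,b)/dV = P(.|s,b)
    g_W = np.zeros_like(W)
    for b in range(N_ACTIONS):
        p = probs[b]
        # softmax derivative: dE(s,b)/dW[s,b,:] = p * (V - p@V)
        g_W[s, b] = coeff[b] * p * (V - p @ V) / TEMP
    return g_W, g_V

def update(episode):
    """REINFORCE-style episode update; W and V are learned independently."""
    G = sum(r for _, _, r in episode)              # undiscounted return, for brevity
    for s, a, _ in episode:
        g_W, g_V = grad_log_policy(s, a)
        W[...] += ALPHA_W * G * g_W                # dynamics rule
        V[...] += ALPHA_V * G * g_V                # state-value rule

# Toy usage: episodes in a random environment with a rewarding goal state.
P_true = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
for _ in range(200):
    s, episode = 0, []
    for _ in range(10):
        a = rng.choice(N_ACTIONS, p=policy(s))
        s_next = rng.choice(N_STATES, p=P_true[s, a])
        episode.append((s, a, 1.0 if s_next == N_STATES - 1 else 0.0))
        s = s_next
    update(episode)
```

In this sketch V enters the policy only through the expected next-state value, so a learned V table could, in principle, be paired with a different dynamics model W; this mirrors the reuse of state-value functions across environmental dynamics that the abstract highlights.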




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ishihara, S., Igarashi, H. (2008). Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State Values in Policies. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science (LNAI), vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_18


  • DOI: https://doi.org/10.1007/978-3-540-89197-0_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89196-3

  • Online ISBN: 978-3-540-89197-0

  • eBook Packages: Computer Science, Computer Science (R0)
