Abstract
In real-world problems, the environment surrounding a controlled system is nonstationary, and the optimal control may change over time. Such controls are difficult to learn with standard reinforcement learning (RL), which usually assumes a stationary Markov decision process. A modular RL method was previously proposed by Doya et al., in which multiple paired predictors and controllers were gated to produce nonstationary controls, and its effectiveness in nonstationary problems was demonstrated. However, learning the time-dependent decomposition into the constituent pairs could be unstable, and the resulting control was somewhat obscure due to the heuristic combination of predictors and controllers. To overcome these difficulties, we propose a new modular RL algorithm in which the predictors are learned in a self-organized manner to achieve a stable decomposition, and the controllers are optimized by a policy gradient-based RL method. Computer simulations show that our method achieves faster and more stable learning than the previous one.
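The abstract names three ingredients: a set of paired predictors and controllers, a soft gating (responsibility) signal derived from prediction error, and policy-gradient learning of the controllers (cf. Williams' REINFORCE). The sketch below is not the authors' algorithm, whose details appear in the paper body; it is a minimal NumPy illustration of how such a modular architecture can fit together, assuming linear one-step predictors, softmax responsibilities, and per-module REINFORCE updates weighted by responsibility. All names, dimensions, and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODULES, STATE_DIM, N_ACTIONS = 4, 2, 3
TAU = 0.5        # softness of the responsibility (gating) signal
ETA_PRED = 0.05  # predictor learning rate
ETA_PI = 0.01    # policy-gradient learning rate

# Each module pairs a linear one-step predictor with a softmax policy.
W_pred = rng.normal(scale=0.1, size=(N_MODULES, STATE_DIM, STATE_DIM))
W_pi = np.zeros((N_MODULES, N_ACTIONS, STATE_DIM))

def responsibilities(s, s_next):
    """Soft gating: modules that predict the transition well get weight."""
    err = np.array([np.sum((s_next - W @ s) ** 2) for W in W_pred])
    z = np.exp(-err / TAU)
    return z / z.sum()

def policy(s, lam):
    """Responsibility-weighted mixture of the modules' softmax policies."""
    logits = np.einsum('mas,s->ma', W_pi, s)          # (modules, actions)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return lam @ p                                     # mix by gating weights

def update(s, a, s_next, ret, lam):
    for m in range(N_MODULES):
        # Responsibility-weighted predictor update (soft competitive
        # learning): each predictor moves toward the observed transition
        # in proportion to how well it already explains it.
        W_pred[m] += ETA_PRED * lam[m] * np.outer(s_next - W_pred[m] @ s, s)
        # REINFORCE-style policy-gradient update, gated per module:
        # grad log pi(a|s) = (onehot(a) - pi(.|s)) s^T for a softmax policy.
        logits = W_pi[m] @ s
        pm = np.exp(logits - logits.max()); pm /= pm.sum()
        grad_log = -np.outer(pm, s)
        grad_log[a] += s
        W_pi[m] += ETA_PI * lam[m] * ret * grad_log

# Hypothetical single learning step on a made-up transition.
s = rng.normal(size=STATE_DIM)
lam = responsibilities(s, s)             # gate using the last transition
a = rng.choice(N_ACTIONS, p=policy(s, lam))
s_next = rng.normal(size=STATE_DIM)      # environment step (stub)
update(s, a, s_next, ret=1.0, lam=responsibilities(s, s_next))
```

Gating both the predictor and the policy updates by the same responsibility signal is what lets the modules specialize: a module only learns from transitions it already explains well, which is the stability property the abstract attributes to the self-organized decomposition.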
References
Wolpert, D.M., Kawato, M.: Multiple paired forward and inverse models for motor control. Neural Networks 11(7-8), 1317–1329 (1998)
Haruno, M., Wolpert, D.M., Kawato, M.: MOSAIC Model for Sensorimotor Learning and Control. Neural Computation 13(10), 2201–2220 (2001)
Doya, K., Samejima, K., Katagiri, K., Kawato, M.: Multiple Model-Based Reinforcement Learning. Neural Computation 14(6), 1347–1369 (2002)
Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Williams, R.J.: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 229–256 (1992)
Kohonen, T.: The self-organizing map. Proc. IEEE 78(9), 1464–1480 (1990)
Singh, S.P.: Transfer of Learning by Composing Solutions of Elemental Sequential Tasks. Machine Learning 8, 323–339 (1992)
Sutton, R., Precup, D., Singh, S.: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 112(1-2), 181–211 (1999)
Hinton, G.E.: Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation 14(8), 1771–1800 (2002)
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proc. 18th International Conf. on Machine Learning (2001)
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Hiei, Y., Mori, T., Ishii, S. (2008). Self-organized Reinforcement Learning Based on Policy Gradient in Nonstationary Environments. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. Lecture Notes in Computer Science, vol 5163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87536-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87535-2
Online ISBN: 978-3-540-87536-9