Abstract
In real-world problems, the environment surrounding a controlled system is nonstationary, and the optimal control may change over time. Such controls are difficult to learn with standard reinforcement learning (RL), which usually assumes a stationary Markov decision process. A modular RL method was previously proposed by Doya et al., in which multiple paired predictors and controllers were gated to produce nonstationary controls, and its effectiveness in nonstationary problems was demonstrated. However, learning the time-dependent decomposition into the constituent pairs could be unstable, and the resulting control was somewhat obscure due to the heuristic combination of predictors and controllers. To overcome these difficulties, we propose a new modular RL algorithm in which the predictors are learned in a self-organized manner to achieve a stable decomposition, and the controllers are optimized by a policy gradient-based RL method. Computer simulations show that our method achieves faster and more stable learning than the previous one.
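The abstract names three ingredients: a set of paired predictors and controllers, a soft gating (responsibility) signal derived from prediction error, and policy-gradient learning of the controllers (cf. Williams' REINFORCE). The sketch below is not the authors' algorithm, whose details appear in the paper body; it is a minimal NumPy illustration of how such a modular architecture can fit together, assuming linear one-step predictors, softmax responsibilities, and per-module REINFORCE updates weighted by responsibility. All names, dimensions, and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODULES, STATE_DIM, N_ACTIONS = 4, 2, 3
TAU = 0.5        # softness of the responsibility (gating) signal
ETA_PRED = 0.05  # predictor learning rate
ETA_PI = 0.01    # policy-gradient learning rate

# Each module pairs a linear one-step predictor with a softmax policy.
W_pred = rng.normal(scale=0.1, size=(N_MODULES, STATE_DIM, STATE_DIM))
W_pi = np.zeros((N_MODULES, N_ACTIONS, STATE_DIM))

def responsibilities(s, s_next):
    """Soft gating: modules that predict the transition well get weight."""
    err = np.array([np.sum((s_next - W @ s) ** 2) for W in W_pred])
    z = np.exp(-err / TAU)
    return z / z.sum()

def policy(s, lam):
    """Responsibility-weighted mixture of the modules' softmax policies."""
    logits = np.einsum('mas,s->ma', W_pi, s)          # (modules, actions)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return lam @ p                                     # mix by gating weights

def update(s, a, s_next, ret, lam):
    for m in range(N_MODULES):
        # Responsibility-weighted predictor update (soft competitive
        # learning): each predictor moves toward the observed transition
        # in proportion to how well it already explains it.
        W_pred[m] += ETA_PRED * lam[m] * np.outer(s_next - W_pred[m] @ s, s)
        # REINFORCE-style policy-gradient update, gated per module:
        # grad log pi(a|s) = (onehot(a) - pi(.|s)) s^T for a softmax policy.
        logits = W_pi[m] @ s
        pm = np.exp(logits - logits.max()); pm /= pm.sum()
        grad_log = -np.outer(pm, s)
        grad_log[a] += s
        W_pi[m] += ETA_PI * lam[m] * ret * grad_log

# Hypothetical single learning step on a made-up transition.
s = rng.normal(size=STATE_DIM)
lam = responsibilities(s, s)             # gate using the last transition
a = rng.choice(N_ACTIONS, p=policy(s, lam))
s_next = rng.normal(size=STATE_DIM)      # environment step (stub)
update(s, a, s_next, ret=1.0, lam=responsibilities(s, s_next))
```

Gating both the predictor and the policy updates by the same responsibility signal is what lets the modules specialize: a module only learns from transitions it already explains well, which is the stability property the abstract attributes to the self-organized decomposition.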
References
Wolpert, D.M., Kawato, M.: Multiple paired forward and inverse models for motor control. Neural Networks 11(7-8), 1317–1329 (1998)
Haruno, M., Wolpert, D.M., Kawato, M.: MOSAIC Model for Sensorimotor Learning and Control. Neural Computation 13(10), 2201–2220 (2001)
Doya, K., Samejima, K., Katagiri, K., Kawato, M.: Multiple Model-Based Reinforcement Learning. Neural Computation 14(6), 1347–1369 (2002)
Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Williams, R.J.: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 229–256 (1992)
Kohonen, T.: The self-organizing map. Proc. IEEE 78(9), 1464–1480 (1990)
Singh, S.P.: Transfer of Learning by Composing Solutions of Elemental Sequential Tasks. Machine Learning 8, 323–339 (1992)
Sutton, R., Precup, D., Singh, S.: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 112(1-2), 181–211 (1999)
Hinton, G.E.: Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation 14(8), 1771–1800 (2002)
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proc. 18th International Conf. on Machine Learning (2001)
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Hiei, Y., Mori, T., Ishii, S. (2008). Self-organized Reinforcement Learning Based on Policy Gradient in Nonstationary Environments. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. Lecture Notes in Computer Science, vol 5163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87536-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87535-2
Online ISBN: 978-3-540-87536-9