Unsupervised Modeling of Partially Observable Environments
We present an architecture based on self-organizing maps for learning a sensory layer in a learning system. The architecture, Temporal Network for Transitions (TNT), enjoys the freedoms of unsupervised learning, works on-line, in non-episodic environments, is computationally light, and scales well. TNT generates a predictive model of its internal representation of the world, making planning methods available for both the exploitation and exploration of the environment. Experiments demonstrate that TNT learns nice representations of classical reinforcement learning mazes of varying size (up to 20×20) under conditions of high noise and stochastic actions.
Keywords: Self-Organizing Maps, POMDPs, Reinforcement Learning