Unsupervised Modeling of Partially Observable Environments

  • Vincent Graziano
  • Jan Koutník
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)


We present an architecture based on self-organizing maps for learning the sensory layer of a learning system. The architecture, temporal network for transitions (TNT), enjoys the freedoms of unsupervised learning, works on-line in non-episodic environments, is computationally light, and scales well. TNT generates a predictive model of its internal representation of the world, making planning methods available for both the exploitation and exploration of the environment. Experiments demonstrate that TNT learns good representations of classical reinforcement learning mazes of varying size (up to 20×20) under conditions of high noise and stochastic actions.
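The abstract gives only the outline of TNT: a self-organizing map quantizes noisy observations into discrete units, and the transitions between winning units under each action are tallied to form a predictive model that planning can use. The class names and update rules below are illustrative assumptions, not the paper's actual algorithm; this is a minimal sketch of that general idea.

```python
import random


class TinySOM:
    """Minimal 1-D self-organizing map plus an action-conditioned
    transition model over its units. A rough sketch of the idea of
    pairing a SOM sensory layer with a learned predictive model;
    all details here are illustrative, not taken from the paper."""

    def __init__(self, n_units, dim, seed=0):
        rng = random.Random(seed)
        self.n = n_units
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(n_units)]
        # counts[a][i][j]: times action a moved the winner from unit i to j
        self.counts = {}

    def bmu(self, x):
        # best-matching unit: weight vector closest to x (squared Euclidean)
        dists = [sum((w - xi) ** 2 for w, xi in zip(wv, x))
                 for wv in self.weights]
        return dists.index(min(dists))

    def train(self, x, lr=0.1, radius=1):
        # move the winner and its neighbors toward the observation
        win = self.bmu(x)
        for i, wv in enumerate(self.weights):
            if abs(i - win) <= radius:
                self.weights[i] = [w + lr * (xi - w)
                                   for w, xi in zip(wv, x)]
        return win

    def observe_transition(self, prev_unit, action, next_unit):
        # tally one observed unit-to-unit transition under `action`
        tbl = self.counts.setdefault(
            action, [[0] * self.n for _ in range(self.n)])
        tbl[prev_unit][next_unit] += 1

    def predict(self, unit, action):
        # most frequently observed successor unit, or None if unseen
        tbl = self.counts.get(action)
        if tbl is None or sum(tbl[unit]) == 0:
            return None
        row = tbl[unit]
        return row.index(max(row))
```

Because the model lives over the SOM's discrete units rather than raw observations, standard tabular planning (e.g. value iteration over the counted transition frequencies) becomes applicable, which is presumably what makes planning-based exploration and exploitation available as the abstract claims.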


Keywords: Self-Organizing Maps, POMDPs, Reinforcement Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Vincent Graziano¹
  • Jan Koutník¹
  • Jürgen Schmidhuber¹

  1. IDSIA, SUPSI, University of Lugano, Manno, Switzerland
