Models for Autonomously Motivated Exploration in Reinforcement Learning

  • Peter Auer
  • Shiau Hong Lim
  • Chris Watkins
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6926)


One of the striking differences between current reinforcement learning algorithms and early human learning is that animals and infants appear to explore their environments with autonomous purpose, in a manner appropriate to their current level of skills. An important intuition for autonomously motivated exploration was proposed by Schmidhuber [1,2]: an agent should be interested in making observations that reduce its uncertainty about future observations.

However, there is not yet a theoretical analysis of the usefulness of autonomous exploration with respect to the overall performance of a learning agent. We discuss models for a learning agent’s autonomous exploration and present some recent results. In particular, we investigate the exploration time for navigating effectively in a Markov Decision Process (MDP) without rewards, and we consider extensions to MDPs with infinite state spaces.
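The notion of exploration time in a reward-free MDP can be made concrete with a small sketch. The example below is illustrative only and not the authors' algorithm: it compares random exploration against a simple count-based intrinsic-reward heuristic (the agent prefers the less-visited neighbouring state, a crude proxy for Schmidhuber-style uncertainty reduction) on a deterministic chain MDP, measuring the number of steps until every state has been visited. All names (`make_chain`, `exploration_time`, the policies) are hypothetical.

```python
import random

def make_chain(n):
    """Deterministic chain MDP: states 0..n-1, action 0 = left, 1 = right."""
    def step(s, a):
        return max(0, s - 1) if a == 0 else min(n - 1, s + 1)
    return step

def exploration_time(n, policy, max_steps=100_000, seed=0):
    """Steps taken from state 0 until every state has been visited once."""
    rng = random.Random(seed)
    step = make_chain(n)
    visits = [0] * n
    s = 0
    visits[s] = 1
    for t in range(1, max_steps + 1):
        s = step(s, policy(s, visits, rng))
        visits[s] += 1
        if all(v > 0 for v in visits):
            return t
    return max_steps

def random_policy(s, visits, rng):
    # undirected exploration: uniform over the two actions
    return rng.randrange(2)

def curious_policy(s, visits, rng):
    # count-based intrinsic reward: move toward the less-visited neighbour
    left = visits[max(0, s - 1)]
    right = visits[min(len(visits) - 1, s + 1)]
    if left == right:
        return rng.randrange(2)
    return 0 if left < right else 1
```

On a 20-state chain the curious policy sweeps the chain in the minimal 19 steps, whereas a random walk needs on the order of n² steps in expectation; this is the kind of gap a theoretical analysis of autonomous exploration aims to quantify.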


Keywords: Reinforcement learning · Autonomous exploration · Intrinsic rewards


References

  1. Schmidhuber, J.: A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers. In: Meyer, J.A., Wilson, S.W. (eds.) From Animals to Animats: Proceedings of the International Conference on Simulation of Adaptive Behavior, pp. 222–227. MIT Press, Cambridge (1991)
  2. Schmidhuber, J.: Developmental Robotics, Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts. Connection Science 18(2), 173–187 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Peter Auer (1)
  • Shiau Hong Lim (1)
  • Chris Watkins (2)

  1. Chair for Information Technology, Montanuniversität Leoben, Austria
  2. Department of Computer Science, Royal Holloway, University of London, UK