Abstract
After reviewing the main ingredients of the UCRL algorithm and its analysis for online reinforcement learning — exploration vs. exploitation, optimism in the face of uncertainty, consistency with observations and upper confidence bounds, and regret analysis — I show how these techniques can also be used to derive PAC-MDP bounds that match the best currently available bounds for both the discounted and the undiscounted setting. As is typical for reinforcement learning, the analysis for the undiscounted setting is significantly more involved.
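The "optimism in the face of uncertainty" principle mentioned above can be illustrated with a minimal sketch: each state-action pair is credited with an upper confidence bound on its mean reward, where the exploration bonus shrinks with the visit count so that optimism remains consistent with observations. The constants and the confidence parameter below are hypothetical illustrations, not the ones used in the UCRL analysis.

```python
import math

def optimistic_reward(r_hat, n, t, delta=0.05):
    """Upper confidence bound on the mean reward of a state-action pair.

    r_hat: empirical mean reward (rewards assumed in [0, 1]),
    n: number of visits to the pair, t: current time step,
    delta: confidence parameter (hypothetical choice).
    """
    if n == 0:
        # Untried pairs get the maximal plausible reward: pure optimism.
        return 1.0
    # Hoeffding-style bonus: shrinks as the pair is visited more often.
    bonus = math.sqrt(math.log(2 * t / delta) / (2 * n))
    return min(1.0, r_hat + bonus)
```

Running value iteration on such optimistic estimates yields a policy that either earns near-optimal reward or visits poorly explored pairs, which is the mechanism behind both the regret and the PAC-MDP bounds.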
In the second part of my talk I consider a model for autonomous exploration, in which an agent learns about its environment and how to navigate in it. While evaluating autonomous exploration is typically difficult, rigorous performance bounds can be derived in the presented setting. To this end, I present an algorithm that explores optimistically, by repeatedly choosing the apparently closest unknown state — as indicated by an optimistic policy — for further exploration.
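The exploration rule above can be sketched as a search from the agent's current state through already-known states, returning the nearest state not yet known. This is only a simplified illustration: the `neighbors` map stands in for the reachability suggested by the optimistic policy, and in the actual model distances are estimated from an optimistic transition model rather than a given graph.

```python
from collections import deque

def closest_unknown(start, neighbors, known):
    """Breadth-first search from 'start', expanding only through states
    in 'known', and returning the first state encountered that is not
    yet known (i.e. an apparently closest unknown state).

    neighbors: dict mapping a state to its reachable successor states
    (a hypothetical stand-in for the optimistic model's transitions).
    Returns None if no unknown state is reachable.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s not in known:
            return s  # nearest unknown state, by BFS distance
        for nxt in neighbors.get(s, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Repeatedly navigating to the state returned by such a rule, and marking states as known once they have been sufficiently explored, is the loop whose performance the presented bounds quantify.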
Acknowledgements. This is joint work with Shiau Hong Lim. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 231495 (CompLACS).
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Auer, P. (2012). Invited Talk: UCRL and Autonomous Exploration. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science(), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9
eBook Packages: Computer Science, Computer Science (R0)