
Switching between Representations in Reinforcement Learning

  • Chapter
Interactive Collaborative Information Systems

Part of the book series: Studies in Computational Intelligence ((SCI,volume 281))

Abstract

This chapter presents and evaluates an online representation-selection method for factored Markov decision processes (MDPs). The method addresses a special case of the feature selection problem that considers only certain subsets of features, which we call candidate representations. A motivation for the method is that it can potentially handle problems where other structure-learning algorithms are infeasible because the associated dynamic Bayesian network has a large degree. Our method uses switch actions to select a representation and off-policy updating to improve the policies of the representations that were not selected. We demonstrate the validity of the method by showing, for a contextual bandit task and a regular MDP, that when only a single feature in the feature set is relevant, the switch method identifies this feature very efficiently. We also show for a contextual bandit task that switching between a set of relevant features and a subset of these features can outperform each of the individual representations, because the switch method combines the fast initial performance increase of the small representation with the high asymptotic performance of the large representation.
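To make the idea concrete, below is a loose, hypothetical sketch of switching for a toy contextual bandit, written in Python. Everything in it is an assumption made for illustration (the epsilon-greedy choices, the tabular value estimates, the toy reward, and all names such as CANDIDATES, switch_q and project); it is not the chapter's algorithm, only a minimal reading of the abstract: a switch action first picks one of several candidate representations (feature subsets), an arm is then chosen under that representation, and every candidate representation, selected or not, is updated from the observed outcome.

# Hypothetical sketch of the switch idea described in the abstract: an
# epsilon-greedy contextual bandit that first picks a candidate representation
# (a feature subset) via a switch action, then picks an arm using that
# representation's value table. The task and all names are illustrative only.
import random
from collections import defaultdict

random.seed(0)

N_FEATURES, N_ACTIONS, EPS, ALPHA = 4, 3, 0.1, 0.1

# Candidate representations: index subsets of the full feature vector.
CANDIDATES = [(0,), (0, 1), (0, 1, 2, 3)]

# One tabular value estimate per (representation, projected context, action).
q = [defaultdict(float) for _ in CANDIDATES]
# Running value of each switch action, i.e. how well each representation
# has performed when it was selected.
switch_q = [0.0] * len(CANDIDATES)
switch_n = [0] * len(CANDIDATES)


def project(context, rep):
    """Restrict the full context to the features used by one representation."""
    return tuple(context[i] for i in rep)


def reward(context, action):
    """Toy task: only feature 0 matters; the action must match it to pay off."""
    return 1.0 if action == context[0] else 0.0


for t in range(5000):
    context = tuple(random.randrange(N_ACTIONS) for _ in range(N_FEATURES))

    # Switch action: epsilon-greedy choice of which representation to trust.
    if random.random() < EPS:
        rep_i = random.randrange(len(CANDIDATES))
    else:
        rep_i = max(range(len(CANDIDATES)), key=lambda i: switch_q[i])

    # Act epsilon-greedily within the selected representation.
    key = project(context, CANDIDATES[rep_i])
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: q[rep_i][(key, a)])

    r = reward(context, action)

    # Off-policy flavour: every candidate representation learns from the
    # observed (context, action, reward) triple, not only the selected one.
    for i, rep in enumerate(CANDIDATES):
        k = project(context, rep)
        q[i][(k, action)] += ALPHA * (r - q[i][(k, action)])

    # Only the selected representation's switch value is updated on-policy.
    switch_n[rep_i] += 1
    switch_q[rep_i] += (r - switch_q[rep_i]) / switch_n[rep_i]

print("average reward credited to each representation:",
      [round(v, 2) for v in switch_q])

In this toy task only feature 0 is relevant, so under this sketch the switch values tend to favour the smallest candidate representation that still predicts the reward, loosely mirroring the single-relevant-feature experiment described in the abstract.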

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

van Seijen, H., Whiteson, S., Kester, L. (2010). Switching between Representations in Reinforcement Learning. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_3

  • DOI: https://doi.org/10.1007/978-3-642-11688-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11687-2

  • Online ISBN: 978-3-642-11688-9

  • eBook Packages: Engineering (R0)
