
Switching between Representations in Reinforcement Learning

  • Chapter
Interactive Collaborative Information Systems

Part of the book series: Studies in Computational Intelligence ((SCI,volume 281))

Abstract

This chapter presents and evaluates an online representation-selection method for factored Markov decision processes (MDPs). The method addresses a special case of the feature selection problem that considers only certain subsets of features, which we call candidate representations. A motivation for the method is that it can potentially handle problems where other structure-learning algorithms are infeasible because the associated dynamic Bayesian network has a large degree. Our method uses switch actions to select a representation and off-policy updating to improve the policies of the representations that were not selected. We demonstrate the validity of the method by showing, for a contextual bandit task and a regular MDP, that when only a single feature in the feature set is relevant, the switch method identifies this feature very efficiently. We also show for a contextual bandit task that switching between a set of relevant features and a subset of these features can outperform each of the individual representations, because the switch method combines the fast initial performance increase of the small representation with the high asymptotic performance of the large representation.
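To make the idea concrete, below is a loose, hypothetical sketch of switching for a toy contextual bandit, written in Python. Everything in it is an assumption made for illustration (the epsilon-greedy choices, the tabular value estimates, the toy reward, and all names such as CANDIDATES, switch_q and project); it is not the chapter's algorithm, only a minimal reading of the abstract: a switch action first picks one of several candidate representations (feature subsets), an arm is then chosen under that representation, and every candidate representation, selected or not, is updated from the observed outcome.

# Hypothetical sketch of the switch idea described in the abstract: an
# epsilon-greedy contextual bandit that first picks a candidate representation
# (a feature subset) via a switch action, then picks an arm using that
# representation's value table. The task and all names are illustrative only.
import random
from collections import defaultdict

random.seed(0)

N_FEATURES, N_ACTIONS, EPS, ALPHA = 4, 3, 0.1, 0.1

# Candidate representations: index subsets of the full feature vector.
CANDIDATES = [(0,), (0, 1), (0, 1, 2, 3)]

# One tabular value estimate per (representation, projected context, action).
q = [defaultdict(float) for _ in CANDIDATES]
# Running value of each switch action, i.e. how well each representation
# has performed when it was selected.
switch_q = [0.0] * len(CANDIDATES)
switch_n = [0] * len(CANDIDATES)


def project(context, rep):
    """Restrict the full context to the features used by one representation."""
    return tuple(context[i] for i in rep)


def reward(context, action):
    """Toy task: only feature 0 matters; the action must match it to pay off."""
    return 1.0 if action == context[0] else 0.0


for t in range(5000):
    context = tuple(random.randrange(N_ACTIONS) for _ in range(N_FEATURES))

    # Switch action: epsilon-greedy choice of which representation to trust.
    if random.random() < EPS:
        rep_i = random.randrange(len(CANDIDATES))
    else:
        rep_i = max(range(len(CANDIDATES)), key=lambda i: switch_q[i])

    # Act epsilon-greedily within the selected representation.
    key = project(context, CANDIDATES[rep_i])
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: q[rep_i][(key, a)])

    r = reward(context, action)

    # Off-policy flavour: every candidate representation learns from the
    # observed (context, action, reward) triple, not only the selected one.
    for i, rep in enumerate(CANDIDATES):
        k = project(context, rep)
        q[i][(k, action)] += ALPHA * (r - q[i][(k, action)])

    # Only the selected representation's switch value is updated on-policy.
    switch_n[rep_i] += 1
    switch_q[rep_i] += (r - switch_q[rep_i]) / switch_n[rep_i]

print("average reward credited to each representation:",
      [round(v, 2) for v in switch_q])

In this toy task only feature 0 is relevant, so under this sketch the switch values tend to favour the smallest candidate representation that still predicts the reward, loosely mirroring the single-relevant-feature experiment described in the abstract.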

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

van Seijen, H., Whiteson, S., Kester, L. (2010). Switching between Representations in Reinforcement Learning. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_3

  • DOI: https://doi.org/10.1007/978-3-642-11688-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11687-2

  • Online ISBN: 978-3-642-11688-9

  • eBook Packages: Engineering (R0)
