Contingent Features for Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)

Abstract

Applying reinforcement learning algorithms in real-world domains is challenging because relevant state information is often embedded in a stream of high-dimensional sensor data. This paper describes a novel algorithm for learning task-relevant features through interactions with the environment. The key idea is that a feature is likely to be useful to the degree that its dynamics can be controlled by the agent's actions. We present an algorithm that finds such features and demonstrate its effectiveness in an artificial domain.
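To make the key idea concrete, below is a minimal sketch of one way a controllability score could be computed for a candidate feature: compare the one-step prediction error of the feature's dynamics with and without conditioning on the agent's action. The function name, the per-action linear predictors, and the scoring formula are illustrative assumptions for this sketch, not the paper's actual algorithm.

    # Minimal sketch (assumed formulation, not the paper's algorithm): score a
    # candidate feature by how strongly its dynamics are contingent on actions.
    # The score compares one-step prediction error with and without knowledge
    # of the action taken at each step.
    import numpy as np

    def contingency_score(f, actions, n_actions):
        """f: (T,) feature values along a trajectory; actions: (T-1,) action ids.

        Returns the fraction of next-step variance in f explained by the
        action beyond what the current feature value alone explains.
        """
        x, y = f[:-1], f[1:]

        # Baseline: least-squares prediction of f_{t+1} from f_t alone.
        A = np.stack([x, np.ones_like(x)], axis=1)
        resid_base = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]

        # Action-conditioned: fit a separate linear model for each action.
        resid_act = np.empty_like(y)
        for a in range(n_actions):
            m = actions == a
            if m.sum() < 2:              # too few samples; reuse the baseline
                resid_act[m] = resid_base[m]
                continue
            Aa = np.stack([x[m], np.ones(m.sum())], axis=1)
            resid_act[m] = y[m] - Aa @ np.linalg.lstsq(Aa, y[m], rcond=None)[0]

        base = np.mean(resid_base ** 2)
        return 0.0 if base == 0 else 1.0 - np.mean(resid_act ** 2) / base

Under a score like this, a feature tracking an agent-controlled object would approach 1 (the action explains most of its next-step variance), while a feature tracking uncontrollable background activity would stay near 0.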

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sprague, N. (2014). Contingent Features for Reinforcement Learning. In: Wermter, S., et al. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_44

  • DOI: https://doi.org/10.1007/978-3-319-11179-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11178-0

  • Online ISBN: 978-3-319-11179-7

  • eBook Packages: Computer Science, Computer Science (R0)
