
Contingent Features for Reinforcement Learning

  • Nathan Sprague
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8681)

Abstract

Applying reinforcement learning algorithms in real-world domains is challenging because relevant state information is often embedded in a stream of high-dimensional sensor data. This paper describes a novel algorithm for learning task-relevant features through interactions with the environment. The key idea is that a feature is likely to be useful to the degree that its dynamics can be controlled by the actions of the agent. We present an algorithm that finds such features and demonstrate its effectiveness in an artificial domain.
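
As a concrete illustration of the key idea, the sketch below scores a candidate linear feature by how much of its temporal derivative is explained by the agent's actions, a simple correlation-ratio statistic. This is a hypothetical reconstruction for illustration only, not the paper's algorithm: the function contingency_score and all variable names are invented here, and the only dependency is NumPy.

```python
# Minimal sketch (illustrative, not the paper's method): score a linear
# feature f_t = w . x_t by how well discrete actions predict its temporal
# derivative f_{t+1} - f_t. A score near 1 means the actions control the
# feature's dynamics; near 0 means they do not.
import numpy as np

def contingency_score(observations, actions, w):
    """observations: (T, d) array of sensor vectors x_t
    actions:      (T-1,) integer action taken between x_t and x_{t+1}
    w:            (d,) candidate weight vector
    Returns the fraction of the derivative's variance explained by
    action identity (between-action variance / total variance)."""
    f = observations @ w            # feature time series f_t
    df = np.diff(f)                 # temporal derivative, length T-1
    total_var = df.var()
    if total_var == 0.0:
        return 0.0
    between = 0.0
    for a in np.unique(actions):
        mask = actions == a
        # Frequency-weighted squared distance of this action's mean
        # derivative from the overall mean derivative.
        between += mask.mean() * (df[mask].mean() - df.mean()) ** 2
    return between / total_var

# Toy check: a dimension driven by the actions should outscore pure noise.
rng = np.random.default_rng(0)
T, d = 1000, 5
actions = rng.integers(0, 2, size=T - 1)
x = np.zeros((T, d))
x[1:, 0] = np.cumsum(2.0 * actions - 1.0)   # dimension 0 moves with actions
x[:, 1:] = rng.normal(size=(T, d - 1))      # remaining dimensions are noise
print(contingency_score(x, actions, np.eye(d)[0]))  # ~1.0: action-driven
print(contingency_score(x, actions, np.eye(d)[1]))  # ~0.0: uncontrollable
```

Under this reading, a feature learner would search for weight vectors that maximize such a contingency score and then hand the resulting features to a policy-iteration method; the paper itself should be consulted for the actual formulation.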

Keywords

Weight Vector · Reinforcement Learning · Temporal Derivative · Contingent Feature · Policy Iteration



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nathan Sprague
  1. Department of Computer Science, James Madison University, Harrisonburg, USA
