
Sparse Approximations to Value Functions in Reinforcement Learning

  • Conference paper
  • Part of the book Artificial Neural Networks, in the Springer Series in Bio-/Neuroinformatics (SSBN, volume 4)

Abstract

We present a novel sparsification and value-function approximation method for on-line reinforcement learning in continuous state and action spaces. Our approach builds on the kernel least-squares temporal difference (KLSTD) learning algorithm. We derive a recursive version and extend the algorithm with a new sparsification mechanism based on the topology obtained from proximity graphs. The sparsification mechanism, which is necessary to speed up the computations, favors data points that minimize the divergence of the target-function gradient, and thus also takes the shape of the target function into account. The performance of our sparsification and approximation method is evaluated on a standard reinforcement learning benchmark problem, and comparisons with existing approaches are provided.
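
To make the method concrete, the sketch below implements batch kernel least-squares temporal difference (KLSTD) learning with a simple dictionary-based sparsifier. It is a minimal illustration under stated assumptions: a Gaussian kernel, and a plain novelty threshold standing in for the paper's proximity-graph criterion based on the divergence of the target-function gradient. The names KernelLSTD, rbf_kernel, and all parameters are hypothetical and are not the authors' implementation.

import numpy as np

def rbf_kernel(x, y, length_scale=0.5):
    """Gaussian (RBF) kernel between two state vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-0.5 * np.dot(d, d) / length_scale ** 2)

class KernelLSTD:
    """Batch kernel LSTD(0) with a novelty-based dictionary (illustrative only).

    The paper's sparsifier uses proximity graphs and the gradient of the
    target function; a kernel-similarity threshold is used here instead."""

    def __init__(self, gamma=0.95, novelty_tol=0.1, reg=1e-3):
        self.gamma = gamma              # discount factor
        self.novelty_tol = novelty_tol  # how dissimilar a state must be to join the dictionary
        self.reg = reg                  # ridge term for numerical stability
        self.dictionary = []            # retained basis states
        self.transitions = []           # observed (s, r, s_next) tuples
        self.alpha = None               # kernel expansion coefficients

    def _features(self, s):
        # Kernel activations of state s against the current dictionary.
        return np.array([rbf_kernel(s, d) for d in self.dictionary])

    def observe(self, s, r, s_next):
        # Grow the dictionary only when the new state is sufficiently novel.
        if not self.dictionary or max(rbf_kernel(s, d) for d in self.dictionary) < 1.0 - self.novelty_tol:
            self.dictionary.append(np.asarray(s, dtype=float))
        self.transitions.append((np.asarray(s, dtype=float), float(r), np.asarray(s_next, dtype=float)))

    def fit(self):
        # Solve the LSTD normal equations A alpha = b over the sparse dictionary.
        m = len(self.dictionary)
        A = self.reg * np.eye(m)
        b = np.zeros(m)
        for s, r, s_next in self.transitions:
            phi, phi_next = self._features(s), self._features(s_next)
            A += np.outer(phi, phi - self.gamma * phi_next)
            b += r * phi
        self.alpha = np.linalg.solve(A, b)

    def value(self, s):
        # Approximate value: weighted sum of kernel activations.
        return float(self._features(s) @ self.alpha)

In this sketch the dictionary grows only when a new state's kernel similarity to every retained basis point drops below a threshold; the graph-based criterion described in the paper would additionally weigh how a candidate point changes the estimated gradient of the value function, which is what lets it adapt the basis to the shape of the target function.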



Author information

Corresponding author

Correspondence to Hunor S. Jakab.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jakab, H.S., Csató, L. (2015). Sparse Approximations to Value Functions in Reinforcement Learning. In: Koprinkova-Hristova, P., Mladenov, V., Kasabov, N. (eds) Artificial Neural Networks. Springer Series in Bio-/Neuroinformatics, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09903-3_14

  • DOI: https://doi.org/10.1007/978-3-319-09903-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09902-6

  • Online ISBN: 978-3-319-09903-3

  • eBook Packages: Engineering (R0)
