Abstract
Feature selection in reinforcement learning (RL), i.e. choosing basis functions such that useful approximations of the unknown value function can be obtained, is one of the main challenges in scaling RL to real-world applications. Here we consider the Gaussian process based framework GPTD for approximate policy evaluation, and propose feature selection through marginal likelihood optimization of the associated hyperparameters. Our approach offers two appealing benefits: (1) given just sample transitions, we can solve the policy evaluation problem fully automatically (without looking at the learning task and, in theory, independently of the dimensionality of the state space), and (2) model selection allows us to consider more sophisticated kernels, which in turn enable us to identify relevant subspaces and eliminate irrelevant state variables, yielding substantial computational savings and improved prediction performance.
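The core mechanism described above, learning per-dimension kernel hyperparameters by maximizing the marginal likelihood so that irrelevant state variables are automatically pruned, can be illustrated in a minimal regression sketch. This is not the GPTD algorithm itself (which works on sample transitions and Bellman residuals); it only shows automatic relevance determination (ARD) with a squared-exponential kernel on toy data, where the target depends on one of two inputs. The data, variable names, and optimizer settings are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: y depends only on the first of two "state variables".
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 2))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(60)

def neg_log_marginal_likelihood(log_params):
    """Negative GP log marginal likelihood, ARD squared-exponential kernel."""
    ell = np.exp(log_params[:2])   # per-dimension lengthscales
    sf2 = np.exp(log_params[2])    # signal variance
    sn2 = np.exp(log_params[3])    # noise variance
    d = (X[:, None, :] - X[None, :, :]) / ell
    K = sf2 * np.exp(-0.5 * np.sum(d ** 2, axis=-1))
    K += (sn2 + 1e-6) * np.eye(len(X))  # noise plus jitter for stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(X) * np.log(2 * np.pi))

res = minimize(neg_log_marginal_likelihood, np.zeros(4), method="L-BFGS-B")
ell = np.exp(res.x[:2])
# The learned lengthscale of the irrelevant second dimension typically
# grows much larger than that of the relevant first dimension,
# effectively eliminating that state variable from the model.
print(ell)
```

In the paper's setting the same principle applies to the kernel used by GPTD: a large learned lengthscale flags a state variable the value function does not depend on, which can then be dropped for computational savings.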
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Jung, T., Stone, P. (2009). Feature Selection for Value Function Approximation Using Bayesian Model Selection. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_60
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8