Abstract
Discretization of state and action spaces is a critical issue in Q-Learning. We propose adapting the discretization in real time by means of progressive widening, a technique already used in bandit-based methods. The results converge consistently to the optimum of the problem, without changing the parametrization for each new problem.
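The abstract does not spell out the widening rule, so the following is only a minimal sketch of how progressive widening might drive an adaptive action discretization in tabular Q-Learning. It assumes a commonly used rule in which a state visited n times may hold at most ceil(c * n^alpha) sampled actions. The names `allowed_count` and `WideningQTable`, the constants `c` and `alpha`, the uniform sampling of new actions, and the epsilon-greedy choice are all illustrative assumptions, not the authors' method. For brevity only the action side is widened here; states are treated as already-discrete keys.

```python
import math
import random


def allowed_count(visits, c=1.0, alpha=0.5):
    """Assumed widening rule: at most ceil(c * visits**alpha) actions per state."""
    return max(1, math.ceil(c * visits ** alpha))


class WideningQTable:
    """Hypothetical tabular Q-learner whose action set grows with visit counts."""

    def __init__(self, low, high, c=1.0, alpha=0.5, lr=0.1, gamma=0.95, seed=0):
        self.low, self.high = low, high      # bounds of the continuous action
        self.c, self.alpha = c, alpha        # widening constants (assumed)
        self.lr, self.gamma = lr, gamma      # learning rate and discount factor
        self.rng = random.Random(seed)
        self.actions = {}                    # state -> list of sampled actions
        self.visits = {}                     # state -> number of visits
        self.q = {}                          # (state, action index) -> Q-value

    def select(self, state, epsilon=0.1):
        """Return the index of the action to play in `state`."""
        n = self.visits.get(state, 0) + 1
        self.visits[state] = n
        acts = self.actions.setdefault(state, [])
        # Widen: sample a fresh action while the budget for this state allows it.
        if len(acts) < allowed_count(n, self.c, self.alpha):
            acts.append(self.rng.uniform(self.low, self.high))
            return len(acts) - 1
        # Otherwise choose epsilon-greedily among the actions sampled so far.
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(acts))
        return max(range(len(acts)), key=lambda i: self.q.get((state, i), 0.0))

    def update(self, state, idx, reward, next_state):
        """Standard Q-Learning backup over the current discretization."""
        nxt = self.actions.get(next_state, [])
        best_next = max(
            (self.q.get((next_state, i), 0.0) for i in range(len(nxt))),
            default=0.0,
        )
        old = self.q.get((state, idx), 0.0)
        target = reward + self.gamma * best_next
        self.q[(state, idx)] = old + self.lr * (target - old)
```

With `alpha` below 1 the number of sampled actions grows sublinearly in the visit count, so frequently visited states get a finer discretization while each sampled action is still revisited often enough for its Q-value estimate to improve.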
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sokolovska, N., Teytaud, O., Milone, M. (2011). Q-Learning with Double Progressive Widening: Application to Robotics. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol 7064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24965-5_12
DOI: https://doi.org/10.1007/978-3-642-24965-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24964-8
Online ISBN: 978-3-642-24965-5
eBook Packages: Computer Science, Computer Science (R0)