Abstract
Actor-Critics constitute an important class of reinforcement learning algorithms that handle continuous actions and states in an easy and natural way. In their original, sequential form, these algorithms are usually too slow to be applicable to real-life problems. However, they can be augmented with experience replay to obtain a satisfactory speed of learning without degrading their convergence properties. This paper presents experimental results showing that combining experience replay with Actor-Critics yields very fast learning algorithms that obtain successful policies for nontrivial control tasks in a short time. Namely, a policy for a model of a 6-degree-of-freedom walking robot is obtained after 4 hours of the robot's time.
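The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining an actor-critic with experience replay, not the paper's method. It assumes a one-dimensional state and action, a Gaussian policy with a linear mean, a linear critic, and a truncated importance weight to correct for replayed actions having been drawn from an older policy. The class name `ReplayActorCritic`, its methods, and all hyperparameter values are hypothetical.

```python
import math
import random
from collections import deque


class ReplayActorCritic:
    """Toy actor-critic with experience replay (illustrative assumptions only).

    Actor:  Gaussian policy, mean = theta * s, fixed standard deviation sigma.
    Critic: linear value estimate V(s) = w * s.
    """

    def __init__(self, sigma=0.5, gamma=0.9, lr_actor=0.01, lr_critic=0.05,
                 buffer_size=1000, rho_max=2.0, seed=0):
        self.theta = 0.0            # actor parameter
        self.w = 0.0                # critic parameter
        self.sigma = sigma
        self.gamma = gamma
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic
        self.rho_max = rho_max      # cap on the importance weight
        self.buffer = deque(maxlen=buffer_size)  # oldest samples drop out
        self.rng = random.Random(seed)

    def density(self, s, a):
        """Density of action a in state s under the current policy."""
        z = (a - self.theta * s) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))

    def act(self, s):
        """Sample an action and return it with its density at sampling time."""
        a = self.rng.gauss(self.theta * s, self.sigma)
        return a, self.density(s, a)

    def store(self, s, a, r, s_next, behaviour_density):
        """Remember one transition together with the density that produced a."""
        self.buffer.append((s, a, r, s_next, behaviour_density))

    def truncated_ratio(self, s, a, behaviour_density):
        """Importance weight current/behaviour, truncated at rho_max."""
        ratio = self.density(s, a) / max(behaviour_density, 1e-12)
        return min(ratio, self.rho_max)

    def replay(self, n_updates=1):
        """Run n_updates actor and critic updates on randomly replayed samples."""
        if not self.buffer:
            return
        for _ in range(n_updates):
            idx = self.rng.randrange(len(self.buffer))
            s, a, r, s_next, mu = self.buffer[idx]
            rho = self.truncated_ratio(s, a, mu)
            # One-step temporal-difference error under the current critic.
            td = r + self.gamma * self.w * s_next - self.w * s
            # Gradient of log-density with respect to theta.
            score = (a - self.theta * s) * s / (self.sigma ** 2)
            self.w += self.lr_critic * rho * td * s
            self.theta += self.lr_actor * rho * td * score
```

In a control loop one would call `act`, apply the action, `store` the resulting transition, and then call `replay` with several updates per real time step, so that each sample collected from the robot is reused many times. That reuse is what makes replay attractive when real interaction time is the bottleneck.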
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wawrzyński, P. (2009). A Cat-Like Robot Real-Time Learning to Run. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2009. Lecture Notes in Computer Science, vol 5495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04921-7_39
DOI: https://doi.org/10.1007/978-3-642-04921-7_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04920-0
Online ISBN: 978-3-642-04921-7
eBook Packages: Computer Science, Computer Science (R0)