Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
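As a rough illustration of this architecture, the following minimal Python sketch implements a simplified variant: the critic solves a least-squares temporal-difference system over the compatible features (the gradient of the log-policy) stacked with ordinary value-function basis functions, and the actor then steps along the resulting natural-gradient estimate. The one-dimensional linear dynamics, Gaussian policy, basis functions, and step sizes below are illustrative assumptions, not the paper's experimental setup; the helper names `lqr_step`, `grad_log_pi`, and `value_features` are hypothetical.

```python
# Minimal sketch of a Natural Actor-Critic loop, assuming a Gaussian policy
# pi(a|s) = N(theta * s, sigma^2) on a toy 1-D linear system. All dynamics,
# features, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5    # fixed exploration noise of the Gaussian policy
gamma = 0.95   # discount factor
theta = 0.0    # policy parameter: a ~ N(theta * s, sigma^2)

def lqr_step(s, a):
    """Toy linear dynamics with quadratic cost (reward = -cost)."""
    s_next = 0.9 * s + a + 0.01 * rng.standard_normal()
    reward = -(s ** 2 + 0.1 * a ** 2)
    return s_next, reward

def grad_log_pi(s, a):
    """Compatible feature: gradient of log pi(a|s) w.r.t. theta."""
    return np.array([(a - theta * s) * s / sigma ** 2])

def value_features(s):
    """Basis functions for the additional value-function parameters v."""
    return np.array([s ** 2, 1.0])

for iteration in range(100):
    # Critic: accumulate a least-squares TD system for [w; v], where
    # Q(s, a) is approximated by w^T grad_log_pi(s, a) + v^T phi(s).
    A = np.zeros((3, 3))
    b = np.zeros(3)
    s = rng.standard_normal()
    for t in range(200):
        a = theta * s + sigma * rng.standard_normal()
        s_next, r = lqr_step(s, a)
        phi = np.concatenate([grad_log_pi(s, a), value_features(s)])
        phi_next = np.concatenate([[0.0], value_features(s_next)])
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
        s = s_next
    w_v = np.linalg.solve(A + 1e-6 * np.eye(3), b)
    # Actor: the first block of the solution is the natural-gradient
    # estimate w; step the policy parameter along it.
    theta += 0.05 * w_v[0]

print("learned feedback gain theta:", theta)
```

The key point the sketch tries to convey is that the natural gradient falls out of the critic's regression for free: once the advantage function is represented in the compatible basis, its weights w are precisely the natural-gradient direction, so the actor update reduces to a step along w.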
References
Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)
Bagnell, J., Schneider, J.: Covariant policy search. In: International Joint Conference on Artificial Intelligence (2003)
Baird, L.C.: Advantage Updating. Wright Lab. Tech. Rep. WL-TR-93-1146 (1993)
Baird, L.C., Moore, A.W.: Gradient descent for general reinforcement learning. In: Advances in Neural Information Processing Systems 11 (1999)
Bartlett, P.: An introduction to reinforcement learning theory: Value function methods. In: Mendelson, S., Smola, A.J. (eds.) Advanced Lectures on Machine Learning. LNCS (LNAI), vol. 2600, pp. 184–202. Springer, Heidelberg (2003)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Boyan, J.: Least-squares temporal difference learning. In: Machine Learning: Proceedings of the Sixteenth International Conference, pp. 49–56 (1999)
Bradtke, S., Ydstie, E., Barto, A.G.: Adaptive Linear Quadratic Control Using Policy Iteration. University of Massachusetts, Amherst, MA (1994)
Ijspeert, A., Nakanishi, J., Schaal, S.: Learning rhythmic movements by demonstration using nonlinear oscillators. In: IEEE International Conference on Intelligent Robots and Systems (IROS 2002), pp. 958–963 (2002)
Kakade, S.: A natural policy gradient. In: Advances in Neural Information Processing Systems 14 (2002)
Konda, V., Tsitsiklis, J.: Actor-critic algorithms. In: Advances in Neural Information Processing Systems 12 (2000)
Moon, T., Stirling, W.: Mathematical Methods and Algorithms for Signal Processing. Prentice Hall, Englewood Cliffs (2000)
Peters, J., Vijayakumar, S., Schaal, S.: Reinforcement learning for humanoid robotics. In: IEEE International Conference on Humanoid Robots (2003)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12 (2000)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Peters, J., Vijayakumar, S., Schaal, S. (2005). Natural Actor-Critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_29
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3