Abstract
The “Policy Improvement with Path Integrals” (PI2) [25] and “Covariance Matrix Adaptation Evolution Strategy” (CMA-ES) [8] algorithms are considered state-of-the-art in direct reinforcement learning and stochastic optimization, respectively. We have recently shown that incorporating covariance matrix adaptation into PI2, which yields the PI2-CMA algorithm, enables adaptive exploration by continually and autonomously reconsidering the exploration/exploitation trade-off. In this article, we provide an overview of our recent work on covariance matrix adaptation for direct reinforcement learning [22–24], highlight its relevance to developmental robotics, and conduct further experiments to analyze the results. We investigate two complementary phenomena from developmental robotics. First, we demonstrate PI2-CMA’s ability to adapt to slowly or abruptly changing tasks through its continual, adaptive exploration; this is an important component of life-long skill learning in dynamic environments. Second, we show on a reaching task how PI2-CMA progressively releases degrees of freedom from proximal to more distal limbs as learning progresses. A similar effect is observed in human development, where it is known as ‘proximodistal maturation’.
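The core mechanism the abstract refers to is reward-weighted averaging: sampled policy parameters are weighted by an exponentiated cost, and both the mean and the covariance of the exploration distribution are updated from those weights, so exploration magnitude adapts automatically. The following is a minimal black-box sketch of one such generation, not the authors' exact implementation (which operates on dynamical movement primitive parameters with per-time-step weighting); the function name, the sample count `K`, and the temperature `h` are illustrative assumptions.

```python
import numpy as np

def pi2cma_update(theta_mean, Sigma, cost_fn, K=20, h=10.0, rng=None):
    """One generation of a PI2-CMA-style update (simplified, black-box form)."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample K exploratory parameter vectors from the current Gaussian.
    samples = rng.multivariate_normal(theta_mean, Sigma, size=K)
    costs = np.array([cost_fn(th) for th in samples])
    # Path-integral-style weights: exponentiate min-max normalized costs
    # so the lowest-cost samples dominate, then normalize to probabilities.
    c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    P = np.exp(-h * c)
    P /= P.sum()
    # Reward-weighted averaging updates the mean AND the covariance,
    # which is what lets exploration grow or shrink autonomously.
    new_mean = P @ samples
    diffs = samples - theta_mean
    new_Sigma = sum(P[k] * np.outer(diffs[k], diffs[k]) for k in range(K))
    return new_mean, new_Sigma
```

Because the covariance is re-estimated from the weighted samples each generation, it shrinks along directions where the task is solved and stays large along directions still being explored, which is the basis of the proximodistal freeing of degrees of freedom discussed in the article.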
References
L. Arnold, A. Auger, N. Hansen, and Y. Ollivier. Information-geometric optimization algorithms: A unifying picture via invariance principles. Technical report, INRIA Saclay, 2011.
A. Baranes and P-Y. Oudeyer. The interaction of maturational constraints and intrinsic motivations in active motor development. In IEEE International Conference on Development and Learning, 2011.
N. E. Berthier, R. K. Clifton, D. D. McCall, and D. J. Robin. Proximodistal structure of early reaching in human infants. Experimental Brain Research, 1999.
L. Berthouze and M. Lungarella. Motor skill acquisition under environmental perturbations: On the necessity of alternate freezing and freeing degrees of freedom. Adaptive Behavior, 12(1): 47–63, 2004.
Josh C. Bongard. Morphological change in machines accelerates the evolution of robust behavior. Proceedings of the National Academy of Sciences of the United States of America (PNAS), January 2010.
Ronen I. Brafman and Moshe Tennenholtz. R-max — a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213–231, March 2003. ISSN 1532-4435.
T. Glasmachers, T. Schaul, S. Yi, D. Wierstra, and J. Schmidhuber. Exponential natural evolution strategies. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 393–400. ACM, 2010.
N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.
Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49(2–3):209–232, 2002. ISSN 0885-6125.
J. Konczak, M. Borutta, H. Topka, and J. Dichgans. The development of goal-directed reaching in infants: hand trajectory formation and joint torque control. Experimental Brain Research, 1995.
A. Miyamae, Y. Nagata, I. Ono, and S. Kobayashi. Natural policy gradient methods with parameter-based exploration for control tasks. Advances in Neural Information Processing Systems, 2:437–441, 2010.
Y. Nagai, M. Asada, and K. Hosoda. Learning for joint attention helped by functional development. Advanced Robotics, 20(10), 2006.
Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7–9):1180–1190, 2008.
R. Ros and N. Hansen. A simple modification in CMA-ES achieving linear time and space complexity. In Proceedings of Parallel Problem Solving from Nature (PPSN), pages 296–305, 2008.
Thomas Rückstiess, Frank Sehnke, Tom Schaul, Daan Wierstra, Yi Sun, and Jürgen Schmidhuber. Exploring parameter space in reinforcement learning. Paladyn. Journal of Behavioral Robotics, 1:14–24, 2010. ISSN 2080-9778.
A. Saltelli, K. Chan, and E. M. Scott. Sensitivity analysis. Chichester: Wiley, 2000.
Stefan Schaal. The SL simulation and real-time control software package. Technical report, University of Southern California, 2007.
Matthew Schlesinger, Domenico Parisi, and Jonas Langer. Learning to reach by constraining the movement search space. Developmental Science, 3:67–80, 2000.
F. Sehnke, C. Osendorfer, T. Rückstieß, A. Graves, J. Peters, and J. Schmidhuber. Parameter-exploring policy gradients. Neural Networks, 23(4):551–559, 2010.
Andrew Stout, George D. Konidaris, and Andrew G. Barto. Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning. In AAAI, 2005.
Freek Stulp. Adaptive exploration for continual reinforcement learning. In International Conference on Intelligent Robots and Systems (IROS), 2012.
Freek Stulp and Pierre-Yves Oudeyer. Emergent proximo-distal maturation through adaptive exploration. In International Conference on Development and Learning (ICDL), 2012. Paper of Excellence Award.
Freek Stulp and Olivier Sigaud. Path integral policy improvement with covariance matrix adaptation. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
Evangelos Theodorou, Jonas Buchli, and Stefan Schaal. A generalized path integral control approach to reinforcement learning. Journal of Machine Learning Research, 11:3137–3181, 2010.
Sebastian B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie-Mellon University, 1992.
R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
Cite this article
Stulp, F., Oudeyer, PY. Adaptive exploration through covariance matrix adaptation enables developmental motor learning. Paladyn 3, 128–135 (2012). https://doi.org/10.2478/s13230-013-0108-6