
Adaptive exploration through covariance matrix adaptation enables developmental motor learning

  • Research Article
  • Published in Paladyn

Abstract

The “Policy Improvement with Path Integrals” (PI2) algorithm [25] and the “Covariance Matrix Adaptation Evolution Strategy” (CMA-ES) [8] are considered state-of-the-art in direct reinforcement learning and stochastic optimization, respectively. We have recently shown that incorporating covariance matrix adaptation into PI2, which yields the PI2-CMA algorithm, enables adaptive exploration by continually and autonomously reconsidering the exploration/exploitation trade-off. In this article, we provide an overview of our recent work on covariance matrix adaptation for direct reinforcement learning [22–24], highlight its relevance to developmental robotics, and conduct further experiments to analyze the results. We investigate two complementary phenomena from developmental robotics. First, we demonstrate PI2-CMA’s ability to adapt to slowly or abruptly changing tasks through its continual and adaptive exploration, an important component of life-long skill learning in dynamic environments. Second, we show on a reaching task how PI2-CMA successively releases degrees of freedom from proximal to more distal limbs as learning progresses. A similar effect is observed in human development, where it is known as ‘proximodistal maturation’.
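
As a rough illustration of the mechanism described in the abstract, the sketch below shows how reward-weighted averaging over sampled rollouts can update both the mean of the policy parameters (as in PI2) and the covariance of the exploration noise (the extension that yields PI2-CMA's adaptive exploration). It is not the authors' implementation: the cost function `evaluate_rollout`, the softmax "eliteness" parameter `h`, and the number of rollouts are illustrative placeholders.

```python
import numpy as np

def pi2cma_update(theta, Sigma, evaluate_rollout, n_rollouts=10, h=10.0):
    """One reward-weighted update of the policy mean and exploration covariance.

    theta: current policy parameter vector (mean of the exploration Gaussian)
    Sigma: current exploration covariance matrix
    evaluate_rollout: maps a sampled parameter vector to a scalar cost (placeholder)
    """
    # Sample exploratory policy parameters around the current mean.
    samples = np.random.multivariate_normal(theta, Sigma, size=n_rollouts)
    costs = np.array([evaluate_rollout(s) for s in samples])

    # Map costs to normalized weights with an exponentiation akin to PI2's
    # softmax over path costs: low cost -> high weight.
    c_min, c_max = costs.min(), costs.max()
    weights = np.exp(-h * (costs - c_min) / (c_max - c_min + 1e-10))
    weights /= weights.sum()

    # Reward-weighted averaging updates the parameter mean (as in PI2) ...
    theta_new = weights @ samples

    # ... and also the covariance of the exploration noise, so that the
    # magnitude and shape of exploration adapt as learning progresses.
    diffs = samples - theta
    Sigma_new = sum(w * np.outer(d, d) for w, d in zip(weights, diffs))

    return theta_new, Sigma_new
```

The full algorithm includes further safeguards, such as bounding the exploration magnitude, which this sketch omits; the reward-weighted covariance update above is only meant to show why exploration noise can shrink or grow per parameter without any manually tuned schedule.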


References

  1. L. Arnold, A. Auger, N. Hansen, and Y. Ollivier. Information-geometric optimization algorithms: A unifying picture via invariance principles. Technical report, INRIA Saclay, 2011.


  2. A. Baranes and P-Y. Oudeyer. The interaction of maturational constraints and intrinsic motivations in active motor development. In IEEE International Conference on Development and Learning, 2011.


  3. N. E. Berthier, R. K. Clifton, D. D. McCall, and D. J. Robin. Proximodistal structure of early reaching in human infants. Experimental Brain Research, 1999.


  4. L. Berthouze and M. Lungarella. Motor skill acquisition under environmental perturbations: On the necessity of alternate freezing and freeing degrees of freedom. Adaptive Behavior, 12(1): 47–63, 2004.


  5. Josh C. Bongard. Morphological change in machines accelerates the evolution of robust behavior. Proceedings of the National Academy of Sciences of the United States of America (PNAS), January 2010.


  6. Ronen I. Brafman and Moshe Tennenholtz. R-max — a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213–231, March 2003. ISSN 1532-4435.


  7. T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, and J. Schmidhuber. Exponential natural evolution strategies. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 393–400. ACM, 2010.


  8. N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.


  9. A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.


  10. Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49(2–3):209–232, 2002. ISSN 0885-6125.


  11. J. Konczak, M. Borutta, H. Topka, and J. Dichgans. The development of goal-directed reaching in infants: hand trajectory formation and joint torque control. Experimental Brain Research, 1995.


  12. A. Miyamae, Y. Nagata, I. Ono, and S. Kobayashi. Natural policy gradient methods with parameter-based exploration for control tasks. Advances in Neural Information Processing Systems, 2:437–441, 2010.


  13. Y. Nagai, M. Asada, and K. Hosoda. Learning for joint attention helped by functional development. Advanced Robotics, 20(10), 2006.


  14. Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7–9):1180–1190, 2008.


  15. R. Ros and N. Hansen. A Simple Modification in CMA-ES Achieving Linear Time and Space Complexity. In Proceedings of Parallel Problem Solving from Nature (PPSN), pages 296–305, 2008.


  16. Thomas Rückstiess, Frank Sehnke, Tom Schaul, Daan Wierstra, Yi Sun, and Jürgen Schmidhuber. Exploring parameter space in reinforcement learning. Paladyn. Journal of Behavioral Robotics, 1:14–24, 2010. ISSN 2080-9778.


  17. A. Saltelli, K. Chan, and E. M. Scott. Sensitivity analysis. Chichester: Wiley, 2000.


  18. Stefan Schaal. The SL simulation and real-time control software package. Technical report, University of Southern California, 2007.


  19. Matthew Schlesinger, Domenico Parisi, and Jonas Langer. Learning to reach by constraining the movement search space. Developmental Science, 3:67–80, 2000.


  20. F. Sehnke, C. Osendorfer, T. Rückstieß, A. Graves, J. Peters, and J. Schmidhuber. Parameter-exploring policy gradients. Neural Networks, 23(4):551–559, 2010.


  21. Andrew Stout, George D. Konidaris, and Andrew G. Barto. Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning. In AAAI, 2005.


  22. Freek Stulp. Adaptive exploration for continual reinforcement learning. In International Conference on Intelligent Robots and Systems (IROS), 2012.


  23. Freek Stulp and Pierre-Yves Oudeyer. Emergent proximo-distal maturation through adaptive exploration. In International Conference on Development and Learning (ICDL), 2012. Paper of Excellence Award.


  24. Freek Stulp and Olivier Sigaud. Path integral policy improvement with covariance matrix adaptation. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.


  25. Evangelos Theodorou, Jonas Buchli, and Stefan Schaal. A generalized path integral control approach to reinforcement learning. Journal of Machine Learning Research, 11:3137–3181, 2010.


  26. Sebastian B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie-Mellon University, 1992.


  27. R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.



Author information

Corresponding author

Correspondence to Freek Stulp.

About this article

Cite this article

Stulp, F., Oudeyer, PY. Adaptive exploration through covariance matrix adaptation enables developmental motor learning. Paladyn 3, 128–135 (2012). https://doi.org/10.2478/s13230-013-0108-6

