Learning a Humanoid Kick with Controlled Distance

  • Abbas Abdolmaleki
  • David Simões
  • Nuno Lau
  • Luis Paulo Reis
  • Gerhard Neumann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9776)


We investigate the learning of a flexible humanoid robot kick controller, i.e., a controller that is applicable in multiple contexts, such as different kick distances, different initial positions of the robot with respect to the ball, or both. Current approaches typically tune or optimise the parameters of a biped kick controller for a single context, such as a kick of maximum distance or a kick of one specific distance. Hence, our research question is: how can we obtain a flexible kick controller that controls the robot (near) optimally over a continuous range of kick distances? The goal is to find a parametric function that, given a desired kick distance, outputs the (near) optimal controller parameters. We achieve the desired flexibility of the controller by applying a contextual policy search method. With such an algorithm, we can generalize the robot kick controller over different distances, where the desired distance is described by a real-valued vector. We also show that the optimal parameters of the kick controller are a non-linear function of the desired distance, and that a linear policy fails to properly generalize the kick controller over the desired kick distances.
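The core claim above, that a policy mapping the context (desired kick distance) to controller parameters must be non-linear, can be illustrated with a minimal sketch. This is not the authors' code: the synthetic "optimal parameter" curve, the Gaussian RBF feature map, and all names are assumptions made for illustration. The sketch fits the mean of a contextual policy once with linear features of the context and once with non-linear (RBF) features, both by ridge regression, and compares the fit.

```python
# Illustrative sketch only: compare a linear vs. a non-linear (RBF-feature)
# contextual policy mean that maps a desired kick distance s to a controller
# parameter theta. The target curve is synthetic and purely for illustration.
import numpy as np

def linear_features(s):
    """Linear feature map of the context: phi(s) = [1, s]."""
    return np.hstack([np.ones((len(s), 1)), s[:, None]])

def rbf_features(s, centers, width=1.5):
    """Gaussian radial-basis features of the context, plus a bias term."""
    d = s[:, None] - centers[None, :]
    return np.hstack([np.ones((len(s), 1)), np.exp(-0.5 * (d / width) ** 2)])

def fit_policy_mean(phi, theta, reg=1e-6):
    """Ridge regression: weights W such that phi @ W approximates theta."""
    return np.linalg.solve(phi.T @ phi + reg * np.eye(phi.shape[1]), phi.T @ theta)

rng = np.random.default_rng(0)
s_train = rng.uniform(2.0, 12.0, size=200)            # desired kick distances (m)
theta_train = (np.sin(0.6 * s_train)                   # synthetic non-linear
               + 0.1 * s_train)[:, None]               # "optimal parameter" curve

centers = np.linspace(2.0, 12.0, 8)                    # RBF centers over contexts
W_lin = fit_policy_mean(linear_features(s_train), theta_train)
W_rbf = fit_policy_mean(rbf_features(s_train, centers), theta_train)

mse_lin = np.mean((linear_features(s_train) @ W_lin - theta_train) ** 2)
mse_rbf = np.mean((rbf_features(s_train, centers) @ W_rbf - theta_train) ** 2)
print(f"linear policy MSE: {mse_lin:.4f}, RBF policy MSE: {mse_rbf:.4f}")
```

Under this toy target, the RBF-feature policy tracks the non-linear parameter curve closely while the linear policy leaves a large residual, mirroring the paper's observation that a linear generalization over kick distances is insufficient.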


Keywords: Contextual policy search · Motor learning · Humanoid robot · Non-linear policies



The first author was supported by FCT under grant SFRH/BD/81155/2011. The work was also partially funded by the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 and by FCT Portuguese Foundation for Science and Technology under projects PEst-OE/EEI/UI0027/2013 and UID/CEC/00127/2013 (IEETA and LIACC). The work was also funded by project EuRoC, reference 608849 from call FP7-2013-NMP-ICT-FOF.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Abbas Abdolmaleki (1, 2, 3)
  • David Simões (1)
  • Nuno Lau (1)
  • Luis Paulo Reis (2, 3)
  • Gerhard Neumann (4)
  1. IEETA, DETI, University of Aveiro, Aveiro, Portugal
  2. DSI, University of Minho, Braga, Portugal
  3. LIACC, University of Porto, Porto, Portugal
  4. CLAS, TU Darmstadt, Darmstadt, Germany
