Sampling Efficiency in Learning Robot Motion

  • Adrià Colomé
  • Carme Torras
Part of the Springer Tracts in Advanced Robotics book series (STAR, volume 134)


Policy Search (PS) algorithms are widely used in robotics for their simplicity and effectiveness. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only, so experiments yielding poor performance are usually discarded or given too little influence on the policy update. In this chapter, we propose a generalization of the Relative Entropy Policy Search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named Dual REPS (DREPS) [1] after the philosophical notion of a duality between good and bad, finds clusters of experimental data yielding poor behavior and adds them to the optimization problem as a repulsive constraint. Both good and bad data samples are thus taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems.
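To make the stated limitation concrete, the following is a minimal sketch of the sample weighting used in standard episodic REPS as it is commonly formulated in the literature (see [3, 11]); it is not the DREPS algorithm and contains no repulsive or attractor constraint. The function names `reps_dual` and `reps_weights`, the KL bound `epsilon = 0.5`, the search interval for the temperature `eta`, and the toy returns are all assumptions chosen for illustration.

```python
import math


def reps_dual(eta, returns, epsilon):
    """Episodic REPS dual: g(eta) = eta*epsilon + eta*log(mean(exp(R/eta)))."""
    r_max = max(returns)
    # Subtract the largest return before exponentiating for numerical stability.
    mean_exp = sum(math.exp((r - r_max) / eta) for r in returns) / len(returns)
    return eta * epsilon + r_max + eta * math.log(mean_exp)


def reps_weights(returns, epsilon=0.5, lo=1e-3, hi=1e3, iters=200):
    """Minimize the dual over eta, then return (eta, normalized sample weights).

    The dual is convex in eta, so a ternary search over log(eta) is enough
    for this illustration (the interval [lo, hi] is an assumption).
    """
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if reps_dual(math.exp(m1), returns, epsilon) < reps_dual(math.exp(m2), returns, epsilon):
            b = m2
        else:
            a = m1
    eta = math.exp((a + b) / 2.0)
    r_max = max(returns)
    raw = [math.exp((r - r_max) / eta) for r in returns]
    total = sum(raw)
    return eta, [w / total for w in raw]


# Toy returns (hypothetical): two poor rollouts and two good ones.
returns = [-10.0, -8.0, -1.0, -0.5]
eta, weights = reps_weights(returns)
for r, w in zip(returns, weights):
    print(f"return {r:6.1f} -> weight {w:.4f}")
```

In this sketch the two poor rollouts receive only a small share of the total weight, so a weighted maximum-likelihood policy update would be driven almost entirely by the two good rollouts. This is the behavior the abstract describes as giving bad experiments too little influence.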


References

  1. Colomé, A., Torras, C.: Dual REPS: a generalization of relative entropy policy search exploiting bad experiences. IEEE Trans. Robot. 33(4), 978–985 (2017)
  2. Daniel, C., Neumann, G., Kroemer, O., Peters, J.: Hierarchical relative entropy policy search. J. Mach. Learn. Res. 17(93), 1–50 (2016)
  3. Deisenroth, M.P., Neumann, G., Peters, J.: A survey on policy search for robotics. Found. Trends Robot. 2(1–2), 1–142 (2013)
  4. Gómez, V., Kappen, H.J., Peters, J., Neumann, G.: Policy search for path integral control. In: European Conference in Machine Learning and Knowledge Discovery in Databases (ECML), pp. 482–497 (2014)
  5. Jevtic, A., Colomé, A., Alenyà, G., Torras, C.: Learning robot motion through user intervention and policy search. In: ICRA Workshop on Nature versus Nurture in Robotics (2016)
  6. Jevtic, A., Colomé, A., Alenyà, G., Torras, C.: User evaluation of an interactive learning framework for single-arm and dual-arm robots. In: 8th International Conference on Social Robotics, pp. 52–61 (2016)
  7. Jevtic, A., Colomé, A., Alenyà, G., Torras, C.: Robot motion adaptation through user intervention and reinforcement learning. Pattern Recogn. Lett. 105, 67–75 (2018)
  8. Khan, S.S., Ahmad, A.: Cluster center initialization algorithm for k-means clustering. Pattern Recogn. Lett. 25(11), 1293–1302 (2004)
  9. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
  10. Neumann, G.: Variational inference for policy search in changing situations. In: International Conference on Machine Learning, pp. 817–824 (2011)
  11. Peters, J., Mülling, K., Altün, Y.: Relative entropy policy search. In: AAAI Conference on Artificial Intelligence, pp. 1607–1612 (2010)
  12. Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.J.: Learning movement primitives. In: 11th International Symposium on Robotics Research, pp. 561–572 (2005)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Institut de Robòtica i Informàtica Industrial (UPC-CSIC), Barcelona, Spain
