Sampling Efficiency in Learning Robot Motion

Chapter in: Reinforcement Learning of Bimanual Robot Skills

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 134)

Abstract

Policy Search (PS) algorithms are widely used today for their simplicity and effectiveness in solving robotic problems. However, most current PS algorithms derive policies by statistically fitting data from the best experiments only, so experiments yielding poor performance are usually discarded or given too little influence on the policy update. In this chapter, we propose a generalization of the Relative Entropy Policy Search (REPS) algorithm that takes bad experiences into account when computing a policy. The proposed approach, named Dual REPS (DREPS) [1] after the philosophical duality between good and bad, finds clusters of experimental data yielding poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, exploiting the duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems.
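To make the idea concrete, the following is a minimal numpy sketch, not the authors' implementation. It computes standard REPS-style sample weights by minimizing the REPS dual over the temperature, and then mimics the repulsive effect of DREPS with an illustrative heuristic: the worst third of the samples defines a "bad" centroid, and rewards are penalized near it. The parameters `epsilon` and `beta`, and the worst-third heuristic itself, are assumptions for illustration; the actual DREPS of [1] instead adds repulsive constraints (from k-means clusters of bad samples) directly to the optimization problem.

```python
import numpy as np

def reps_weights(R, epsilon=0.5):
    """REPS-style weights: minimize the dual g(eta) over eta > 0, then
    weight each sample by exp(R_i / eta), normalized. The max-reward shift
    keeps the exponentials numerically stable."""
    R = np.asarray(R, dtype=float)
    Rmax = R.max()

    def dual(eta):
        # g(eta) = eta*epsilon + eta*log(mean(exp(R/eta))), shifted by Rmax
        return eta * epsilon + eta * np.log(np.mean(np.exp((R - Rmax) / eta))) + Rmax

    # Coarse 1-D search over eta (a proper implementation would use a
    # dedicated scalar optimizer).
    etas = np.logspace(-3, 3, 2000)
    eta = etas[np.argmin([dual(e) for e in etas])]

    w = np.exp((R - Rmax) / eta)
    return w / w.sum()

def dreps_like_weights(theta, R, epsilon=0.5, beta=1.0):
    """Illustrative DREPS-flavored variant: penalize rewards of samples
    close to the centroid of the worst-performing third, so the resulting
    weights are pushed away from the bad region of parameter space."""
    theta = np.asarray(theta, dtype=float)
    R = np.asarray(R, dtype=float)
    bad = theta[np.argsort(R)[: len(R) // 3]]   # worst third of samples
    centroid = bad.mean(axis=0)
    d2 = np.sum((theta - centroid) ** 2, axis=1)
    R_penalized = R - beta * np.exp(-d2)        # lower reward near bad centroid
    return reps_weights(R_penalized, epsilon)
```

Under this scheme, a new Gaussian search policy would be fitted by the weighted mean and covariance of the sampled parameters, as in standard episodic REPS.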


References

  1. Colomé, A., Torras, C.: Dual REPS: a generalization of relative entropy policy search exploiting bad experiences. IEEE Trans. Robot. 33(4), 978–985 (2017)

  2. Daniel, C., Neumann, G., Kroemer, O., Peters, J.: Hierarchical relative entropy policy search. J. Mach. Learn. Res. 17(93), 1–50 (2016)

  3. Deisenroth, M.P., Neumann, G., Peters, J.: A survey on policy search for robotics. Found. Trends Robot. 2(1–2), 1–142 (2013)

  4. Gómez, V., Kappen, H.J., Peters, J., Neumann, G.: Policy search for path integral control. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML), pp. 482–497 (2014)

  5. Jevtic, A., Colomé, A., Alenyà, G., Torras, C.: Learning robot motion through user intervention and policy search. In: ICRA Workshop on Nature versus Nurture in Robotics (2016)

  6. Jevtic, A., Colomé, A., Alenyà, G., Torras, C.: User evaluation of an interactive learning framework for single-arm and dual-arm robots. In: 8th International Conference on Social Robotics, pp. 52–61 (2016)

  7. Jevtic, A., Colomé, A., Alenyà, G., Torras, C.: Robot motion adaptation through user intervention and reinforcement learning. Pattern Recogn. Lett. 105, 67–75 (2018)

  8. Khan, S.S., Ahmad, A.: Cluster center initialization algorithm for k-means clustering. Pattern Recogn. Lett. 25(11), 1293–1302 (2004)

  9. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

  10. Neumann, G.: Variational inference for policy search in changing situations. In: International Conference on Machine Learning, pp. 817–824 (2011)

  11. Peters, J., Mülling, K., Altün, Y.: Relative entropy policy search. In: AAAI Conference on Artificial Intelligence, pp. 1607–1612 (2010)

  12. Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.J.: Learning movement primitives. In: 11th International Symposium on Robotics Research, pp. 561–572 (2005)

Author information

Correspondence to Adrià Colomé.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Colomé, A., Torras, C. (2020). Sampling Efficiency in Learning Robot Motion. In: Reinforcement Learning of Bimanual Robot Skills. Springer Tracts in Advanced Robotics, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-030-26326-3_6
