Journal of Intelligent & Robotic Systems, Volume 95, Issue 1, pp. 77–97

An Interactive Framework for Learning Continuous Actions Policies Based on Corrective Feedback

  • Carlos Celemin
  • Javier Ruiz-del-Solar
Article

Abstract

The main goal of this article is to present COACH (COrrective Advice Communicated by Humans), a new learning framework that allows non-expert humans to advise an agent while it interacts with the environment in continuous-action problems. The human feedback is given in the action domain as binary corrective signals (increase/decrease the current action magnitude), and COACH adaptively adjusts the amount of correction that a given action receives, taking state-dependent past feedback into consideration. COACH also manages the credit assignment problem that normally arises when actions in continuous time receive delayed corrections. The proposed framework is characterized and validated extensively using four well-known learning problems. The experimental analysis includes comparisons with other interactive learning frameworks, with classical reinforcement learning approaches, and with human teleoperators trying to solve the same learning problems by themselves. In all the reported experiments COACH outperforms the other methods in terms of learning speed and final performance. Notably, COACH has also been applied successfully to a complex real-world learning problem: ball dribbling by humanoid soccer players.
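
To make the idea concrete, below is a minimal Python sketch of a COACH-like update rule, not the authors' implementation. It assumes a linear policy over state features; the class name, the consistency-based rule for enlarging the correction when advice in similar states repeats in the same direction, the eligibility trace used for delayed feedback, and all constants are illustrative assumptions.

```python
# Minimal, illustrative sketch of a COACH-like learner (not the published
# implementation). Assumptions: linear policy over state features, a simple
# consistency rule for the correction size, and an eligibility-trace term for
# delayed human advice; all names and constants are hypothetical.
import numpy as np

class CoachLikeLearner:
    def __init__(self, n_features, alpha=0.05, e_small=0.2, e_large=1.0,
                 trace_decay=0.9):
        self.theta = np.zeros(n_features)   # policy weights: a(s) = theta . phi(s)
        self.w_h = np.zeros(n_features)     # model of past human feedback sign
        self.trace = np.zeros(n_features)   # eligibility trace for delayed advice
        self.alpha = alpha                  # learning rate
        self.e_small = e_small              # correction size for "new" advice
        self.e_large = e_large              # correction size for consistent advice
        self.trace_decay = trace_decay

    def action(self, phi):
        """Continuous action for feature vector phi."""
        return float(self.theta @ phi)

    def step(self, phi, h):
        """Apply one corrective signal h in {-1, 0, +1} for features phi."""
        # Decay the trace each step so earlier states receive less credit.
        self.trace = self.trace_decay * self.trace + phi
        if h == 0:
            return
        # Predict the feedback sign from past advice in similar states.
        predicted = np.sign(self.w_h @ phi)
        # If the human keeps advising in the same direction, correct more strongly.
        e = self.e_large if predicted == h else self.e_small
        # Shift the policy along the credit-weighted features.
        self.theta += self.alpha * h * e * self.trace
        # Move the feedback model toward the observed advice sign.
        self.w_h += self.alpha * (h - self.w_h @ phi) * phi
```

In a teaching loop under these assumptions, `action` would be queried every control cycle, while `step` would be called whenever the teacher presses a key delivering a +1 or -1 correction.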

Keywords

Learning from demonstration · Interactive machine learning · Human feedback · Human teachers · Decision making systems

Notes

Acknowledgements

This work was partially funded by FONDECYT project 1161500 and CONICYT-PCHA/Doctorado Nacional/2015-21151488.

Supplementary material

10846_2018_839_MOESM1_ESM.mp4 (MP4, 212.3 MB)

Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. Advanced Mining Technology Center & Department of Electrical Engineering, Universidad de Chile, Santiago, Chile