Abstract
For social robots, learning from an ordinary user should be socially appealing. Unfortunately, machine learning demands an enormous amount of human data, and a prolonged interactive teaching session becomes anti-social. We address this problem in the context of reward shaping for reinforcement learning, where efficient shaping expects a continuous stream of rewards from the teacher. We present a simple framework that seeks rewards for a small number of steps from each of a large number of human teachers, thereby simplifying the job of each individual teacher. The framework was tested with online crowd workers on a transport puzzle. We thoroughly analyzed the quality of the learned policies and the crowd’s teaching behavior. Our results showed that nearly perfect policies can be learned using this framework, and the crowd generally found the framework acceptable.
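The abstract does not state which learning algorithm or puzzle encoding the authors used. The following is therefore only a minimal sketch under explicit assumptions. It uses standard tabular Q-learning on a hypothetical one-dimensional corridor that stands in for the transport puzzle: an item is carried to a drop-off cell. Simulated crowd workers give +1/-1 shaping rewards, and each worker teaches for at most `steps_per_teacher` steps before a fresh worker takes over. Every name here (`GOAL`, `step`, `teacher_feedback`, `train`, `steps_per_teacher`, `max_error_rate`) is illustrative and does not come from the paper.

```python
import math
import random

# Hypothetical stand-in for the transport puzzle (an assumption): carry an
# item along corridor cells 0..GOAL; reaching GOAL (drop-off) ends the episode.
GOAL = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic corridor dynamics; walls clamp the position."""
    next_state = min(max(state + action, 0), GOAL)
    env_reward = 1.0 if next_state == GOAL else 0.0  # sparse task reward
    return next_state, env_reward, next_state == GOAL


def teacher_feedback(action, error_rate, rng):
    """Hypothetical simulated crowd worker: +1 for moving toward the
    drop-off cell, -1 otherwise, flipped with probability error_rate."""
    reward = 1.0 if action == +1 else -1.0
    return -reward if rng.random() < error_rate else reward


def train(episodes=500, steps_per_teacher=5, max_error_rate=0.1,
          alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    total_steps = 0
    teachers_used = 0
    teacher_error = 0.0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Hand off to a fresh worker every steps_per_teacher steps, so no
            # single person has to teach for long. Each worker gets their own
            # (assumed) error rate.
            if total_steps % steps_per_teacher == 0:
                teacher_error = rng.uniform(0.0, max_error_rate)
                teachers_used += 1
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, env_reward, done = step(state, action)
            # Shaped reward: task reward plus the current worker's signal.
            reward = env_reward + teacher_feedback(action, teacher_error, rng)
            if done:
                target = reward  # no bootstrapping from the terminal state
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            total_steps += 1
            state = next_state
            if done:
                break
    # Greedy policy for every non-terminal cell.
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
    return policy, teachers_used, total_steps
```

Calling `train()` with the default `max_error_rate=0.1` simulates imperfect workers. `teachers_used` then reports how many short teaching segments the run consumed. Keeping `steps_per_teacher` small mirrors the trade-off the abstract describes: each worker's task stays brief, while the learner still receives an uninterrupted reward stream across handoffs.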
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Raza, S.A., Clark, J., Williams, MA. (2016). On Designing Socially Acceptable Reward Shaping. In: Agah, A., Cabibihan, JJ., Howard, A., Salichs, M., He, H. (eds) Social Robotics. ICSR 2016. Lecture Notes in Computer Science, vol 9979. Springer, Cham. https://doi.org/10.1007/978-3-319-47437-3_84
DOI: https://doi.org/10.1007/978-3-319-47437-3_84
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47436-6
Online ISBN: 978-3-319-47437-3
eBook Packages: Computer Science, Computer Science (R0)