An Intrinsically Motivated Robot Explores Non-reward Environments with Output Arbitration

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 848))

Abstract

In the real world, rewards are often sparse because the state space is huge. Reinforcement learning agents must therefore acquire exploration skills to obtain rewards in such environments. Curiosity, defined as an internally generated reward based on state prediction error, can encourage agents to explore their environments. However, when a robot learns its policy by reinforcement learning, abrupt changes in the policy's outputs cause jerky motion because of inertia. Jerky motion prevents the state prediction from converging, which makes policy learning unstable. In this paper, we propose Arbitrable Intrinsically Motivated Exploration (AIME), which enables robots to learn curiosity-based exploration stably. AIME uses the Accumulator Based Arbitration Model (ABAM), which we previously proposed as an ensemble learning method inspired by the prefrontal cortex. ABAM adjusts motor controls to improve the stability of reward generation and reinforcement learning. In experiments, we show that a robot can explore a non-reward simulated environment with AIME.
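The abstract describes the two mechanisms only at a high level: an intrinsic reward computed from state prediction error, and an accumulator-based arbitration of motor outputs. The following minimal Python sketch illustrates both ideas under stated assumptions; the class and parameter names (CuriosityModule, Accumulator, decay, threshold) and the linear forward model are hypothetical simplifications for illustration, not the paper's actual AIME/ABAM implementation.

```python
import numpy as np

class CuriosityModule:
    """Intrinsic reward from state prediction error (hypothetical sketch).

    A linear forward model predicts the next state from the current
    state and action; the squared prediction error serves as the
    internally generated (curiosity) reward.
    """

    def __init__(self, state_dim, action_dim, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def reward(self, state, action, next_state):
        x = np.concatenate([state, action])
        pred = self.W @ x                        # predicted next state
        error = next_state - pred
        self.W += self.lr * np.outer(error, x)   # online MSE gradient step
        return 0.5 * float(error @ error)        # curiosity reward

class Accumulator:
    """Leaky accumulator gating discrete motor commands (hypothetical sketch).

    Evidence for each candidate command accumulates over time; the
    emitted command switches only when accumulated evidence crosses a
    threshold, suppressing the rapid output changes that cause jerking.
    """

    def __init__(self, n_commands, decay=0.9, threshold=1.0):
        self.values = np.zeros(n_commands)
        self.decay = decay
        self.threshold = threshold
        self.current = 0

    def arbitrate(self, preferences):
        self.values = self.decay * self.values + preferences
        best = int(np.argmax(self.values))
        if best != self.current and self.values[best] >= self.threshold:
            self.current = best        # switch only on strong evidence
            self.values[:] = 0.0       # reset after a switch
        return self.current
```

In a control loop, the policy's preferences over motor commands would be passed through `arbitrate` before actuation, so transient preference spikes do not immediately change the robot's output; how AIME actually couples ABAM to the curiosity reward is detailed in the paper itself.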


Notes

  1. http://gazebosim.org.


Author information

Correspondence to Takuma Seno.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Seno, T., Osawa, M., Imai, M. (2019). An Intrinsically Motivated Robot Explores Non-reward Environments with Output Arbitration. In: Samsonovich, A. (ed.) Biologically Inspired Cognitive Architectures 2018. BICA 2018. Advances in Intelligent Systems and Computing, vol 848. Springer, Cham. https://doi.org/10.1007/978-3-319-99316-4_37

