An Intrinsically Motivated Robot Explores Non-reward Environments with Output Arbitration

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 848))

Abstract

In the real world, rewards are often sparse because the state space is huge. Reinforcement learning agents must therefore acquire exploration skills to obtain rewards in such environments. Curiosity, defined as an internally generated reward based on state prediction error, can encourage agents to explore their environments. However, when a robot learns its policy by reinforcement learning, abrupt changes in the policy's outputs cause jerky motion because of inertia. Jerky motion prevents the state prediction from converging, which makes policy learning unstable. In this paper, we propose Arbitrable Intrinsically Motivated Exploration (AIME), which enables robots to learn curiosity-based exploration stably. AIME uses the Accumulator Based Arbitration Model (ABAM), which we previously proposed as an ensemble learning method inspired by the prefrontal cortex. ABAM adjusts motor controls to improve the stability of reward generation and reinforcement learning. In experiments, we show that a robot can explore a non-reward simulated environment with AIME.
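The abstract describes the two mechanisms only at a high level: an intrinsic reward computed from state prediction error, and an accumulator-based arbitration of motor outputs. The following minimal Python sketch illustrates both ideas under stated assumptions; the class and parameter names (CuriosityModule, Accumulator, decay, threshold) and the linear forward model are hypothetical simplifications for illustration, not the paper's actual AIME/ABAM implementation.

```python
import numpy as np

class CuriosityModule:
    """Intrinsic reward from state prediction error (hypothetical sketch).

    A linear forward model predicts the next state from the current
    state and action; the squared prediction error serves as the
    internally generated (curiosity) reward.
    """

    def __init__(self, state_dim, action_dim, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def reward(self, state, action, next_state):
        x = np.concatenate([state, action])
        pred = self.W @ x                        # predicted next state
        error = next_state - pred
        self.W += self.lr * np.outer(error, x)   # online MSE gradient step
        return 0.5 * float(error @ error)        # curiosity reward

class Accumulator:
    """Leaky accumulator gating discrete motor commands (hypothetical sketch).

    Evidence for each candidate command accumulates over time; the
    emitted command switches only when accumulated evidence crosses a
    threshold, suppressing the rapid output changes that cause jerking.
    """

    def __init__(self, n_commands, decay=0.9, threshold=1.0):
        self.values = np.zeros(n_commands)
        self.decay = decay
        self.threshold = threshold
        self.current = 0

    def arbitrate(self, preferences):
        self.values = self.decay * self.values + preferences
        best = int(np.argmax(self.values))
        if best != self.current and self.values[best] >= self.threshold:
            self.current = best        # switch only on strong evidence
            self.values[:] = 0.0       # reset after a switch
        return self.current
```

In a control loop, the policy's preferences over motor commands would be passed through `arbitrate` before actuation, so transient preference spikes do not immediately change the robot's output; how AIME actually couples ABAM to the curiosity reward is detailed in the paper itself.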


Notes

  1. http://gazebosim.org.


Author information

Correspondence to Takuma Seno.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Seno, T., Osawa, M., Imai, M. (2019). An Intrinsically Motivated Robot Explores Non-reward Environments with Output Arbitration. In: Samsonovich, A. (ed.) Biologically Inspired Cognitive Architectures 2018. BICA 2018. Advances in Intelligent Systems and Computing, vol 848. Springer, Cham. https://doi.org/10.1007/978-3-319-99316-4_37

