
Proposal of an Action Selection Strategy with Expected Failure Probability and Its Evaluation in Multi-agent Reinforcement Learning

  • Kazuteru Miyazaki
  • Koudai Furukawa
  • Hiroaki Kobayashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10207)

Abstract

When multiple agents learn a task simultaneously in the same environment, the learning results often become unstable. This problem is known as the concurrent learning problem, and several methods have been proposed to resolve it. In this paper, we propose a new method that incorporates the expected failure probability (EFP) into the action selection strategy to give agents a kind of mutual adaptability. We confirm the effectiveness of the proposed method on the Keepaway task.
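The abstract does not spell out the selection rule itself, so the sketch below is only an illustration of the general idea under stated assumptions: each agent keeps an empirical estimate of how often an action has led to failure (e.g. losing the ball in Keepaway) and discounts its learned action weights by that estimate before choosing probabilistically. All identifiers (select_action, values, efp) are hypothetical and are not taken from the paper.

```python
import random


def select_action(values, efp):
    """Hypothetical action selection discounting learned values by EFP.

    values: dict mapping action -> learned weight (e.g. from profit sharing)
    efp:    dict mapping action -> estimated probability that the action
            leads to a failure (penalty), as observed so far
    """
    # Weight each action by its value times the probability of NOT failing.
    weights = {a: values[a] * (1.0 - efp.get(a, 0.0)) for a in values}
    total = sum(weights.values())
    if total <= 0.0:
        # Every action looks hopeless; fall back to a uniform random choice.
        return random.choice(list(values))
    # Roulette-wheel selection over the discounted weights.
    r = random.uniform(0.0, total)
    acc = 0.0
    for action, w in weights.items():
        acc += w
        if r <= acc:
            return action
    return action  # guard against floating-point rounding at the boundary
```

Under such a rule, when a teammate's concurrent learning makes a previously reliable action start to fail, that action's EFP rises and the agent shifts probability toward other actions, which is the kind of mutual adaptability the abstract describes.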

Notes

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 26330267.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Kazuteru Miyazaki (1)
  • Koudai Furukawa (2)
  • Hiroaki Kobayashi (3)
  1. National Institution for Academic Degrees and Quality Enhancement of Higher Education, Tokyo, Japan
  2. IHI Transport Machinery Co., Ltd., Tokyo, Japan
  3. Meiji University, Kanagawa, Japan
