Proposal of an Action Selection Strategy with Expected Failure Probability and Its Evaluation in Multi-agent Reinforcement Learning

Miyazaki, Kazuteru; Furukawa, Koudai; Kobayashi, Hiroaki

doi:10.1007/978-3-319-59294-7_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10207)

Abstract

When multiple agents learn a task simultaneously in the same environment, the learning results often become unstable. This is known as the concurrent learning problem, and several methods have been proposed to address it. In this paper, we propose a new method that incorporates the expected failure probability (EFP) into the action selection strategy to give agents a kind of mutual adaptability. We confirm the effectiveness of the proposed method on the Keepaway task.
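The abstract does not give the selection rule itself, so the following is only an illustrative sketch of what folding an EFP estimate into action selection could look like. It assumes, hypothetically, that each action has a non-negative learned value and an EFP estimate in [0, 1], and that actions are drawn with probability proportional to value × (1 − EFP). The function names `efp_action_probabilities` and `select_action` and this weighting rule are assumptions for illustration, not the method of the paper.

```python
import random


def efp_action_probabilities(values, efp):
    """Return selection probabilities proportional to value * (1 - EFP).

    Assumption (not from the paper): values are learned action weights,
    efp[i] is the estimated probability that action i leads to failure.
    """
    if len(values) != len(efp):
        raise ValueError("values and efp must have the same length")
    if not values:
        raise ValueError("at least one action is required")
    if any(p < 0.0 or p > 1.0 for p in efp):
        raise ValueError("each EFP must lie in [0, 1]")

    # Discount each action's value by its chance of not failing.
    weights = [max(v, 0.0) * (1.0 - p) for v, p in zip(values, efp)]
    total = sum(weights)
    if total <= 0.0:
        # No action has positive weight: fall back to a uniform choice.
        return [1.0 / len(values)] * len(values)
    return [w / total for w in weights]


def select_action(values, efp, rng=random):
    """Sample an action index using the EFP-discounted probabilities."""
    probs = efp_action_probabilities(values, efp)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under this sketch, two actions with equal value 2.0 and EFPs of 0.5 and 0.0 would be chosen with probabilities 1/3 and 2/3, so an agent shifts away from actions that have tended to fail without ruling them out entirely.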

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 26330267.

Author information

Authors and Affiliations

National Institution for Academic Degrees and Quality Enhancement of Higher Education, 1-29-1 Gakuennishimachi, Kodaira, Tokyo, 185-8587, Japan
Kazuteru Miyazaki
IHI Transport Machinery Co., Ltd., 8-1 Akashi-cho, Chuo-ku, Tokyo, 104-0044, Japan
Koudai Furukawa
Meiji University, 1-1-1 Higashimita, Tama-ku, Kawasaki, Kanagawa, 214-8571, Japan
Hiroaki Kobayashi

Corresponding author

Correspondence to Kazuteru Miyazaki.

Editor information

Editors and Affiliations

King’s College London, London, United Kingdom
Natalia Criado Pacheco
Polytechnic University of Valencia, Valencia, Spain
Carlos Carrascosa
Artificial Intelligence Research Institute (IIIA-CSIC), Barcelona, Spain
Nardine Osman
Polytechnic University of Valencia, Valencia, Spain
Vicente Julián Inglada

About this paper

Cite this paper

Miyazaki, K., Furukawa, K., Kobayashi, H. (2017). Proposal of an Action Selection Strategy with Expected Failure Probability and Its Evaluation in Multi-agent Reinforcement Learning. In: Criado Pacheco, N., Carrascosa, C., Osman, N., Julián Inglada, V. (eds) Multi-Agent Systems and Agreement Technologies. EUMAS 2016, AT 2016. Lecture Notes in Computer Science (LNAI), vol 10207. Springer, Cham. https://doi.org/10.1007/978-3-319-59294-7_15

DOI: https://doi.org/10.1007/978-3-319-59294-7_15
Published: 23 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59293-0
Online ISBN: 978-3-319-59294-7
eBook Packages: Computer Science, Computer Science (R0)
