Abstract
Humans appraise their environment in daily life, and we are implementing such appraisal mechanisms in reinforcement learning agents. One such mechanism we proposed is utility-based Q-learning, which learns behaviors from subjective utilities derived from the payoffs the agent gains through a utility-derivation function the agent has. Previous work showed that payoff-based evolution yields utility-derivation functions that facilitate mutual cooperation in iterated Prisoner's Dilemma games; however, the evolution process itself has not yet been well understood. In this work, we investigate the process in terms of what determines the direction of evolution. We introduce two metrics that express the preference of actions based on the evolved subjective utilities and that divide the evolution space into four regions; in each region, the metrics explain the evolution direction.
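As a rough illustration of the mechanism described above, the sketch below shows a Q-learning agent in the iterated Prisoner's Dilemma that learns from a subjective utility u(payoff) instead of the raw payoff. The logistic utility-derivation function, its parameters a and b, and all other names are illustrative assumptions, not the authors' exact formulation.

```python
import math
import random

# Payoff matrix of the Prisoner's Dilemma (row player's payoff).
PAYOFFS = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ('C', 'D')
# States: the previous joint action, plus a start state.
STATES = [None] + list(PAYOFFS.keys())


class UtilityQLearner:
    """Q-learning agent that learns from subjective utilities u(payoff)."""

    def __init__(self, a, b, alpha=0.1, gamma=0.9, epsilon=0.05):
        self.a, self.b = a, b              # assumed evolved parameters of u(.)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {(s, act): 0.0 for s in STATES for act in ACTIONS}

    def utility(self, payoff):
        # Illustrative utility-derivation function: a logistic curve whose
        # slope a and offset b would be the parameters subject to evolution.
        return 1.0 / (1.0 + math.exp(-self.a * (payoff - self.b)))

    def act(self, state):
        # Epsilon-greedy action selection over the learned Q-values.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda act: self.q[(state, act)])

    def learn(self, state, action, payoff, next_state):
        # Standard Q-learning update, applied to the subjective utility.
        target = self.utility(payoff) + self.gamma * max(
            self.q[(next_state, act)] for act in ACTIONS)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Two agents play the iterated game; each observes the previous joint action.
p1, p2 = UtilityQLearner(a=2.0, b=2.5), UtilityQLearner(a=2.0, b=2.5)
s1 = s2 = None
for _ in range(10000):
    a1, a2 = p1.act(s1), p2.act(s2)
    r1, r2 = PAYOFFS[(a1, a2)], PAYOFFS[(a2, a1)]
    n1, n2 = (a1, a2), (a2, a1)
    p1.learn(s1, a1, r1, n1)
    p2.learn(s2, a2, r2, n2)
    s1, s2 = n1, n2
```

In the payoff-based evolutionary setting the paper studies, parameters such as a and b above would be selected by the raw payoffs the agents accumulate, while the agents themselves learn only from the subjective utilities.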
Notes
1. Here we ignore the border areas.
Acknowledgments
This work was partly supported by JSPS KAKENHI Grant Number JP16K00302, Kayamori Foundation of Informational Science Advancement, and the Hori Sciences & Arts Foundation.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Miyawaki, M., Moriyama, K., Mutoh, A., Matsui, T., Inuzuka, N. (2019). Evolution Direction of Reward Appraisal in Reinforcement Learning Agents. In: Jezic, G., Chen-Burger, YH., Howlett, R., Jain, L., Vlacic, L., Šperka, R. (eds) Agents and Multi-Agent Systems: Technologies and Applications 2018. KES-AMSTA-18 2018. Smart Innovation, Systems and Technologies, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-319-92031-3_2
DOI: https://doi.org/10.1007/978-3-319-92031-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92030-6
Online ISBN: 978-3-319-92031-3