A Distributed Q-Learning Approach for Variable Attention to Multiple Critics

  • Conference paper
Neural Information Processing (ICONIP 2012)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7665)

Abstract

A central concern in machine learning research is the design of an artificial agent that behaves autonomously in a complex environment. In this paper, we consider a learning problem with multiple critics. Each critic has a different importance to the agent, and the agent's attention to the critics varies over its lifetime. Inspired by neurological studies, we propose a distributed learning approach for this problem that is flexible with respect to this variable attention. In this approach, a distinct learner is assigned to each critic, and an algorithm is introduced for aggregating their knowledge based on a combination of model-free and model-based learning methods. We show that this aggregation method can provide the optimal policy for this problem.
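The core idea of the abstract, one Q-learner per critic whose values are combined under time-varying attention weights, can be sketched as follows. This is an illustrative sketch only, not the authors' exact algorithm (the paper's aggregation additionally involves a model-based component); the class name, tabular representation, and epsilon-greedy selection are assumptions made for the example.

```python
import numpy as np

class MultiCriticQLearner:
    """Sketch: a distinct tabular Q-learner per critic; action selection
    uses an attention-weighted sum of the per-critic Q-values."""

    def __init__(self, n_states, n_actions, n_critics, alpha=0.1, gamma=0.9):
        # One Q-table per critic, all learned in parallel.
        self.Q = np.zeros((n_critics, n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def select_action(self, state, attention, epsilon=0.1):
        # attention: a weight vector over critics, which may change
        # during the agent's life without retraining the learners.
        if np.random.rand() < epsilon:
            return int(np.random.randint(self.Q.shape[2]))
        combined = np.tensordot(attention, self.Q[:, state, :], axes=1)
        return int(np.argmax(combined))

    def update(self, state, action, rewards, next_state):
        # rewards: one reward signal per critic; each learner applies
        # the standard model-free Q-learning update independently.
        for i, r in enumerate(rewards):
            best_next = self.Q[i, next_state].max()
            td_error = r + self.gamma * best_next - self.Q[i, state, action]
            self.Q[i, state, action] += self.alpha * td_error
```

Because each learner is tied to a single critic, shifting the attention vector re-weights the aggregate policy immediately, which is the flexibility the abstract highlights.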





Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tavakol, M., Ahmadabadi, M.N., Mirian, M., Asadpour, M. (2012). A Distributed Q-Learning Approach for Variable Attention to Multiple Critics. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7665. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34487-9_30

  • DOI: https://doi.org/10.1007/978-3-642-34487-9_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34486-2

  • Online ISBN: 978-3-642-34487-9

  • eBook Packages: Computer Science, Computer Science (R0)
