MTIRL: Multi-trainer Interactive Reinforcement Learning System

  • Conference paper
  • In: PRIMA 2022: Principles and Practice of Multi-Agent Systems (PRIMA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13753)

Abstract

Interactive reinforcement learning can effectively facilitate agent training via human feedback. However, such methods often require the human teacher to know the correct action the agent should take; if the teacher is not always reliable, they cannot consistently guide the agent through its training. In this paper, we propose a more effective interactive reinforcement learning system, Multi-Trainer Interactive Reinforcement Learning (MTIRL), which aggregates binary feedback from multiple imperfect trainers into a more reliable reward for training an agent in a reward-sparse environment. In particular, our trainer-feedback aggregation experiments show that our aggregation method achieves the best accuracy compared with majority voting, weighted voting, and the Bayesian method. Finally, we conduct a grid-world experiment to show that the policy trained by MTIRL with the review model is closer to the optimal policy than one trained without it.

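To make the aggregation idea concrete, the sketch below shows one plausible trust-weighted scheme in Python: each trainer's reliability is tracked with a Beta-Bernoulli estimate, binary votes are combined in proportion to those estimates, and a review-style step then updates each trainer's reliability against the consensus. The model, names, and update rule here are illustrative assumptions, not the exact MTIRL aggregation described in the paper.

import random
from dataclasses import dataclass

# A minimal sketch of trust-weighted aggregation of binary trainer feedback.
# The Beta-Bernoulli reliability model and the review-style update are
# illustrative assumptions, not the paper's exact MTIRL aggregation.

@dataclass
class Trainer:
    alpha: float = 1.0  # pseudo-counts of feedback judged correct
    beta: float = 1.0   # pseudo-counts of feedback judged incorrect

    @property
    def reliability(self) -> float:
        # Posterior mean of the trainer's probability of being correct.
        return self.alpha / (self.alpha + self.beta)

def aggregate(votes: dict[int, int], trainers: dict[int, Trainer]) -> float:
    """Combine binary votes (+1/-1) into a scalar reward in [-1, 1],
    weighting each vote by its trainer's current reliability estimate."""
    total = sum(trainers[i].reliability for i in votes)
    return sum(trainers[i].reliability * v for i, v in votes.items()) / total

def review(votes: dict[int, int], trainers: dict[int, Trainer], consensus: int) -> None:
    """Review-style update: trainers who agreed with the aggregated sign
    gain reliability; trainers who disagreed lose it."""
    for i, v in votes.items():
        if v * consensus > 0:
            trainers[i].alpha += 1.0
        else:
            trainers[i].beta += 1.0

if __name__ == "__main__":
    trainers = {i: Trainer() for i in range(3)}
    for _ in range(50):
        # Trainers 0 and 1 are competent; trainer 2 answers at random.
        votes = {0: +1, 1: +1, 2: random.choice([-1, +1])}
        reward = aggregate(votes, trainers)
        review(votes, trainers, +1 if reward >= 0 else -1)
    print({i: round(t.reliability, 2) for i, t in trainers.items()})

Under this scheme, the random trainer's weight settles near 0.5 after a few dozen interactions while the consistent trainers' weights approach 1, so their feedback dominates the aggregated reward.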


Author information

Corresponding author

Correspondence to Zhaori Guo.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Guo, Z., Norman, T.J., Gerding, E.H. (2023). MTIRL: Multi-trainer Interactive Reinforcement Learning System. In: Aydoğan, R., Criado, N., Lang, J., Sanchez-Anguix, V., Serramia, M. (eds) PRIMA 2022: Principles and Practice of Multi-Agent Systems. PRIMA 2022. Lecture Notes in Computer Science, vol 13753. Springer, Cham. https://doi.org/10.1007/978-3-031-21203-1_14

  • DOI: https://doi.org/10.1007/978-3-031-21203-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21202-4

  • Online ISBN: 978-3-031-21203-1

  • eBook Packages: Computer Science, Computer Science (R0)
