Abstract
We present a new probability flow analysis algorithm that automatically identifies subgoals in a problem space. The flow analysis, inspired by preflow-push algorithms, measures the topological structure of the problem space and identifies the states that connect different subsets of the state space as subgoals, in linear time. We then apply a hybrid subgoal-based SMDP (semi-Markov decision process) approach, which combines reinforcement learning with planning over the identified subgoals, to solve problems in a multiagent environment. The effectiveness of the method is demonstrated and evaluated in a capture-the-flag scenario, where we also show that cooperative coordination emerges between two agents through distributed policy learning.
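The core intuition — that subgoals are states connecting otherwise weakly linked subsets of the state space — can be illustrated with a small sketch. This is not the paper's preflow-push flow analysis; it is a hypothetical bottleneck-counting toy on a two-room gridworld, where the single doorway cell is the kind of state the abstract describes as a subgoal.

```python
from collections import deque, defaultdict

# Toy problem space: two 3x3 rooms joined by one doorway cell.
# Cells are (row, col); column 3 is a wall except at the doorway.
DOOR = (1, 3)
cells = {(r, c) for r in range(3) for c in range(7) if c != 3} | {DOOR}

def neighbors(s):
    r, c = s
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        n = (r + dr, c + dc)
        if n in cells:
            yield n

def shortest_path(src, dst):
    # Plain BFS; returns one shortest path as a list of states.
    prev = {src: None}
    q = deque([src])
    while q:
        s = q.popleft()
        if s == dst:
            path = []
            while s is not None:
                path.append(s)
                s = prev[s]
            return path[::-1]
        for n in neighbors(s):
            if n not in prev:
                prev[n] = s
                q.append(n)
    return []

# Count how often each interior state lies on a shortest path between
# state pairs drawn from opposite rooms; the doorway dominates because
# every cross-room path is forced through it.
score = defaultdict(int)
left = [s for s in cells if s[1] < 3]
right = [s for s in cells if s[1] > 3]
for a in left:
    for b in right:
        for s in shortest_path(a, b)[1:-1]:
            score[s] += 1

subgoal = max(score, key=score.get)
print(subgoal)  # → (1, 3), the doorway cell
```

In the paper's setting this bottleneck state would then anchor an option in the subgoal-based SMDP; here it simply demonstrates why topological connectors stand out structurally.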
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Chiu, C.-C., Soo, V.-W. (2007). Subgoal Identification for Reinforcement Learning and Planning in Multiagent Problem Solving. In: Petta, P., Müller, J.P., Klusch, M., Georgeff, M. (eds.) Multiagent System Technologies. MATES 2007. Lecture Notes in Computer Science, vol. 4687. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74949-3_4
DOI: https://doi.org/10.1007/978-3-540-74949-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74948-6
Online ISBN: 978-3-540-74949-3