Abstract
Multiagent Markov Decision Processes (MDPs) provide a rich framework for modelling multiagent sequential decision-making under uncertainty, as in robotics. However, when the state space is also factored and high-dimensional, even dedicated solution algorithms (exact or approximate) no longer apply once the dimension of the state space and the number of agents both exceed 30, except under strong assumptions on the state transitions or the value function. In this paper we introduce the F\(^3\)MDP framework and associated approximate solution algorithms that can tackle much larger problems. An F\(^3\)MDP is a collaborative multiagent MDP whose state space is factored, whose reward function is additively factored, and whose solution policies are constrained to be factored and may be stochastic. The proposed algorithms belong to the family of Policy Iteration (PI) algorithms. On small problems, for which the optimal policy is available, they return policies close to optimal. On larger problems belonging to the GMDP subclass, they compete well with state-of-the-art solution algorithms in terms of solution quality. Finally, we show that our algorithms can tackle very large F\(^3\)MDPs, with 100 agents and a state space of size \(2^{100}\).
This work was funded by ANR-13-AGRO-0001-04.
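As a rough illustration of the factorizations the abstract refers to (the notation below is illustrative and not taken from the paper itself; in particular the neighbourhood sets \(N(i)\) are an assumption of this sketch), an F\(^3\)MDP with \(n\) agents has transitions, rewards and stochastic policies that all decompose over small subsets of state variables:
\[
P(s' \mid s, a) = \prod_{i=1}^{n} P_i\big(s'_i \mid s_{N(i)}, a_{N(i)}\big), \qquad
R(s, a) = \sum_{i=1}^{n} R_i\big(s_{N(i)}, a_i\big), \qquad
\pi(a \mid s) = \prod_{i=1}^{n} \pi_i\big(a_i \mid s_{N(i)}\big),
\]
where \(N(i)\) denotes the (small) set of state variables on which component \(i\) depends. The criterion optimised is the usual expected discounted sum of rewards, \(\mathbb{E}_\pi\big[\sum_{t \geq 0} \gamma^t R(s_t, a_t)\big]\).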
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Radoszycki, J., Peyrard, N., Sabbadin, R. (2015). Solving F\(^3\)MDPs: Collaborative Multiagent Markov Decision Processes with Factored Transitions, Rewards and Stochastic Policies. In: Chen, Q., Torroni, P., Villata, S., Hsu, J., Omicini, A. (eds) PRIMA 2015: Principles and Practice of Multi-Agent Systems. Lecture Notes in Computer Science, vol 9387. Springer, Cham. https://doi.org/10.1007/978-3-319-25524-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25523-1
Online ISBN: 978-3-319-25524-8
eBook Packages: Computer Science, Computer Science (R0)