Abstract
Multiagent Markov Decision Processes (MDPs) provide a rich framework for modelling multiagent sequential decision-making under uncertainty, as in robotics. However, when the state space is also factored and high-dimensional, even dedicated solution algorithms (exact or approximate) no longer apply once the dimension of the state space and the number of agents both exceed 30, except under strong assumptions on the state transitions or the value function. In this paper we introduce the F\(^3\)MDP framework and associated approximate solution algorithms that can tackle much larger problems. An F\(^3\)MDP is a collaborative multiagent MDP whose state space is factored, whose reward function is additively factored, and whose solution policies are constrained to be factored and may be stochastic. The proposed algorithms belong to the family of Policy Iteration (PI) algorithms. On small problems, for which the optimal policy is available, they return policies close to optimal. On larger problems belonging to the GMDP subclass, they compete well with state-of-the-art solution algorithms in terms of solution quality. Finally, we show that our algorithms can tackle very large F\(^3\)MDPs, with 100 agents and a state space of size \(2^{100}\).
This work was funded by ANR-13-AGRO-0001-04.
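As a rough illustration of the factorizations the abstract refers to (the notation below is illustrative and not taken from the paper itself; in particular the neighbourhood sets \(N(i)\) are an assumption of this sketch), an F\(^3\)MDP with \(n\) agents has transitions, rewards and stochastic policies that all decompose over small subsets of state variables:
\[
P(s' \mid s, a) = \prod_{i=1}^{n} P_i\big(s'_i \mid s_{N(i)}, a_{N(i)}\big), \qquad
R(s, a) = \sum_{i=1}^{n} R_i\big(s_{N(i)}, a_i\big), \qquad
\pi(a \mid s) = \prod_{i=1}^{n} \pi_i\big(a_i \mid s_{N(i)}\big),
\]
where \(N(i)\) denotes the (small) set of state variables on which component \(i\) depends. The criterion optimised is the usual expected discounted sum of rewards, \(\mathbb{E}_\pi\big[\sum_{t \geq 0} \gamma^t R(s_t, a_t)\big]\).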
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Radoszycki, J., Peyrard, N., Sabbadin, R. (2015). Solving F\(^3\)MDPs: Collaborative Multiagent Markov Decision Processes with Factored Transitions, Rewards and Stochastic Policies. In: Chen, Q., Torroni, P., Villata, S., Hsu, J., Omicini, A. (eds) PRIMA 2015: Principles and Practice of Multi-Agent Systems. Lecture Notes in Computer Science, vol 9387. Springer, Cham. https://doi.org/10.1007/978-3-319-25524-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25523-1
Online ISBN: 978-3-319-25524-8
eBook Packages: Computer Science, Computer Science (R0)