
Solving F\(^3\)MDPs: Collaborative Multiagent Markov Decision Processes with Factored Transitions, Rewards and Stochastic Policies

  • Conference paper
  • First Online:
PRIMA 2015: Principles and Practice of Multi-Agent Systems (PRIMA 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9387)

Abstract

Multiagent Markov Decision Processes provide a rich framework for modelling problems of multiagent sequential decision-making under uncertainty, as in robotics. However, when the state space is also factored and of high dimension, even dedicated solution algorithms (exact or approximate) cease to apply once the dimension of the state space and the number of agents both exceed 30, except under strong assumptions on the state transitions or the value function. In this paper we introduce the F\(^3\)MDP framework and associated approximate solution algorithms that can tackle much larger problems. An F\(^3\)MDP is a collaborative multiagent MDP whose state space is factored, whose reward function is additively factored, and whose solution policies are constrained to be factored and may be stochastic. The proposed algorithms belong to the family of Policy Iteration (PI) algorithms. On small problems, where the optimal policy is available, they return policies close to optimal. On larger problems belonging to the GMDP subclass, they compete well with state-of-the-art solution algorithms in terms of solution quality. Finally, we show that our algorithms can tackle very large F\(^3\)MDPs, with 100 agents and a state space of size \(2^{100}\).
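To make the factored structure the abstract refers to explicit, here is a minimal formal sketch. The notation is an illustrative assumption (local state variables \(s_i\), local actions \(a_i\), and neighbourhood scopes \(N(i)\)), not notation taken from the paper itself:

\[ P(s' \mid s, a) \;=\; \prod_{i=1}^{n} P_i\bigl(s'_i \mid s_{N(i)}, a_{N(i)}\bigr), \qquad R(s, a) \;=\; \sum_{i=1}^{n} R_i\bigl(s_{N(i)}, a_i\bigr), \qquad \pi(a \mid s) \;=\; \prod_{i=1}^{n} \pi_i\bigl(a_i \mid s_{N(i)}\bigr). \]

Each factor involves only a small neighbourhood of variables, which is what keeps problems with 100 agents and \(2^{100}\) states representable even though the joint state and action spaces are exponentially large.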

This work was funded by ANR-13-AGRO-0001-04.




Author information


Corresponding author

Correspondence to Régis Sabbadin.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Radoszycki, J., Peyrard, N., Sabbadin, R. (2015). Solving F\(^3\)MDPs: Collaborative Multiagent Markov Decision Processes with Factored Transitions, Rewards and Stochastic Policies. In: Chen, Q., Torroni, P., Villata, S., Hsu, J., Omicini, A. (eds) PRIMA 2015: Principles and Practice of Multi-Agent Systems. PRIMA 2015. Lecture Notes in Computer Science (LNAI), vol 9387. Springer, Cham. https://doi.org/10.1007/978-3-319-25524-8_1


  • DOI: https://doi.org/10.1007/978-3-319-25524-8_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25523-1

  • Online ISBN: 978-3-319-25524-8

  • eBook Packages: Computer Science, Computer Science (R0)
