Decentralized Markov Decision Processes for Handling Temporal and Resource Constraints in a Multiple Robot System

  • Aurélie Beynier
  • Abdel-Illah Mouaddib


We consider in this paper a multi-robot planning problem in which robots carry out a common mission with the following characteristics: the mission is an acyclic graph of tasks with precedence dependencies and temporal windows of validity, and the tasks are distributed among robots whose execution durations and resource consumptions are uncertain. This class of problems can be solved with decision-theoretic planning techniques that handle local temporal constraints and inter-robot dependencies, allowing the robots to synchronize their processing. A dedicated decision model and value function let the robots coordinate their actions at runtime so as to maximize the overall value of the mission. To this end, we design a cooperative multi-robot planning system based on decentralized Markov Decision Processes (MDPs) that requires no communication: each robot takes the uncertainty on temporal intervals and on dependencies into account and uses a distributed value function to coordinate its actions with the other robots. A minimal single-robot sketch of this setting is given below.
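To make the setting concrete, the following is a minimal Python sketch of one robot's local decision problem under the constraints described in the abstract: a chain of tasks with temporal windows of validity, uncertain execution durations, and uncertain resource consumption. The task chain, its distributions, and the `value` recursion are illustrative assumptions, not the paper's model; in particular, the paper's decentralized MDPs also couple the robots through precedence dependencies and a distributed value function, which this single-robot sketch omits.

```python
"""Minimal sketch (not the authors' implementation) of one robot's local
decision problem: tasks with temporal windows, uncertain durations, and
uncertain resource consumption.  All names and numbers are illustrative."""

from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Task:
    name: str
    window: tuple[int, int]                    # [earliest start, latest end] of validity
    durations: tuple[tuple[int, float], ...]   # (duration, probability) pairs
    costs: tuple[tuple[int, float], ...]       # (resource cost, probability) pairs
    reward: float

# Hypothetical two-task chain for one robot: t2 can only run after t1 finishes.
T1 = Task("t1", (0, 6), ((2, 0.6), (4, 0.4)), ((1, 0.7), (2, 0.3)), reward=10.0)
T2 = Task("t2", (3, 10), ((3, 0.5), (5, 0.5)), ((1, 0.5), (3, 0.5)), reward=15.0)
CHAIN = (T1, T2)

@lru_cache(maxsize=None)
def value(i: int, time: int, resources: int) -> float:
    """Expected value of executing tasks CHAIN[i:] from state (time, resources).

    The robot picks a start time inside the task's temporal window; duration and
    resource consumption are then drawn from the task's distributions.  As a
    simplification, any branch that overruns the window or exhausts resources
    contributes value 0 (execution fails and the remaining tasks are lost)."""
    if i == len(CHAIN):
        return 0.0
    task = CHAIN[i]
    earliest, latest = task.window
    best = 0.0                                 # fallback: skip the rest, value 0
    for start in range(max(time, earliest), latest + 1):
        ev = 0.0
        for dur, p_d in task.durations:
            if start + dur > latest:           # execution overruns the window
                continue
            for cost, p_c in task.costs:
                if cost > resources:           # not enough resources left
                    continue
                ev += p_d * p_c * (task.reward
                                   + value(i + 1, start + dur, resources - cost))
        best = max(best, ev)
    return best

print(f"expected mission value: {value(0, 0, 4):.2f}")
```

Running the sketch prints the expected value of the best choice of start times; a decentralized version along the lines of the paper would evaluate such policies per robot while also accounting for the uncertain completion times of the tasks each robot depends on.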


Keywords: Execution Time · Start Time · Precedence Constraint · Temporal Constraint · Multiple Robot





Copyright information

© Springer 2007

Authors and Affiliations

  • Aurélie Beynier (1)
  • Abdel-Illah Mouaddib (1)

  1. GREYC-CNRS, Bd Marechal Juin, Campus II, Caen Cedex, France
