Combining Planning with Reinforcement Learning for Multi-robot Task Allocation

  • Malcolm Strens
  • Neil Windelinckx
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3394)


We describe an approach to the multi-robot task allocation (MRTA) problem in which a group of robots must perform tasks that arise continuously, at arbitrary locations across a large space. A dynamic scheduling algorithm is derived in which proposed plans are evaluated using a combination of short-term lookahead and a value function acquired by reinforcement learning. We demonstrate that this dynamic scheduler can learn not only to allocate robots to tasks efficiently, but also to position the robots appropriately in readiness for new tasks (tactical awareness), and conserve resources over the long run (strategic awareness).
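The abstract's core idea, scoring each candidate plan by short-horizon simulated reward plus a learned value estimate at the horizon, can be sketched as follows. This is a minimal illustration only: the function names (`simulate_step`, `value_fn`), the discounting, and the horizon are assumptions, not the paper's actual formulation.

```python
# Sketch of plan evaluation combining short-term lookahead with a learned
# value function, as described in the abstract. All names here are
# hypothetical placeholders for the paper's components.

def evaluate_plan(state, plan, simulate_step, value_fn, horizon=5, gamma=0.95):
    """Discounted reward over a short lookahead, plus the learned value
    of the state reached at the planning horizon."""
    total, discount = 0.0, 1.0
    for t in range(horizon):
        state, reward = simulate_step(state, plan, t)  # forward-simulate one step
        total += discount * reward
        discount *= gamma
    return total + discount * value_fn(state)  # learned long-run estimate

def best_plan(state, candidate_plans, simulate_step, value_fn):
    """Dynamic scheduling: pick the proposed plan with the highest score."""
    return max(candidate_plans,
               key=lambda p: evaluate_plan(state, p, simulate_step, value_fn))
```

The learned value term is what lets such a scheduler look beyond the explicit horizon, e.g. rewarding robot positions that anticipate future tasks.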


Keywords: Reinforcement Learning · Planning Horizon · Partially Observable Markov Decision Process · Partial Observability · Policy Search
(These keywords were generated automatically, not supplied by the authors.)



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Malcolm Strens¹
  • Neil Windelinckx¹

  1. Future Systems & Technology Division, QinetiQ, Farnborough, Hampshire, U.K.