Learning Multi-agent Search Strategies

  • Malcolm J. A. Strens
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3394)


We identify a specialised class of reinforcement learning problems in which the agent(s) have the goal of gathering information (identifying the hidden state). The gathered information can affect rewards but not optimal behaviour. Exploiting this characteristic, an algorithm is developed for evaluating an agent’s policy against all possible hidden state histories at the same time. Experimental results show the method is effective in a two-dimensional multi-pursuer evader search task. A comparison is made between identical policies, joint policies and “relational” policies that exploit relative information about the pursuers’ positions.
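The core idea above — scoring a single run of a search policy against many candidate hidden states at once, rather than one hidden state per simulation trial — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the grid size, the stationary evader, the `sweep` policy and all function names are hypothetical.

```python
import random

GRID = 10  # hypothetical grid side length

def evaluate_policy(policy, n_particles=1000, horizon=50, seed=0):
    """Score a pursuer policy against many possible hidden evader
    positions in one trial. Each particle is one candidate hidden
    state (here a stationary evader); the policy's trajectory is
    checked against all of them simultaneously. Returns the
    fraction of particles detected within the horizon."""
    rng = random.Random(seed)
    # One particle per candidate hidden state, drawn uniformly.
    particles = [(rng.randrange(GRID), rng.randrange(GRID))
                 for _ in range(n_particles)]
    pursuer = (0, 0)
    alive = set(range(n_particles))
    detected = 0
    for t in range(horizon):
        dx, dy = policy(pursuer, t)
        pursuer = ((pursuer[0] + dx) % GRID, (pursuer[1] + dy) % GRID)
        # Every particle co-located with the pursuer counts as found.
        caught = [i for i in alive if particles[i] == pursuer]
        detected += len(caught)
        alive.difference_update(caught)
    return detected / n_particles

def sweep(pos, t):
    """A simple raster-scan search policy over the grid."""
    x, y = pos
    if x < GRID - 1:
        return (1, 0)
    return (-(GRID - 1), 1)
```

Because one call evaluates the policy against the whole set of hidden-state hypotheses with a fixed seed, two policies can be compared on identical particle sets (common random numbers), which is the kind of paired comparison that makes policy-search evaluations low-variance.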


Keywords: Observable state · Hidden state · Optimal behaviour · Simulation trial · Relational policy





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Malcolm J. A. Strens
  1. Future Systems & Technology Division, QinetiQ Ltd., Farnborough, Hants., U.K.
