
Model-Based Reinforcement Learning for Evolving Soccer Strategies

  • Chapter

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 62)

Abstract

We use reinforcement learning (RL) to evolve soccer team strategies. RL can profit significantly from world models (WMs), but in high-dimensional, continuous input spaces learning accurate WMs is intractable. In this chapter we show that even incomplete WMs can help to find good policies quickly. Our approach is based on a novel combination of CMACs and prioritized sweeping; variants of it outperform the algorithms used in previous work.
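The abstract only names the ingredients of the approach. As a rough illustration of how a CMAC discretization and prioritized sweeping might be combined, the Python sketch below treats the tuple of active CMAC tiles as a coarse discrete state and runs tabular prioritized sweeping over a learned, deliberately incomplete world model that stores only the last observed transition per state-action pair. All names, parameters, and modelling choices here (the tiling layout, the deterministic one-transition model, the learning rates) are illustrative assumptions, not the chapter's actual implementation.

```python
import heapq
import random
from collections import defaultdict


class CMAC:
    """Coarse coding: each of several slightly offset tilings maps a continuous
    input vector to one active tile; the tuple of active tiles is used as a
    discrete state below (illustrative layout, not the chapter's)."""

    def __init__(self, lows, highs, n_tilings=4, tiles_per_dim=8):
        self.lows, self.highs = lows, highs
        self.n_tilings, self.tiles_per_dim = n_tilings, tiles_per_dim

    def state(self, x):
        tiles = []
        for t in range(self.n_tilings):
            shift = t / self.n_tilings  # offset each tiling slightly
            coords = tuple(
                min(self.tiles_per_dim - 1,
                    int((v - lo) / (hi - lo) * self.tiles_per_dim + shift))
                for v, lo, hi in zip(x, self.lows, self.highs))
            tiles.append((t,) + coords)
        return tuple(tiles)


class PrioritizedSweeping:
    """Tabular prioritized sweeping over the coarse CMAC states. The model
    keeps only the last observed transition per state-action pair, so it is
    an incomplete world model by construction."""

    def __init__(self, n_actions, gamma=0.95, alpha=0.5, theta=1e-3, n_sweeps=10):
        self.n_actions, self.gamma, self.alpha = n_actions, gamma, alpha
        self.theta, self.n_sweeps = theta, n_sweeps
        self.Q = defaultdict(float)   # Q-values indexed by (state, action)
        self.model = {}               # (state, action) -> (reward, next_state)
        self.pred = defaultdict(set)  # next_state -> predecessor (state, action) pairs
        self.pq = []                  # max-heap of pending updates (negated priority)

    def best_q(self, s):
        return max(self.Q[(s, a)] for a in range(self.n_actions))

    def act(self, s, epsilon=0.1):
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[(s, a)])

    def observe(self, s, a, r, s2):
        # record the transition, queue its update, then sweep the queue
        self.model[(s, a)] = (r, s2)
        self.pred[s2].add((s, a))
        self._push(s, a, r, s2)
        self._sweep()

    def _push(self, s, a, r, s2):
        priority = abs(r + self.gamma * self.best_q(s2) - self.Q[(s, a)])
        if priority > self.theta:
            heapq.heappush(self.pq, (-priority, (s, a)))

    def _sweep(self):
        for _ in range(self.n_sweeps):
            if not self.pq:
                break
            _, (s, a) = heapq.heappop(self.pq)
            r, s2 = self.model[(s, a)]
            target = r + self.gamma * self.best_q(s2)
            self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
            # propagate the value change backwards through known predecessors
            for ps, pa in self.pred[s]:
                pr, _ = self.model[(ps, pa)]
                self._push(ps, pa, pr, s)


# Hypothetical usage on a 2-D continuous observation in [0, 1]^2:
# cmac = CMAC(lows=[0.0, 0.0], highs=[1.0, 1.0])
# agent = PrioritizedSweeping(n_actions=4)
# s = cmac.state(observation); a = agent.act(s)
# ... apply a in the simulator, observe reward r and the next observation ...
# agent.observe(s, a, r, cmac.state(next_observation))
```

In this sketch the point of the combination is that even a crude, partly wrong model lets value changes propagate backwards quickly through the priority queue, rather than waiting for further real experience; that is the intuition behind pairing a coarse CMAC discretization with prioritized sweeping.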



Copyright information

© 2001 Physica-Verlag Heidelberg

About this chapter

Cite this chapter

Wiering, M.A., Salustowicz, R.P., Schmidhuber, J. (2001). Model-Based Reinforcement Learning for Evolving Soccer Strategies. In: Baba, N., Jain, L.C. (eds) Computational Intelligence in Games. Studies in Fuzziness and Soft Computing, vol 62. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1833-8_5


  • DOI: https://doi.org/10.1007/978-3-7908-1833-8_5

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-662-00369-5

  • Online ISBN: 978-3-7908-1833-8

