Abstract
This paper presents an experimental comparison of supervised and reinforcement learning algorithms for the game of Othello. Motivated by these results, a new learning algorithm, Mouse(μ) (MOnte-Carlo learning Using heuriStic Error reduction), has been developed. Mouse uses a heuristic model of past experience to improve generalization and to reduce noisy estimations. The algorithm was able to tune the parameter vector of a large linear evaluation function with about 1.5 million parameters and finished fourth in a recent GGS Othello tournament, a significant result for a self-teaching algorithm. Besides the theoretical aspects of the learning methods used, experimental results and comparisons are presented and discussed. These results demonstrate the advantages and drawbacks of existing learning approaches in strategy games, as well as the potential of the new algorithm.
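To make the abstract concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of the general setting it describes: a linear evaluation function over sparse binary features, whose weights are tuned toward observed Monte-Carlo game outcomes. The class name, the per-weight visit counters, and the shrinking step size standing in for "heuristic error reduction" are all assumptions made for illustration only.

```python
class LinearEvaluator:
    """Illustrative linear position evaluator with Monte-Carlo updates.

    Assumption: a position is represented by the indices of its active
    binary features; the evaluation is the sum of the corresponding weights.
    """

    def __init__(self, num_features):
        self.weights = [0.0] * num_features
        # Per-weight visit counts; used to shrink the step size on
        # frequently seen features and so damp noisy outcome estimates.
        self.visits = [0] * num_features

    def value(self, active_features):
        # Linear evaluation: sum of the weights of the active features.
        return sum(self.weights[i] for i in active_features)

    def mc_update(self, active_features, outcome):
        # Move each active weight toward the final game outcome.
        error = outcome - self.value(active_features)
        for i in active_features:
            self.visits[i] += 1
            alpha = 1.0 / self.visits[i]  # decaying learning rate
            self.weights[i] += alpha * error / len(active_features)
```

In this sketch the decaying per-feature learning rate plays the role of averaging past outcomes, which is one simple way to reduce the variance of Monte-Carlo targets; the paper's actual heuristic error-reduction scheme is more elaborate.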
© 2003 Springer-Verlag Berlin Heidelberg
Tournavitis, K. (2003). MOUSE(μ): A Self-teaching Algorithm that Achieved Master-Strength at Othello. In: Schaeffer, J., Müller, M., Björnsson, Y. (eds) Computers and Games. CG 2002. Lecture Notes in Computer Science, vol 2883. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-40031-8_2
Print ISBN: 978-3-540-20545-6
Online ISBN: 978-3-540-40031-8