Abstract
Automated tuning of parameters is an important technique for building strong game-playing programs. Comparison training is a supervised learning method for tuning the parameters of an evaluation function, and it has proven effective in Chess and Shogi. The method requires a large number of training positions and moves extracted from the game records of human experts; however, the number of such records is limited. In this paper, we propose a practical approach to creating additional training data for comparison training by using the program itself. We investigate three methods for generating additional positions and moves, and evaluate them using a Shogi program. Experimental results show that the self-generated training data can improve the playing strength of the program.
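Comparison training as described in the abstract adjusts evaluation-function weights so that the position reached by the expert's move scores higher than the positions reached by the alternative moves. The sketch below shows a perceptron-style update of that kind over a linear evaluation; the function and feature names are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a perceptron-style comparison-training update.
# All identifiers here are hypothetical; the paper's own procedure
# (features, search-guided labels, learning rate) may differ.

def evaluate(weights, features):
    """Linear evaluation: dot product of a weight map and a sparse feature map."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def comparison_update(weights, expert_features, sibling_features_list, lr=1.0):
    """If any sibling position scores at least as high as the position reached
    by the expert's move, shift the weights toward the expert position and
    away from the highest-scoring sibling."""
    expert_score = evaluate(weights, expert_features)
    best = max(sibling_features_list, key=lambda f: evaluate(weights, f))
    if evaluate(weights, best) >= expert_score:
        for f, v in expert_features.items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in best.items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```

Repeated over many training positions, updates of this shape gradually make the evaluation prefer expert moves; a single update does not guarantee that the expert position immediately outscores every sibling.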
Notes
- 1.
- 2.
We can obtain a sufficient variety of game positions by playing the first 36 moves from expert game records; a shogi game is usually still in the opening stage even after the first 36 moves. Leaf and Random positions are generated after 35 moves, while Self-play uses only 30 moves because the base player may make the same moves as the experts; Self-play needs some extra moves by the base player to generate positions that differ from the expert game records.
- 3.
It takes several tens of seconds for Gekisashi to perform a search to depth 20 in a typical middle-game position.
- 4.
For example, when the training data included the Leaf and Random training data, the test data included the Leaf and Random test data.
- 5.
Players with a rating higher than 2550 as of June 10, 2013.
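Note 2 above describes generating training positions by replaying a fixed number of opening moves from expert game records (35 for Leaf and Random, 30 for Self-play before the base player continues on its own). A toy sketch of that replay step, using simplified stand-in classes rather than real shogi objects:

```python
# Toy sketch of extracting opening positions from a game record by
# replaying its first n moves. Record and the tuple-based "position"
# are simplified stand-ins, not the paper's shogi data structures.

class Record:
    def __init__(self, moves):
        self.moves = moves  # the game's move sequence, in order

    def initial_position(self):
        return ()  # a position here is just the tuple of moves played so far

def positions_after(record, n_moves):
    """Yield the position reached after each of the first n_moves."""
    pos = record.initial_position()
    for move in record.moves[:n_moves]:
        pos = pos + (move,)  # "apply" the move
        yield pos
```

With a real engine, each yielded position would then be used as a root for Leaf, Random, or Self-play data generation.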
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Ura, A., Miwa, M., Tsuruoka, Y., Chikayama, T. (2014). Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves. In: van den Herik, H., Iida, H., Plaat, A. (eds) Computers and Games. CG 2013. Lecture Notes in Computer Science(), vol 8427. Springer, Cham. https://doi.org/10.1007/978-3-319-09165-5_18
DOI: https://doi.org/10.1007/978-3-319-09165-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09164-8
Online ISBN: 978-3-319-09165-5