
Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves

  • Conference paper

Computers and Games (CG 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8427)


Abstract

Automated tuning of parameters in computer game playing is an important technique for building strong computer programs. Comparison training is a supervised learning method for tuning the parameters of an evaluation function. It has proven to be effective in the games of Chess and Shogi. The training method requires a large number of training positions and moves extracted from game records of human experts; however, the number of such game records is limited. In this paper, we propose a practical approach to creating additional training data for comparison training by using the program itself. We investigate three methods for generating additional positions and moves, and then evaluate them using a Shogi program. Experimental results show that the self-generated training data can improve the playing strength of the program.
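
To make the pairwise-comparison idea named in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes a linear evaluation function over hand-crafted features and a simple perceptron-style update, as in Tesauro-style comparison training. All names and the learning-rate handling are illustrative.

    import numpy as np

    def comparison_training_step(weights, expert_child_feats, sibling_child_feats, lr=1.0):
        """One perceptron-style comparison-training update (illustrative only).

        weights             -- parameter vector of a linear evaluation function
        expert_child_feats  -- feature vector of the position after the expert's move
        sibling_child_feats -- feature vectors of the positions after the other legal moves
        Whenever a sibling position is scored at least as high as the expert's
        child, the weights are nudged toward the expert's child and away from
        that sibling.
        """
        expert_score = weights @ expert_child_feats
        for feats in sibling_child_feats:
            if weights @ feats >= expert_score:               # misordered pair
                weights += lr * (expert_child_feats - feats)  # perceptron-style update
                expert_score = weights @ expert_child_feats   # rescore after the update
        return weights

    if __name__ == "__main__":
        # Toy usage with random feature vectors, purely to show the call shape.
        rng = np.random.default_rng(0)
        w = np.zeros(8)
        expert = rng.normal(size=8)
        siblings = [rng.normal(size=8) for _ in range(5)]
        w = comparison_training_step(w, expert, siblings)

In full implementations the child scores typically come from shallow searches of each successor rather than raw static evaluation, and the update is smoothed or regularized; the sketch shows only the core comparison step.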


Notes

  1.

    http://wdoor.c.u-tokyo.ac.jp/shogi/floodgate.html

  2.

    A sufficient variety of game positions can be obtained by playing the first 36 moves of expert game records; a shogi game is usually still in the opening stage at that point. Leaf and Random generation uses the first 35 moves, while Self-play uses only 30, because the base player may repeat the experts' moves; the extra moves played by the base player in Self-play are needed to produce positions that differ from the expert game records (see the sketch after these notes).

  3.

    It takes several tens of seconds for Gekisashi to perform a search with a depth of 20 in a typical middle-game position.

  4.

    For example, when the training data consisted of the Leaf and Random training data, the test data consisted of the Leaf and Random test data.

  5.

    Players with a rating higher than 2550 as of June 10, 2013.
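
The sketch below is a rough illustration of the position-generation step described in note 2. It is not the authors' code: the engine-specific operations are passed in as stand-in callables, and only what the note states is modeled, namely 35 opening moves from expert records for Leaf and Random, and 30 opening moves plus extra base-player moves for Self-play.

    def replay_opening(initial_position, record_moves, play, n_moves):
        """Replay the first n_moves of an expert game record.

        `play(position, move) -> position` is a stand-in for whatever
        move-making routine the host shogi program provides.
        """
        pos = initial_position
        for move in record_moves[:n_moves]:
            pos = play(pos, move)
        return pos

    def selfplay_start_position(initial_position, record_moves, play, choose_move,
                                extra_moves, n_opening_moves=30):
        """Self-play variant (note 2): after 30 expert moves, the base player
        (`choose_move(position) -> move`) plays a few extra moves of its own,
        so the resulting position can diverge from the expert game record."""
        pos = replay_opening(initial_position, record_moves, play, n_opening_moves)
        for _ in range(extra_moves):
            pos = play(pos, choose_move(pos))
        return pos

    def leaf_or_random_start_position(initial_position, record_moves, play,
                                      n_opening_moves=35):
        """Leaf and Random variants (note 2) both start from the position reached
        after 35 expert moves; how each then derives its training positions is
        described in the body of the paper and is not reproduced here."""
        return replay_opening(initial_position, record_moves, play, n_opening_moves)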


Author information


Corresponding author

Correspondence to Akira Ura.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ura, A., Miwa, M., Tsuruoka, Y., Chikayama, T. (2014). Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves. In: van den Herik, H., Iida, H., Plaat, A. (eds) Computers and Games. CG 2013. Lecture Notes in Computer Science, vol. 8427. Springer, Cham. https://doi.org/10.1007/978-3-319-09165-5_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-09165-5_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09164-8

  • Online ISBN: 978-3-319-09165-5

  • eBook Packages: Computer Science (R0)
