Spatial Average Pooling for Computer Go

  • Conference paper
  • Computer Games (CGW 2018)
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 1017)

Abstract

Computer Go has reached a superhuman level thanks to Monte Carlo Tree Search (MCTS) combined with Deep Learning. The best computer Go programs use reinforcement learning to train a policy network and a value network, which are then used within an MCTS algorithm to produce strong computer Go players. In this paper we propose to improve the architecture of the value network using Spatial Average Pooling.



Author information

Correspondence to Tristan Cazenave.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 910 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Cazenave, T. (2019). Spatial Average Pooling for Computer Go. In: Cazenave, T., Saffidine, A., Sturtevant, N. (eds) Computer Games. CGW 2018. Communications in Computer and Information Science, vol 1017. Springer, Cham. https://doi.org/10.1007/978-3-030-24337-1_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-24337-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24336-4

  • Online ISBN: 978-3-030-24337-1

  • eBook Packages: Computer Science, Computer Science (R0)
