AlphaZero

Zhang, Hongming; Yu, Tianyang

doi:10.1007/978-981-15-4095-0_15

Chapter
First Online: 30 June 2020

Abstract

In this chapter, we introduce combinatorial games such as chess and Go and take Gomoku as an example to introduce the AlphaZero algorithm, a general algorithm that has achieved superhuman performance in many challenging games. This chapter is divided into three parts: the first part introduces the concept of combinatorial games, the second part introduces the family of algorithms known as Monte Carlo Tree Search, and the third part takes Gomoku as the game environment to demonstrate the details of the AlphaZero algorithm, which combines Monte Carlo Tree Search and deep reinforcement learning from self-play.

Author information

Authors and Affiliations

Peking University, Beijing, China
Hongming Zhang
Nanchang University, Nanchang, China
Tianyang Yu

Corresponding author

Correspondence to Hongming Zhang.

Editor information

Editors and Affiliations

EECS, Peking University, Beijing, China
Hao Dong
CS, Imperial College London, London, UK
Zihan Ding
EECS, University of California, Berkeley, Berkeley, USA
Shanghang Zhang

About this chapter

Cite this chapter

Zhang, H., Yu, T. (2020). AlphaZero. In: Dong, H., Ding, Z., Zhang, S. (eds) Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-15-4095-0_15

DOI: https://doi.org/10.1007/978-981-15-4095-0_15
Published: 30 June 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4094-3
Online ISBN: 978-981-15-4095-0
eBook Packages: Mathematics and Statistics (R0)
