Neural Fictitious Self-Play in Imperfect Information Games with Many Players

  • Keigo Kawamura
  • Naoki Mizukami
  • Yoshimasa Tsuruoka
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 818)

Abstract

Computing Nash equilibrium solutions is an important problem in the domain of imperfect information games. Counterfactual Regret Minimization+ (CFR+) can be used to (essentially weakly) solve two-player limit Texas Hold’em, but it cannot be applied to large multi-player games due to the problem of space complexity. In this paper, we use Neural Fictitious Self-Play (NFSP) to calculate approximate Nash equilibrium solutions for imperfect information games with more than two players. Although there are no theoretical guarantees of convergence for NFSP in such games, we empirically demonstrate that NFSP enables us to calculate strategy profiles that are significantly less exploitable than random players in simple poker variants with three or more players.
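To make "less exploitable than random players" concrete, the following is a minimal sketch of classical tabular fictitious play (the procedure NFSP approximates with neural networks) together with an exploitability measure. It is not the authors' method or experimental setup: it uses a two-player zero-sum matrix game rather than a multi-player poker variant, and the payoff matrix, function names, and iteration count are all assumptions chosen for illustration.

```python
# Row player's payoffs in a small two-player zero-sum matrix game.
# Hypothetical numbers, chosen so that uniform random play is not an equilibrium.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def row_values(col_strategy):
    """Expected row-player payoff of each row action against a column mixed strategy."""
    return [sum(a * q for a, q in zip(row, col_strategy)) for row in PAYOFF]


def col_values(row_strategy):
    """Expected row-player payoff of each column action against a row mixed strategy."""
    n_rows, n_cols = len(PAYOFF), len(PAYOFF[0])
    return [sum(row_strategy[i] * PAYOFF[i][j] for i in range(n_rows))
            for j in range(n_cols)]


def exploitability(row_strategy, col_strategy):
    """Total gain available to the two players if each switched to a best response."""
    best_row = max(row_values(col_strategy))  # row player maximizes the payoff
    best_col = min(col_values(row_strategy))  # column player minimizes it
    return best_row - best_col


def normalize(counts):
    """Turn action counts into an empirical mixed strategy."""
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(iterations):
    """Both players repeatedly best-respond to the opponent's average strategy so far."""
    row_counts = [1.0, 0.0]  # arbitrary initial actions
    col_counts = [1.0, 0.0]
    for _ in range(iterations):
        rv = row_values(normalize(col_counts))
        cv = col_values(normalize(row_counts))
        row_counts[rv.index(max(rv))] += 1.0
        col_counts[cv.index(min(cv))] += 1.0
    return normalize(row_counts), normalize(col_counts)


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    row_avg, col_avg = fictitious_play(50000)
    print("uniform random exploitability:", exploitability(uniform, uniform))
    print("fictitious play exploitability:", exploitability(row_avg, col_avg))
```

In this toy game the average strategies approach the equilibrium, so their exploitability should fall well below that of uniform random play. With three or more players, as the abstract notes, no such convergence guarantee exists, and comparing exploitability against a random baseline becomes an empirical check rather than a proof.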

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Keigo Kawamura
    • 1
  • Naoki Mizukami
    • 1
  • Yoshimasa Tsuruoka
    • 1
  1. The University of Tokyo, Tokyo, Japan
