Sharing Information in Adversarial Bandit

  • David L. St-Pierre
  • Olivier Teytaud
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8602)


Two-player games provide a popular platform for research in Artificial Intelligence (AI). One of the main challenges arising from this platform is approximating a Nash Equilibrium (NE) of zero-sum matrix games. Although such a Nash Equilibrium can be computed in polynomial time using Linear Programming (LP), solving the LP rapidly becomes infeasible as the size of the matrix grows, a situation commonly encountered in games. This paper focuses on improving the approximation of a NE for matrix games so that it outperforms state-of-the-art algorithms given a finite (and rather small) number \(T\) of oracle requests to rewards. To reach this objective, we propose to share information between the different relevant pure strategies. We show, both theoretically by improving the bound and empirically through experiments on artificial matrices and on a real-world game, that information sharing improves the approximation of the NE.
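The setting described above can be illustrated with a minimal sketch. It is not the information-sharing algorithm proposed in the paper: it is a plain EXP3 self-play baseline, written under the assumptions that the matrix entries lie in [0, 1], that one read of an entry counts as one oracle request, and that the empirical frequencies of the played arms serve as the approximate NE. The names `exp3_self_play` and `exploitability` are hypothetical helpers introduced for illustration only.

```python
import math
import random


def exp3_self_play(M, T, seed=0):
    """Approximate a NE of the zero-sum matrix game M using T oracle requests.

    Assumptions (not from the paper): entries of M are in [0, 1]; the row
    player receives M[i][j] and the column player receives 1 - M[i][j].
    Returns the empirical play frequencies (x, y) of the two players.
    """
    rng = random.Random(seed)

    def new_player(k):
        # Standard EXP3 exploration rate for k arms and horizon T.
        gamma = min(1.0, math.sqrt(k * math.log(k) / ((math.e - 1) * T)))
        return {"k": k, "gamma": gamma, "logw": [0.0] * k, "counts": [0] * k}

    def distribution(p):
        # Softmax over log-weights (shifted for numerical stability),
        # mixed with uniform exploration.
        top = max(p["logw"])
        w = [math.exp(v - top) for v in p["logw"]]
        total = sum(w)
        return [(1 - p["gamma"]) * wi / total + p["gamma"] / p["k"] for wi in w]

    def update(p, arm, prob, reward):
        # Importance-weighted reward estimate for the arm that was played.
        p["logw"][arm] += p["gamma"] * (reward / prob) / p["k"]
        p["counts"][arm] += 1

    row, col = new_player(len(M)), new_player(len(M[0]))
    for _ in range(T):
        px, py = distribution(row), distribution(col)
        i = rng.choices(range(row["k"]), weights=px)[0]
        j = rng.choices(range(col["k"]), weights=py)[0]
        r = M[i][j]  # one oracle request per iteration
        update(row, i, px[i], r)
        update(col, j, py[j], 1.0 - r)

    x = [c / T for c in row["counts"]]
    y = [c / T for c in col["counts"]]
    return x, y


def exploitability(M, x, y):
    """Best-response gain against (x, y); it is 0 exactly at a NE."""
    row_best = max(sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    col_best = min(sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(y)))
    return row_best - col_best


if __name__ == "__main__":
    # Rock-paper-scissors rescaled to [0, 1]; its NE is uniform play.
    rps = [[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
    x, y = exp3_self_play(rps, T=5000)
    print(x, y, exploitability(rps, x, y))
```

The `exploitability` helper gives one way to measure the quality of the approximation for a given budget \(T\): it is zero at an exact NE and never negative, so smaller values indicate a better approximation.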


Keywords: Bandit problem, Monte-Carlo, Nash Equilibrium, Games





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Montefiore Institute, Department of Electrical Engineering and Computer Science, Liège University, Liège, Belgium
  2. TAO, Inria, Université Paris-Sud, UMR CNRS 8623, Paris, France
