Theory of Computing Systems, Volume 63, Issue 5, pp 1089–1130

The Operator Approach to Entropy Games

  • Marianne Akian
  • Stéphane Gaubert
  • Julien Grand-Clément
  • Jérémie Guillaud
Part of the following topical collections:
  1. Special Issue on Theoretical Aspects of Computer Science (STACS 2017)

Abstract

Entropy games and matrix multiplication games have been recently introduced by Asarin et al. They model the situation in which one player (Despot) wishes to minimize the growth rate of a matrix product, whereas the other player (Tribune) wishes to maximize it. We develop an operator approach to entropy games. This allows us to show that entropy games can be cast as stochastic mean payoff games in which some action spaces are simplices and payments are given by a relative entropy (Kullback-Leibler divergence). In this way, we show that entropy games with a fixed number of states belonging to Despot can be solved in polynomial time. This approach also allows us to solve these games by a policy iteration algorithm, which we compare with the spectral simplex algorithm developed by Protasov.
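
The relative entropy mentioned above as the payment can be illustrated numerically. The following is a minimal sketch, assuming finite probability vectors given as plain Python lists and natural logarithms; the function name `kl_divergence` is hypothetical and is not taken from the article, and the sketch does not reproduce the article's game construction.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Hypothetical helper for illustration only: p and q are assumed to be
    probability vectors of the same length (nonnegative, summing to 1).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # usual convention: 0 * log(0 / q_i) = 0
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

For instance, `kl_divergence([1.0, 0.0], [0.5, 0.5])` evaluates to `math.log(2)`, and the divergence of a distribution from itself is zero.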

Keywords

Stochastic games; Shapley operators; Policy iteration; Perron eigenvalues; Risk sensitive control

Notes

Acknowledgments

An announcement of the present results appeared in the proceedings of STACS [4]. We are very grateful to the referees of that STACS paper, and to the referees of the present extended version, for their detailed comments, which helped us improve this manuscript.

References

  1. Anantharam, V., Borkar, V.S.: A variational formula for risk-sensitive reward. SIAM J. Control Optim. 55(2), 961–988 (2017). arXiv:1501.00676
  2. Asarin, E., Cervelle, J., Degorre, A., Dima, C., Horn, F., Kozyakin, V.: Entropy games and matrix multiplication games. In: 33rd Symposium on Theoretical Aspects of Computer Science, STACS, Orléans, France, pp. 11:1–11:14 (2016)
  3. Akian, M., Gaubert, S., Guterman, A.: Tropical polyhedra are equivalent to mean payoff games. Int. J. Algebra Comput. 22(1), 125001 (43 pages) (2012)
  4. Akian, M., Gaubert, S., Grand-Clément, J., Guillaud, J.: The Operator Approach to Entropy Games. In: Vollmer, H., Vallée, B. (eds.) 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017), volume 66 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 6:1–6:14. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl (2017)
  5. Akian, M., Gaubert, S., Nussbaum, R.: A Collatz-Wielandt characterization of the spectral radius of order-preserving homogeneous maps on cones. arXiv:1112.5968 (2011)
  6. Andersson, D., Miltersen, P.B.: The complexity of solving stochastic games on graphs. In: Proceedings of ISAAC’09, number 5878 in LNCS, pp. 112–121. Springer (2009)
  7. Borwein, J.M., Borwein, P.B.: On the complexity of familiar functions and numbers. SIAM Rev. 30(4), 589–601 (1988)
  8. Baillon, J.B., Bruck, R.E.: Optimal rates of asymptotic regularity for averaged nonexpansive mappings. In: Tan, K.K. (ed.) Proceedings of the Second International Conference on Fixed Point Theory and Applications, pp. 27–66. World Scientific Press (1992)
  9. Bolte, J., Gaubert, S., Vigeral, G.: Definable zero-sum stochastic games. Math. Oper. Res. 40(1), 171–191 (2014)
  10. Bewley, T., Kohlberg, E.: The asymptotic theory of stochastic games. Math. Oper. Res. 1(3), 197–208 (1976)
  11. Blondel, V.D., Nesterov, Y.: Polynomial-time computation of the joint spectral radius for some sets of nonnegative matrices. SIAM J. Matrix Anal. 31(3), 865–876 (2009)
  12. Berman, A., Plemmons, R.J.: Nonnegative matrices in the mathematical sciences. Academic Press, New York (1994)
  13. Chen, T., Han, T.: On the complexity of computing maximum entropy for markovian models. In: 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, pp. 571–583, New Delhi (2014)
  14. Crandall, M.G., Tartar, L.: Some relations between non expansive and order preserving maps. Proc. AMS 78(3), 385–390 (1980)
  15. Donsker, M.D., Varadhan, R.: On a variational formula for the principal eigenvalue for operators with maximum principle. Proc. Nat. Acad. Sci. USA 72(3), 780–783 (1975)
  16. Fleming, W.H., Hernández-Hernández, D.: Risk-sensitive control of finite state machines on an infinite horizon. I. SIAM J. Control Optim. 35(5), 1790–1810 (1997)
  17. Fleming, W.H., Hernández-Hernández, D.: Risk-sensitive control of finite state machines on an infinite horizon. II. SIAM J. Control Optim. 37(4), 1048–1069 (electronic) (1999)
  18. Gaubert, S., Gunawardena, J.: A non-linear hierarchy for discrete event dynamical systems. In: Proceedings of the Fourth Workshop on Discrete Event Systems (WODES98), pp. 249–254. IEEE, Cagliari (1998)
  19. Gaubert, S., Gunawardena, J.: The Perron-Frobenius theorem for homogeneous, monotone functions. Trans. AMS 356(12), 4931–4950 (2004)
  20. Grötschel, M., Lovász, L., Schrijver, A.: The ellipsoid method and its consequences in combinatorial optimization. Combinatorica 1(2), 169–197 (1981)
  21. Gaubert, S., Stott, N.: A convergent hierarchy of non-linear eigenproblems to compute the joint spectral radius of nonnegative matrices. Proceedings of the 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS2018), Hong Kong (2018)
  22. Gaubert, S., Vigeral, G.: A maximin characterization of the escape rate of nonexpansive mappings in metrically convex spaces. Math. Proc. Camb. Phil. Soc. 152, 341–363 (2012)
  23. Hoffman, A.J., Karp, R.M.: On nonterminating stochastic games. Manag. Sci. J. Inst. Manag. Sci. Appl. Theory Ser. 12, 359–370 (1966)
  24. Howard, R.A., Matheson, J.E.: Risk-sensitive markov decision processes. Manag. Sci. 18(7), 356–369 (1972)
  25. Hansen, T.D., Miltersen, P.B., Zwick, U.: Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. In: Innovations in Computer Science 2011, pp. 253–263. Tsinghua University Press (2011)
  26. Ishikawa, S.: Fixed points and iteration of a nonexpansive mapping in a Banach space. Proc. Amer. Math. Soc. 59(1), 65–71 (1976)
  27. Kingman, J.F.C.: A convexity property of positive matrices. Quart. J. Math. Oxford Ser. 2(12), 283–284 (1961)
  28. Kozyakin, V.: Hourglass alternative and the finiteness conjecture for the spectral characteristics of sets of non-negative matrices. Linear Algebra Appl. 489, 167–185 (2016)
  29. Krasnosel’skiĭ, M.A.: Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk 10, 123–127 (1955)
  30. Kullback, S.: Information theory and statistics. Dover Publications, Inc., Mineola (1997). Reprint of the second (1968) edition
  31. Lemmens, B., Lins, B., Nussbaum, R., Wortel, M.: Denjoy-Wolff theorems for Hilbert’s and Thompson’s metric spaces. J. d’Anal. Math. 134, 671–718 (2018)
  32. Lothaire, M.: Applied combinatorics on words. Cambridge, New York (2005)
  33. Mann, W.R.: Mean value methods in iteration. Proc. Amer. Math. Soc. 4, 506–510 (1953)
  34. Mertens, J.-F., Neyman, A.: Stochastic games. Internat. J. Game Theory 10(2), 53–66 (1981)
  35. Müller, J.M.: Elementary functions: algorithms and implementation. Birkhäuser, Cambridge (2005)
  36. Neyman, A.: Stochastic games and nonexpansive maps. In: Stochastic games and applications (Stony Brook, NY, 1999), volume 570 of NATO Sci. Ser. C Math. Phys. Sci., pp. 397–415. Kluwer Acad. Publ., Dordrecht (2003)
  37. Nussbaum, R.D.: Convexity and log convexity for the spectral radius. Linear Algebra Appl. 73, 59–122 (1986)
  38. Protasov, V.Yu.: Spectral simplex method. Math. Program. 156(1-2, Ser. A), 485–511 (2016)
  39. Puterman, M.L.: Markov decision processes. Wiley, New York (2005)
  40. Rothblum, U.G.: Multiplicative markov decision chains. Math. Oper. Res. 9(1), 6–24 (1984)
  41. Rump, S.M.: Polynomial minimum root separation. Math. Comput. 33(145), 327–336 (1979)
  42. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
  43. Sladký, K.: On Dynamic Programming Recursions for Multiplicative Markov Decision Chains, pp. 216–226. Springer, Berlin (1976)
  44. van den Dries, L.: Tame topology and o-minimal structures, volume 248 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge (1998)
  45. van den Dries, L.: o-minimal structures and real analytic geometry. In: Current developments in mathematics, 1998 (Cambridge, MA), pp. 105–152. Int. Press, Somerville (1999)
  46. Vigeral, G.: A zero-sum stochastic game with compact action sets and no asymptotic value. Dyn. Games Appl. 3(2), 172–186 (2013)
  47. Whittle, P.: Optimization over time, I. Wiley, New York (1982)
  48. Wilkie, A.J.: Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. J. Amer. Math. Soc. 9(4), 1051–1094 (1996)
  49. Ye, Y.: The simplex and policy-iteration methods are strongly polynomial for the markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4), 593–603 (2011)
  50. Zijm, W.H.M.: Asymptotic expansions for dynamic programming recursions with general nonnegative matrices. J. Optim. Theory Appl. 54(1), 157–191 (1987)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Inria and CMAP, École polytechnique, CNRS, Palaiseau, France
  2. IEOR Department, Columbia University, New York, USA
  3. Inria Paris, Paris, France