Abstract
We propose a new deterministic evolutionary dynamic—the tempered best response dynamic (tBRD)—to capture two features of economic decision making: optimization and continuous sensitivity to incentives. That is, in the tBRD, an agent is more likely to revise his action when his current payoff is further from the optimal payoff, and he always switches to an optimal action when revising. The tBRD is a payoff monotone selection like the replicator dynamic, which makes medium- and long-run outcomes more consistent with predictions from equilibrium refinement than those of the BRD in some situations. The technical contribution of the tBRD is continuous sensitivity, which allows us to apply results on systems of piecewise differential equations in order to obtain conditions for uniqueness and stability of solutions.
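The two features above—revision rates increasing in the payoff deficit, and switching to a best response upon revision—can be sketched numerically. Below is a minimal Euler discretization of the mean dynamic for a single population; the 2×2 coordination game and the revision-rate function Q(d) = min(d, 1) are illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Sketch of the tBRD mean dynamic in a single-population 2x2 coordination
# game.  The payoff matrix A and the revision rate Q(d) = min(d, 1) are
# illustrative assumptions.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def tbrd_step(x, dt=0.01):
    """One Euler step: agents revise at rate Q(payoff deficit) and,
    when revising, switch to a best response."""
    F = A @ x                         # payoff of each action
    deficit = F.max() - F             # distance from the optimal payoff
    rates = np.minimum(deficit, 1.0)  # Q is increasing, with Q(0) = 0
    xdot = -x * rates                 # revising agents abandon their action...
    xdot[np.argmax(F)] += (x * rates).sum()  # ...and adopt the best response
    return x + dt * xdot

x = np.array([0.7, 0.3])
for _ in range(2000):
    x = tbrd_step(x)
```

Starting from (0.7, 0.3), action 1 is the best response, so its share grows toward the pure equilibrium while the suboptimal share decays at a rate proportional to its payoff deficit.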
Notes
Recently decision theorists have incorporated status-quo biases into the axiomatic framework of choice theory: see Masatlioglu and Ok (2005), Sagi (2006), Ortoleva (2010). In addition, the theory of industrial organization notes the significance of consumers’ switching costs in market competition (Klemperer 1995).
Sawa and Zusai (2013) explicitly formulate a situation in which each agent engages in two separate games simultaneously and can switch to the optimal action in only one game upon a single revision opportunity.
Among deterministic learning dynamics for mixed strategies with finitely many players, we could view the target projection dynamic (Tsakas and Voorneveld 2009) as a mixed-strategy (monomorphic) BRD with a deterministic status-quo bias.
For further details, see Sandholm (2010b, Ch.2).
The assumption of unit mass is made just for notational simplicity. We could easily extend the model and the results to general cases where different populations have different masses.
We omit the transpose when we write a column vector in the text. A vector in a bold font is a column vector, while one with an arrow over the letter is a row vector. \(\mathbf {1}\) is a column vector \((1,1,\ldots ,1)\). Note that \(\mathbf {1}\cdot \mathbf {z}=\sum _{i=1}^n z_i\) for an arbitrary column vector \(\mathbf {z}=(z_i)_{i=1}^n\). For a finite set \(\mathcal {Z}=\{1,\ldots ,Z\}\), we define \(\Delta \mathcal {Z}\) as \(\Delta \mathcal {Z}:=\{\rho \in [0,1]^Z|\mathbf {1}\cdot \rho =1 \}\), i.e., the set of all probability distributions on \(\mathcal {Z}\).
Precisely \(\mathbf {x}\) is an A-dimensional column vector \((x^1_1,\ldots ,x^1_{A^1},x^2_1,\ldots ,x^2_{A^2},\ldots , x^P_1,\ldots ,x^P_{A^P})\).
Notice that \(B^p(\mathbf {x})=\arg \max _{\mathbf {y}^p\in \Delta \mathcal {A}^p} \mathbf {y}^p\cdot \mathbf {F}^p(\mathbf {x})\) is a convex set for every \(\mathbf {x}\in \mathcal {X}\).
Here \(M^T\) is the transpose of matrix M.
See Sandholm (2010b, Ch.4). He defines each of the major evolutionary dynamics by these two components and then induces a differential equation/inclusion from the aggregate (the mean dynamic) of the individual revisions.
We denote \(\mathbb {R}_+:=[0,\infty )\) and \(\mathbb {R}_{++}:=(0,\infty ).\)
Roth and Sandholm (2013) consider finite-population optimization-based evolutionary dynamics, including the tBRD, in both discrete and continuous time. They prove that, as the size of the population goes to infinity, both the medium- and long-run behavior of the dynamic is well approximated by an infinite-population dynamic such as the one presented here.
Here \(T\mathcal {X}:=\prod _{p\in \mathcal {P}}T\mathcal {X}^p\) and \(T\mathcal {X}^p\) is the tangent space of \(\mathcal {X}^p\subset \mathbb {R}^{A^p}\), i.e., \(T\mathcal {X}^p:=\{\mathbf {z}^p\in \mathbb {R}^{A^p}| \mathbf {1}\cdot \mathbf {z}^p=0\}.\)
Continuity of Q in Assumption 2 enables us to interpret Q as a distribution function. Since \(\breve{F}\) is continuous and thus bounded on \(\mathcal {X}\), this continuity guarantees boundedness of Q. We set the upper bound of Q to 1, but only for simplicity.
When \(\mathbf {x}\) is fixed or clear from the context, we abbreviate \(Q(\breve{F}^p_a(\mathbf {x}))\) as \(Q^p_a\) and \(Q'(\breve{F}^p_a(\mathbf {x}))\) as \(Q^{p\prime }_a\).
However, not every direction can be maintained once the state leaves the equilibrium; a transition vector has to be consistent with the best responses at the off-equilibrium states lying in its direction. We can see this in the examples that follow.
One might take strong Nash stationarity to be a direct implication of Assumption 3(a). But that assumption only guarantees weak stationarity (part 3(i) in Theorem 1), as a Carathéodory solution allows the dynamic to deviate from the given differential inclusion at a moment of time. For strong stationarity [part 3(ii)], continuity of the revision rate function Q in Assumption 2(a) prevents the transition vector from jumping away when the state leaves a rest point.
See Sandholm (2010b, Sec. 5.3). Among major dynamics, the logit dynamic does not satisfy Nash stationarity or positive correlation. The replicator dynamic does not satisfy Nash stationarity, as it may have a rest point that is not a Nash equilibrium.
Actually, because Assumption 3 is not needed for the above theorem, our stability theorem is applicable to the standard BRD and thus a generalization of the results that have been established for the BRD. See Sandholm (2001) for potential games, Hofbauer and Sandholm (2009) for contractive games, Sandholm (2010a) for regular ESSs.
Appendix A summarizes Lyapunov functions for the standard BRD and the perturbed BRD.
Assumption 3(a) implies (7), and Assumption 3(b) implies (6); regularity comes from Assumptions 2(a) and 3(b).
In the preceding literature [e.g. Weibull (1995, Definition 4.2.) and Hofbauer and Sigmund (1998, p.88)], payoff monotonicity requires “two-sided” monotonicity: \(F^p_a(\mathbf {x}^t)> F^p_b(\mathbf {x}^t) \Leftrightarrow \dot{x}^{t,p}_a/x^{t,p}_a> \dot{x}^{t,p}_b/x^{t,p}_b\), which implies both of (6) and (7). The tBRD does not satisfy the two-sided monotonicity, because multiple optimal actions can have the masses of their players grow at different rates. But our version is sufficient for the limit of an interior convergence path to be a Nash equilibrium; see Zusai (2013, Theorems 1 and 2).
Golman and Page (2010) present several games with multiple equilibria where the best response and replicator dynamics may yield significantly different sizes of the basins of attraction across equilibria of the same game. The basin of attraction under the tBRD is the same as that under the BRD in one of their examples (the Haruvy–Stahl game) and is similar to that under the replicator dynamic in another example (the game used to prove their Theorem 2).
Note that the transition vectors to \(\mathbf {e}_1\) and \(\mathbf {e}_3\) form an obtuse angle.
A social state \(\mathbf {x}\in \mathcal {X}\) is an \(\varepsilon \)-proper equilibrium with \(\varepsilon >0\), if \(\mathbf {x}\) lies in the interior of \(\mathcal {X}\) and it satisfies \(x^p_a <\varepsilon x^p_b\) whenever \(F_a^p(\mathbf {x})< F_b^p(\mathbf {x})\) for all \(p\in \mathcal {P},a,b\in \mathcal {A}^p\). A social state \(\mathbf {x}^*\in \mathcal {X}\) is a proper equilibrium if there are sequences \(\{\mathbf {x}^n \}\subset \mathcal {X}\) and \(\{\varepsilon ^n\}\subset (0,\infty )\) such that each \(\mathbf {x}^n\) is an \(\varepsilon ^n\)-proper equilibrium and the sequence \(\{(\mathbf {x}^n,\varepsilon ^n)\}\) satisfies \(\mathbf {x}^n\rightarrow \mathbf {x}^*\) and \(\varepsilon ^n\rightarrow 0\) as \(n\rightarrow \infty \).
In a simple two-stage chain-store game, an interior path converges to a Nash equilibrium with a weakly dominated strategy both under the replicator dynamic and under the tBRD, while only to a strict equilibrium under the BRD. (Zusai 2013; Cressman 2003, p. 291.) So this connection cannot be generalized.
Condition (i) means that the payoff ordering does not change in finite time. Condition (ii) means that, if a strategy is suboptimal in the states on the convergent path, it should be extinguished in the limit, whether or not it becomes optimal in the limit. Condition (iii) prohibits the payoff difference between strategies from vanishing in the limit, unless one strategy is optimal and the other is second best on the path.
For an epistemic foundation of properness, see Blume et al. (1991).
That is, Filippov solutions are usually adopted as the solution concept for a system of piecewise DEs. Under the tBRD they coincide with Carathéodory solutions because of the upper semicontinuity and the convexity of \(V_Q\), though this is not true in general. See Bacciotti (2003).
See Honkapohja and Ito (1983) for a weaker sufficient condition for stability.
You may notice that \(\mathcal {X}\) is only a lower-dimensional subset of \(\mathbb {R}^A\). The gradient \(\nabla f(\mathbf {x})\) here means the coefficient vector in the linear approximation of the change in f on the tangent space of \(\mathcal {X}\), i.e., \(f(\mathbf {x}+\mathbf {z})=f(\mathbf {x})+\nabla f(\mathbf {x})\cdot \mathbf {z}+o(|\mathbf {z}|)\) for all \(\mathbf {z}\in T\mathcal {X}.\)
\(\mathbf {x}^*\) is a quasi-strict equilibrium, if \(F^p_*(\mathbf {x}^*)=F^p_a(\mathbf {x}^*)>F^p_b(\mathbf {x}^*)\) for any population \(p\in \mathcal {P},\) any used action a and any unused action b, i.e., whenever \(x^{*,p}_a>0\) and \(x^{*,p}_b=0\).
Because of switches of maximizer \(b\in \mathcal {A}^p\), \(L(\mathbf {x})\) is not continuously differentiable everywhere in \(\mathcal {X}\).
Notice that Assumption 3(a) is not needed.
Here we omit \(\mathbf {x}\) from the arguments of functions on \(\mathcal {X}\), and let \(Q^p_a:=Q(\breve{F}^p_a(\mathbf {x}))\).
References
Bacciotti A (2003) On several notions of generalized solutions for discontinuous differential equations and their relationship. Discussion paper, Dipartimento di Matematica, Politecnico di Torino
Balkenborg D, Hofbauer J, Kuzmics C (2013) Refined best-response correspondence and dynamics. Theor Econ 8(1):165–192
Blume L, Brandenburger A, Dekel E (1991) Lexicographic probabilities and equilibrium refinements. Econometrica 59(1):81–98
Cressman R (2003) Evolutionary dynamics and extensive form games. MIT Press, Cambridge
Ely J, Sandholm WH (2005) Evolution in Bayesian games I: theory. Games Econ Behav 53:83–109
Erev I, Roth AE (1998) How people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Am Econ Rev 88(4):848–881
Gilboa I, Matsui A (1991) Social stability and equilibrium. Econometrica 59(3):859–867
Golman R, Page S (2010) Basins of attraction and equilibrium selection under different learning rules. J Evolut Econ 20:49–72
Hartman RS, Doane MJ, Woo C-K (1991) Consumer rationality and the status quo. Q J Econ 106(1):141–162
Hofbauer J (1995) Stability for the best response dynamics. Mimeo, University of Vienna
Hofbauer J, Sandholm WH (2009) Stable games and their dynamics. J Econ Theory 144(4):1665–1693
Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge
Honkapohja S, Ito T (1983) Stability with regime switching. J Econ Theory 29:22–48
Klemperer P (1995) Competition when consumers have switching costs: an overview with applications to industrial organization, macroeconomics, and international trade. Rev Econ Stud 62:515–539
Kojima F (2009) A note on dynamics in games. Mimeo, Stanford University
Kojima F, Takahashi S (2007) Anti-coordination games and dynamic stability. Int Game Theory Rev 9(4):667–688
Kuzmics C (2011) On the elimination of dominated strategies in stochastic models of evolution with large populations. Games Econ Behav 72(2):452–466
Lipman BL, Wang R (2000) Switching costs in frequently repeated games. J Econ Theory 93(2):149–190
Lipman BL, Wang R (2009) Switching costs in infinitely repeated games. Games Econ Behav 66:292–314
Lou Y, Yin Y, Lawphongpanich S (2010) Robust congestion pricing under boundedly rational user equilibrium. Transp Res Part B 44:15–28
Madrian BC, Shea DF (2001) The power of suggestion: inertia in 401 (k) participation and savings behavior. Q J Econ 116(4):1149–1187
Masatlioglu Y, Ok EA (2005) Rational choice with status quo bias. J Econ Theory 121(1):1–29
Matsui A (1992) Best response dynamics and socially stable strategies. J Econ Theory 57(2):343–362
Norman TW (2009) Rapid evolution under inertia. Games Econ Behav 66(2):865–879
Norman TW (2010) Cycles versus equilibrium in evolutionary games. Theory Decis 69(2):167–182
Ortoleva P (2010) Status quo bias, multiple priors and uncertainty aversion. Games Econ Behav 69(2):411–424
Oyama D, Sandholm WH, Tercieux O (2015) Sampling Best Response Dynamics and Deterministic Equilibrium Selection. Theor Econ 10(1):243–281
Roth G, Sandholm WH (2013) Stochastic approximations with constant step size and differential inclusions. SIAM J Control Optim 51(1):525–555
Sagi JS (2006) Anchored preference relations. J Econ Theory 130(1):283–295
Samuelson W, Zeckhauser R (1988) Status quo bias in decision making. J Risk Uncertain 1(1):7–59
Sandholm WH (2001) Potential games with continuous player sets. J Econ Theory 97(1):81–108
Sandholm WH (2010a) Local stability under evolutionary game dynamics. Theor Econ 5(1):27–50
Sandholm WH (2010b) Population games and evolutionary dynamics. MIT Press, Cambridge
Sandholm WH (2015) Population Games and Deterministic Evolutionary Dynamics. In: Young HP, Zamir S (eds) Handbook of game theory with economic applications, vol 4. Elsevier
Sandholm WH, Dokumaci E, Franchetti F (2012) Dynamo: diagrams for evolutionary game dynamics. http://www.ssc.wisc.edu/~whs/dynamo
Sawa R, Zusai D (2013) Best response dynamics in multitask environments. Mimeo, University of Aizu and Temple University
Smirnov GV (2001) Introduction to the theory of differential inclusions. American Mathematical Society, Providence
Szeto W, Lo HK (2006) Dynamic traffic assignment: properties and extensions. Transportmetrica 2(1):31–52
Tsakas E, Voorneveld M (2009) The target projection dynamic. Games Econ Behav 67(2):708–719
van Damme E (1991) Stability and perfection of Nash equilibria, 2nd edn. Springer, Berlin
Weibull JW (1995) Evolutionary game theory. MIT Press, Cambridge
Zeeman E (1980) Population dynamics from game theory. In: Nitecki Z, Robinson C (eds) Global theory of dynamical systems. Springer, Berlin, pp 471–497
Zusai D (2011) Essays on evolutionary dynamics and applications to implementation problems. Ph.D. thesis, University of Wisconsin-Madison
Zusai D (2013) Interior convergence under payoff monotone selections and proper equilibrium: application to equilibrium selection. In: Křivan V, Zaccour G (eds) Advances in dynamical games, Annals of the International Society of Dynamic Games, vol 13. Birkhäuser-Springer, Basel, pp 107–121
Zusai D (2015) Disaggregate evolutionary dynamics under payoff heterogeneity. Mimeo, Temple University
Acknowledgements
This paper is based on Chapter 1 of my doctoral dissertation (Zusai 2011). I greatly appreciate the encouragement and advice of Bill Sandholm and Marek Weretka. I would like to thank Takashi Akamatsu, Larry Blume, Dimitrios Diamantaras, Russell Golman, Makoto Hanazono, Josef Hofbauer, Ryota Iijima, Akihiko Matsui, Daisuke Oyama, Dan Sasaki, Ryoji Sawa, Noah Williams, the associate editor and the referee of the journal, and the seminar participants at Australian National U., Bank of Japan, Temple U., Tohoku U., U. Tokyo, U. Vienna, U. Wisconsin-Madison, Japan Economic Association, and Midwest Economic Theory Meeting for their comments. I also thank Nathan Yoder for careful reading and editing suggestions and Hiroko Ono for creating the vector diagrams. Francisco Franchetti worked with Bill Sandholm to include the tBRD into Dynamo, which enables me to draw the beautiful phase diagrams. Financial support from Richard E. Stockwell Graduate Student Fellowship is gratefully acknowledged. Of course all errors are mine.
Appendices
Appendix A: Stability and classes of games
1.1 A.1: Stability concepts and Lyapunov stability theorem
Since the tBRD is a differential inclusion and may admit multiple solution paths from the same initial state, we need to define "stability" in terms of convergence to the rest point along every solution path.
Definition 3
(Sandholm 2010b: Sec. 7.A) Consider a differential inclusion \(\dot{\mathbf {x}}\in V(\mathbf {x})\) defined over \(\mathcal {X}\) and a closed set \(A\subset \mathcal {X}\). A is Lyapunov stable under V if for any open neighborhood O of A there exists a neighborhood \(O'\) of A such that every solution \(\{\mathbf {x}^t\}\) that starts from \(O'\) remains in O. A is attracting if there is a neighborhood B of A such that every solution that starts in B converges to A. A is globally attracting if it is attracting with \(B=\mathcal {X}.\) A is asymptotically stable if it is Lyapunov stable and attracting; it is globally asymptotically stable if it is Lyapunov stable and globally attracting.
We could apply the standard Lyapunov stability theorem (Sandholm 2010b, Thm. 7.B.2,4) if the Lyapunov function were continuously differentiable everywhere in the state space \(\mathcal {X}\). But, in a contractive game, the proposed Lyapunov function (4) is not differentiable at some points, as we argued in Sect. 1. So we need a differential-inclusion version of the Lyapunov stability theorem. Smirnov (2001, Theorem 8.2) proves one for convergence to a single rest point of a differential inclusion, using an upper Dini derivative. As we expect multiple Nash equilibria, we modify the theorem to allow convergence to a set. On the other hand, as we assume Lipschitz continuity of the payoff function and the dynamic, we can restrict the Lyapunov function W to be Lipschitz continuous.
Theorem 7
Let A be a closed subset of a compact space \(\mathcal {X}\) and \(A'\) be a neighborhood of A. Suppose that two continuous functions \(W:\mathcal {X}\rightarrow \mathbb {R}\) and \(\tilde{W}:\mathcal {X}\rightarrow \mathbb {R}\) satisfy (i) \(W(\mathbf {x})\ge 0\) and \(\tilde{W}(\mathbf {x})\ge 0\) for all \(\mathbf {x}\in \mathcal {X}\) and (ii) \(W^{-1}(0)=\tilde{W}^{-1}(0)=A\). In addition, assume that W is Lipschitz continuous in \(\mathbf {x}\in \mathcal {X}\) with Lipschitz constant \(K\in (0,\infty )\). If any Carathéodory solution \(\{\mathbf {x}^t\}\) starting from \(A'\) satisfies
\(\dot{W}(\mathbf {x}^t)\le -\tilde{W}(\mathbf {x}^t)\) for almost all \(t\ge 0\), \(\qquad\) (10)

then A is asymptotically stable and \(A'\) is its basin of attraction.Footnote 32
Proof
First of all, we can readily prove Lyapunov stability of A. By property (i) of \(\tilde{W}\) and (10), we have \(\dot{W}(\mathbf {x}^t)\le 0\) and thus \(W(\mathbf {x}^t)\) cannot increase over time. With property (ii) of W, this implies Lyapunov stability of \(A=W^{-1}(0)\).
Now we prove that \(W(\mathbf {x}^t)\) can be made smaller than any small positive number after a sufficiently long time t. To prove it by contradiction, suppose that there is a positive number \(l>0\) such that \(W(\mathbf {x}^t)\ge l\) for all \(t\ge 0\) on a Carathéodory solution \(\{\mathbf {x}^t\}\). Lipschitz continuity of W implies

\(d(\mathbf {x},A)\le 0.5l/K \ \Longrightarrow \ W(\mathbf {x})\le 0.5l.\)
Indeed, \(d(\mathbf {x},A)\le 0.5l/K\) implies the existence of \(\mathbf {y}\) in the compact set A such that \(|\mathbf {x}-\mathbf {y}| \le 0.5l/K\); then it follows that \(W(\mathbf {x})=|W(\mathbf {x})-W(\mathbf {y})|\le K |\mathbf {x}-\mathbf {y}| \le 0.5 l\) from the Lipschitz continuity of W and properties (i,ii) of W. So the Carathéodory solution must satisfy \(d(\mathbf {x}^t,A)>0.5 l/K\) to maintain \(W(\mathbf {x}^t)\ge l\) for all \(t\ge 0\).
Consider the closed set \(\check{A}\) defined as

\(\check{A}:=\{\mathbf {x}\in \mathcal {X} \,|\, d(\mathbf {x},A)\ge 0.5l/K\}.\)
Then the minimum of \(\tilde{W}\) over this set \(\check{A}\) exists and

\(\mu :=\min _{\mathbf {x}\in \check{A}}\tilde{W}(\mathbf {x})>0,\)
since \(\check{A}\) is a compact set and the minimizer belongs to \(\check{A}\) and thus not to \(A=\tilde{W}^{-1}(0)\). As \(\mathbf {x}^t\in \check{A}\), we have \(-\tilde{W}(\mathbf {x}^t)\le -\mu \). Hence (10) implies

\(W(\mathbf {x}^t)\le W(\mathbf {x}^0)-\mu t\)
for all \(t\in [0,\infty )\). As \(\mu >0\), this implies \(W(\mathbf {x}^t)<0\) whenever \(t>W(\mathbf {x}^0)/\mu \), contradicting property (i) of W.
Therefore, for any positive number \(l>0\), we can find a time T such that \(W(\mathbf {x}^t)<l\) for all \(t\ge T\). In conclusion, any Carathéodory solution \(\{\mathbf {x}^t\}\) starting from \(A'\) satisfies

\(\lim _{t\rightarrow \infty } W(\mathbf {x}^t)=0\)
and converges to the set \(A=W^{-1}(0)\).\(\square \)
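A toy sanity check of this theorem, with all objects chosen for illustration: on \([-1,1]\) with \(\dot{x}=-x\), take \(A=\{0\}\) and \(W=\tilde{W}=|\cdot |\), which is Lipschitz with \(K=1\) and satisfies \(\dot{W}(x^t)\le -\tilde{W}(x^t)\). W should then decrease monotonically along the solution, which converges to \(A=W^{-1}(0)\).

```python
# Toy check of the Lyapunov argument in Theorem 7.  The system xdot = -x on
# [-1, 1], the target set A = {0}, and W = W_tilde = abs() are illustrative
# choices satisfying the theorem's hypotheses (W is Lipschitz with K = 1).

def W(x):
    return abs(x)

x, dt = 0.8, 0.01
W_path = [W(x)]
for _ in range(1500):
    x += dt * (-x)            # Euler step along the Caratheodory solution
    W_path.append(W(x))

monotone = all(a >= b for a, b in zip(W_path, W_path[1:]))  # W never increases
final_gap = W_path[-1]        # distance to A = W^{-1}(0) at the end
```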
1.2 A.2: Classes of games
Sandholm (2010b, Chapter 3) provides further explanation and examples.
1.2.1 Potential games
A population game \(\mathbf {F}:\mathcal {X}\rightarrow \mathbb {R}^A\) is called a potential game if there is a scalar-valued continuously differentiable function \(f:\mathcal {X}\rightarrow \mathbb {R}\) whose gradient vector always coincides with the relative payoff vector: for all \(p\in \mathcal {P}\) and \(\mathbf {x}\in \mathcal {X}\), f satisfiesFootnote 33

\(\nabla f(\mathbf {x})\cdot \mathbf {z}^p=\mathbf {F}^p(\mathbf {x})\cdot \mathbf {z}^p \quad \text {for all } \mathbf {z}^p\in T\mathcal {X}^p.\)
The class of potential games includes random matching in symmetric games, binary choice games and standard congestion games. The potential function f works as a Lyapunov function in a wide range of evolutionary dynamics: replicator, BRD, etc.: see Sandholm (2001).
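As a concrete check (under assumed payoffs), random matching in a symmetric two-action game with a symmetric matrix C is a potential game with \(f(\mathbf {x})=\tfrac{1}{2}\mathbf {x}\cdot C\mathbf {x}\), since then \(\nabla f(\mathbf {x})=C\mathbf {x}=\mathbf {F}(\mathbf {x})\). The sketch below verifies the gradient condition by finite differences; the matrix C is an illustrative choice.

```python
import numpy as np

# Numerical check that random matching in a symmetric game is a potential
# game.  The symmetric matrix C is an illustrative choice; the candidate
# potential is f(x) = x.Cx/2, whose gradient is Cx = F(x) since C = C^T.
C = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def F(x):
    return C @ x

def f(x):
    return 0.5 * x @ C @ x

rng = np.random.default_rng(0)
max_err = 0.0
h = 1e-6
for _ in range(20):
    x = rng.dirichlet([1.0, 1.0])          # random state on the simplex
    for a in range(2):
        e = np.zeros(2)
        e[a] = h
        grad_a = (f(x + e) - f(x - e)) / (2 * h)   # central difference
        max_err = max(max_err, abs(grad_a - F(x)[a]))
```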
1.2.2 Contractive games
The existence of a potential function seems to be a strong assumption on a game. Contractive games are a generalization of potential games with concave potential functions. A population game \(\mathbf {F}\) is a contractive game ifFootnote 34

\((\mathbf {x}-\mathbf {y})\cdot (\mathbf {F}(\mathbf {x})-\mathbf {F}(\mathbf {y}))\le 0 \quad \text {for all } \mathbf {x},\mathbf {y}\in \mathcal {X}.\)
If the strict inequality holds whenever \(\mathbf {x}\ne \mathbf {y}\), \(\mathbf {F}\) is a strict contractive game.
If \(\mathbf {F}\) is \(C^1\), the definition of a contractive game is equivalent to negative semidefiniteness of \(D\mathbf {F}(\cdot )\) with respect to the tangent space \(T\mathcal {X}\) of the state space \(\mathcal {X}\): for any \(\mathbf {x}\in \mathcal {X}\),

\(\mathbf {z}\cdot D\mathbf {F}(\mathbf {x})\mathbf {z}\le 0 \quad \text {for all } \mathbf {z}\in T\mathcal {X}.\)
Notice that this implies similar negative semidefiniteness of \(D\mathbf {F}^p(\cdot )\) on \(T\mathcal {X}^p\) for each population \(p\in \mathcal {P}\): for any \(\mathbf {x}\in \mathcal {X}\),
The class of contractive games includes two-player zero-sum games as well as games with an interior evolutionary stable state or neutrally stable state.
Hofbauer and Sandholm (2009) show that the set of Nash equilibria of a contractive game is globally asymptotically stable under a broad class of evolutionary dynamics. In the BRD \(\dot{\mathbf {x}}\in B(\mathbf {x})-\mathbf {x}\), the Lyapunov function is the difference between the optimized payoff and the current average payoff: \(L(\mathbf {x})=\max _{\mathbf {y}\in \mathcal {X}} (\mathbf {y}-\mathbf {x})\cdot \mathbf {F}(\mathbf {x})=\sum _p \left[ F^p_*(\mathbf {x})-\mathbf {x}^p\cdot \mathbf {F}^p(\mathbf {x})\right] .\) In the perturbed BRD \(\dot{\mathbf {x}}\in \tilde{B}(\mathbf {x})-\mathbf {x}\) with \(\tilde{B}^p(\mathbf {x})=\arg \max _{\mathbf {y}^p\in \Delta \mathcal {A}^p} \left[ \mathbf {y}^p\cdot \mathbf {F}^p(\mathbf {x})-v^p(\mathbf {y}^p)\right] \), it is the difference between the maximized payoff and the average payoff, both net of the payoff perturbations: \( L(\mathbf {x})=\sum _p \max _{\mathbf {y}^p\in \Delta \mathcal {A}^p}\left[ (\mathbf {y}^p-\mathbf {x}^p)\cdot \mathbf {F}^p(\mathbf {x})- \left( v^p(\mathbf {y}^p)-v^p(\mathbf {x}^p)\right) \right] .\)
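For instance, in standard rock–paper–scissors under random matching (a two-player zero-sum, hence contractive, game; the matrix below is the usual win-1/lose-1 specification, used here for illustration), the BRD Lyapunov function can be evaluated directly; it is non-negative everywhere and vanishes at the unique Nash equilibrium \((1/3,1/3,1/3)\).

```python
import numpy as np

# The BRD Lyapunov function for a contractive game: the gap between the
# optimized payoff and the current average payoff.  Standard rock-paper-
# scissors (zero-sum under random matching) serves as the illustration.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def L(x):
    Fx = A @ x
    return Fx.max() - x @ Fx      # max over y of (y - x).F(x) on the simplex

rng = np.random.default_rng(1)
min_L = min(L(rng.dirichlet([1.0] * 3)) for _ in range(200))
L_at_ne = L(np.ones(3) / 3)       # unique Nash equilibrium of RPS
```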
1.2.3 Regular ESS
A state \(\mathbf {x}^*\in \mathcal {X}\) is a regular (Taylor) evolutionary stable state if it is a quasi-strict equilibriumFootnote 35 and it satisfies
Let \(U^p\) be the set of population p’s unused actions in the regular ESS \(\mathbf {x}^*\) and \(\mathbb {R}^A_{0U}\) be the set of vectors in \(\mathbb {R}^A\) that take zero on \(U^p\) for all \(p\in \mathcal {P}\):
Then the condition (11) can be replaced with
This condition means that the game \(\mathbf {F}\) is a strictly contractive game locally around the quasi-strict equilibrium \(\mathbf {x}^*\) in the reduced state space where any action unused in \(\mathbf {x}^*\) is kept unused. Sandholm (2010a) proves local stability of a regular ESS in major evolutionary dynamics.
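The negative-definiteness condition can be checked numerically in a concrete game. "Good RPS" (win 2, lose 1) is an illustrative choice with the interior quasi-strict equilibrium \((1/3,1/3,1/3)\) and constant derivative \(D\mathbf {F}(\mathbf {x})=A\); for tangent vectors, \(\mathbf {z}\cdot A\mathbf {z}=-\tfrac{1}{2}\sum _a z_a^2<0\), so every sampled direction passes the test.

```python
import numpy as np

# Numerical check of the regular-ESS condition: negative definiteness of
# DF(x*) with respect to the tangent space.  The game is "good RPS"
# (win 2, lose 1), an illustrative choice with DF(x) = A everywhere.
A = np.array([[ 0., -1.,  2.],
              [ 2.,  0., -1.],
              [-1.,  2.,  0.]])

rng = np.random.default_rng(2)
neg_definite = True
for _ in range(200):
    z = rng.normal(size=3)
    z -= z.mean()                  # project onto the tangent space 1.z = 0
    if np.linalg.norm(z) < 1e-8:   # skip (numerically) zero directions
        continue
    neg_definite = neg_definite and (z @ A @ z < 0)
```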
Appendix B: The proofs
1.1 B.1: Theorem 1 (Nash stationarity)
Proof
(1) Suppose that \(\mathbf {x}^*\) is a Nash equilibrium. Then for each \(p\in \mathcal {P}\), every action \(a\in \mathcal {A}^p\) satisfies \(x^{*,p}_a=0\) or \(\breve{F}^p_a(\mathbf {x}^*)=0\). Hence,
Hence, the condition for a Nash equilibrium \(\mathbf {x}^{*,p}\in B^p(\mathbf {x}^*)\) implies \(\mathbf {x}^{*,p}\in B^p_Q(\mathbf {x}^*)\) and thus \(\mathbf {0}\in V_Q(\mathbf {x}^*):=B_Q(\mathbf {x}^*)-\mathbf {x}^*.\)
Furthermore, if Assumption 3(a) holds, i.e., if \(Q(0)=0\), then \(B^p_Q(\mathbf {x}^*)\) reduces to \(\mathbf {x}^{*,p}\) and thus \(V_Q(\mathbf {x}^*)=\{\mathbf {0}\}\).
(2) Suppose that \(\mathbf {x}\) is not a Nash equilibrium. Then we can find at least one population \(p\in \mathcal {P}\) with a suboptimal action \(a\in \mathcal {A}^p\) being played by a positive mass of its players: \(\breve{F}^p_a(\mathbf {x})>0\) and \(x^p_a>0\). Under the tBRD, this mass decreases at rate \(Q(\breve{F}^p_a(\mathbf {x}))>0\) by Assumption 2(b). So any transition vector \(\dot{\mathbf {x}}\in V_Q(\mathbf {x})\) has a negative entry \(\dot{x}^p_a=-x^p_a Q(\breve{F}^p_a(\mathbf {x}))<0\) for this action. Hence \(\dot{\mathbf {x}}\) cannot be the zero vector, i.e., \(\mathbf {0} \notin V_Q(\mathbf {x}).\)
(3) Part 3(i) is immediate from part 1(i). For part 3(ii), assume Assumptions 1, 2(a) and 3(a). First of all, Lipschitz continuity of \(\mathbf {F}\) in Assumption 1 implies Lipschitz continuity of the payoff deficit \( \breve{F}^p_a:= F^p_*-F^p_a\) for any population \(p\in \mathcal {P}\) and action \(a\in \mathcal {A}^p\). Let \(\bar{K}^p\) be the largest Lipschitz constant of \(\breve{F}_a^p\) among all \(a\in \mathcal {A}^p\); then,
Suppose that there is a Carathéodory solution path staying at a Nash equilibrium \(\mathbf {x}^*\) until time \(T\ge 0\) and leaving it at time T. Then, by (2a, 2b) and the triangle inequality, we have
for almost all time t. First, consider any action \(a\notin b^p(\mathbf {x}^*)\). Nash equilibrium requires \(x^{*,p}_a=0\). Besides, by continuity of \(\mathbf {F}^p\) and of the path \(\mathbf {x}^t\), the assumptions \(a\notin b^p(\mathbf {x}^*)\) and \(\mathbf {x}^*=\mathbf {x}^T\) imply \(\breve{F}^p_a(\mathbf {x}^{T+\tau })>0\) for sufficiently small \(\tau \). So, in the time range \([T,T+\tau ]\), such actions are not optimal and thus each keeps \(x^{p,t}_a=0\). Second, consider an action \(a\in b^p(\mathbf {x}^*)\). Then \(\breve{F}^p_*(\mathbf {x}^*)=0\) and thus \(Q(\breve{F}^p_a(\mathbf {x}^*))=Q(0)=0\) by Assumption 3(a). With this fact, Lipschitz continuity of Q and of \(\breve{F}^p_a\) in Assumptions 1 and 2(a) yields
Hence in either case, we have
With \(|\mathbf {y}^\cdot |,|\mathbf {e}^\cdot |\le 1\) and \(\sum _a x^{p,t}_a=1 \), this implies
where \(K_V:=4K_Q \sum _{p} \bar{K}^p<\infty \). Since \(\mathbf {x}^t\) is a Carathéodory solution and thus absolutely continuous, we have
Then Gronwall’s inequality implies
So we have \(\mathbf {x}^s=\mathbf {x}^*\) during \(s\in [T,T+\tau ]\). This contradicts the hypothesis that \(\mathbf {x}^t\) departs from \(\mathbf {x}^*\) at time T. We therefore conclude that \(\mathbf {x}^t\equiv \mathbf {x}^*\) is the only Carathéodory solution starting from \(\mathbf {x}^0=\mathbf {x}^*\).\(\square \)
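The stationarity claims just proved can be checked in a small example: the tBRD transition vector vanishes at both pure and mixed Nash equilibria but not at a non-equilibrium state, where the mass on a suboptimal action strictly decreases. In the sketch below, the coordination game and the revision rate Q(d) = min(d, 1) (which satisfies Q(0) = 0, as in Assumption 3(a)) are illustrative assumptions.

```python
import numpy as np

# Nash stationarity in a concrete game: the tBRD transition vector is zero
# exactly at Nash equilibria.  Coordination game with pure equilibria e1, e2
# and mixed equilibrium (1/3, 2/3); both the game and Q are illustrative.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def v(x):
    """tBRD transition vector at state x."""
    F = A @ x
    deficit = F.max() - F
    rates = np.minimum(deficit, 1.0)
    out = -x * rates
    out[np.argmax(F)] += (x * rates).sum()
    return out

speed_mixed_ne = np.linalg.norm(v(np.array([1.0, 2.0]) / 3))  # mixed NE
speed_pure_ne = np.linalg.norm(v(np.array([1.0, 0.0])))       # pure NE e1
speed_off_ne = np.linalg.norm(v(np.array([0.5, 0.5])))        # not a NE
```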
1.2 B.2: Theorem 2 (positive correlation)
Proof
(1) We begin the proof from the equality in (1). The vector \(\mathbf {z}^p\in V^p_Q(\mathbf {x})\) can be represented as
with a best response \(\mathbf {y}^p_a\in B^p(\mathbf {x})\). So we have
Since all terms in the last summation are non-negative, we have \(\mathbf {F}^p(\mathbf {x})\cdot \mathbf {z}^p\ge 0\).
(2) From the above expression, we find that \(\mathbf {F}^p(\mathbf {x})\cdot \mathbf {z}^p\) depends only on \(\mathbf {x}\) and not on the choice of \(\mathbf {z}^p\) from \(V^p_Q(\mathbf {x})\); furthermore, \(\mathbf {F}^p(\mathbf {x})\cdot \mathbf {z}^p>0\) if and only if there exist \(p\in \mathcal {P}\) and \(a\in \mathcal {A}^p\) such that \(x^p_a Q(\breve{F}^p_a(\mathbf {x})) \breve{F}^p_a(\mathbf {x})>0.\) Generally, this requires \(x^p_a>0\) and \(\breve{F}^p_a(\mathbf {x})>0\). The existence of such p and a means that \(\mathbf {x}\) is not a Nash equilibrium. If Assumption 2(b) holds, the converse is also true, because \(\breve{F}^p_a(\mathbf {x})>0\) implies \(Q(\breve{F}^p_a(\mathbf {x}))>0\) by this assumption.\(\square \)
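Positive correlation can also be verified numerically at randomly drawn states: by the computation in part (1), the inner product of the payoff vector with the tBRD transition vector equals \(\sum _a x^p_a Q(\breve{F}^p_a(\mathbf {x}))\breve{F}^p_a(\mathbf {x})\ge 0\). In the sketch below, the 3×3 game matrix and Q(d) = min(d, 1) are illustrative assumptions.

```python
import numpy as np

# Numerical check of positive correlation: the tBRD transition vector never
# forms an obtuse angle with the payoff vector.  The game matrix and
# Q(d) = min(d, 1) are illustrative assumptions.
A = np.array([[1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 2.0]])

def v(x):
    F = A @ x
    deficit = F.max() - F
    rates = np.minimum(deficit, 1.0)
    out = -x * rates
    out[np.argmax(F)] += (x * rates).sum()
    return out

rng = np.random.default_rng(3)
min_inner = min(float((A @ x) @ v(x))
                for x in (rng.dirichlet([1.0] * 3) for _ in range(500)))
```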
1.3 B.3: Theorem 3 (Nash stability)
1.3.1 B.3.1: Potential games
First, Nash stability in potential games (part 1) is a straightforward implication of positive correlation as in other dynamics.
Proof of part 1
From the definition of a potential function and the fact that \(\mathbf {1}\cdot \dot{\mathbf {x}}^p=0\), Theorem 2 implies

\(\dot{f}(\mathbf {x})=\nabla f(\mathbf {x})\cdot \dot{\mathbf {x}}=\sum _{p\in \mathcal {P}} \mathbf {F}^p(\mathbf {x})\cdot \dot{\mathbf {x}}^p\ge 0;\)
in particular, \(\dot{f}(\mathbf {x})=0\) iff \(\mathbf {x}\in \text {NE}(\mathbf {F})\) under Assumption 2(b). So f is a strict Lyapunov function. Thanks to continuous differentiability of \(\mathbf {F}\) in Assumption 1, the standard Lyapunov stability theorems (Sandholm 2010b, Thm. 7.B.2,4) are applicable; each local maximizer of f is Lyapunov stable and the set of stationary points, i.e., \(\text {NE}(\mathbf {F})\), is globally attracting.\(\square \)
1.3.2 B.3.2: Contractive games
Here we prove the global asymptotic stability of Nash equilibria in contractive games (part 2) in three steps. First, we verify that the function L in (4) is Lipschitz continuous in the state \(\mathbf {x}\). Next, we show that it is Lipschitz continuous in time t along any Carathéodory solution. Finally, we apply our version of the Lyapunov stability theorem (Theorem 7) to the function L and obtain the stability of Nash equilibria.
First, to prove Lipschitz continuity of the function L in (4), we should notice that

\(L(\mathbf {x})=\sum _{p\in \mathcal {P}} \max _{b\in \mathcal {A}^p} L^p_b(\mathbf {x}),\)
where function \(L^p_b:\mathcal {X}\rightarrow \mathbb {R}\) is given by
for each \(p\in \mathcal {P}\) and \(b\in \mathcal {A}^p\). Notice that, if b is the best response action in \(b^p(\mathbf {x})\), it attains the largest payoff \(F^p_b(\mathbf {x})\) and thus the maximal value of \(L^p_b(\mathbf {x})\) among all actions in \(\mathcal {A}^p\) because \(Q\ge 0\). Hence, \(L(\mathbf {x})\) is the sum of the maximal values of \(L^p_b(\mathbf {x})\) over all populations \(p\in \mathcal {P}\).Footnote 36
Under Assumption 1, \(L^p_b(\mathbf {x})\) is Lipschitz continuous in \(\mathbf {x}\in \mathcal {X}\) for each action \(b\in \mathcal {A}^p\). Thus \(L(\mathbf {x})\) is also Lipschitz continuous in \(\mathbf {x}\in \mathcal {X}\). Furthermore, on a Carathéodory (and thus Lipschitz continuous) solution \(\{\mathbf {x}^t\}\), \(L^p_b(\mathbf {x}^t)\) is Lipschitz continuous in t. It follows that

\(\frac{d}{dt}L(\mathbf {x}^t)=\sum _{p\in \mathcal {P}} \frac{d}{dt}L^p_b(\mathbf {x}^t) \quad \text {for any } b\in b^p(\mathbf {x}^t), \text { for almost all } t,\)
from a version of Danskin’s Envelope Theorem:
Theorem 8
(Hofbauer and Sandholm 2009: Theorem A.4) For each element z in a set Z, let \(g_z:\mathbb {R}_+\rightarrow \mathbb {R}\) be Lipschitz continuous. Let

\(g_*(t):=\max _{z\in Z} g_z(t) \quad \text {and} \quad Z_*(t):=\arg \max _{z\in Z} g_z(t).\)
Then \(g_*:\mathbb {R}_+\rightarrow \mathbb {R}\) is Lipschitz continuous. Besides, for almost all \(t\in \mathbb {R}_+\), we have that \(\dot{g}_*(t)=\dot{g}_z(t)\) for each \(z\in Z_*(t)\).
Now based on this fact, we proceed to prove that our function L is a Lyapunov function; then, the asymptotic stability of Nash equilibria is guaranteed by our version of Lyapunov stability theorem (Theorem 7).
Proof of part 2 in Theorem 3
We show the function L is a strictly decreasing Lyapunov function with \(L^{-1}(0)=\text {NE}(\mathbf {F})\). First of all, since the integrand Q(q) is non-negative, the value of L is always non-negative. Besides, since Assumption 2(b) implies that the integral is zero if \(\breve{F}^p_a(\mathbf {x})=0\) and positive otherwise,Footnote 37 we have \(L^{-1}(0)=\text {NE}(\mathbf {F})\).
Consider an arbitrary Carathéodory solution \(\{\mathbf {x}^t\}\) starting from point \(\mathbf {x}^0\in \mathcal {X}\). For almost every time, the solution is differentiable in time, the transition vector satisfies (2a, 2b), and the time derivative of L equals \(\sum _{p} \dot{L}^p_b\) for any \(b\in b^p(\mathbf {x})\). Fix such a moment of time t arbitrarily and henceforth drop the time index t. The transition vector \(\dot{\mathbf {x}}\) satisfiesFootnote 38
Since \(y^p_{ab}>0\) only if \(b\in b^p(\mathbf {x})\), Theorem 8 implies
The time derivative of L at this time t is thus
where
The facts that \(a\in b^p(\mathbf {x})\ \Rightarrow \ \breve{F}^p_a(\mathbf {x})=0\) and that \(a\notin b^p(\mathbf {x})\ \Rightarrow \ \dot{x}^p_a=-x^p_a Q^p_a\) yield the first term on the third line. The definition of the tBRD (2a, 2b) alone yields the second term on the fourth line; Assumption 3(a) is not needed. The last weak inequality comes from the definition of a contractive game and \(\mathbf {1}\cdot {\dot{\mathbf {x}}^p}=0\).
Finally, function \(\tilde{L}\) is always non-negative for the same reason as \(L\ge 0\); in particular, Assumption 2(b) implies that \(\tilde{L}(\mathbf {x})\) is positive when \(\mathbf {x}\) is not a Nash equilibrium, and zero when \(\mathbf {x}\) is a Nash equilibrium.
Therefore, function L is a strict Lyapunov function and satisfies the assumptions in Theorem 7. In conclusion, the set \(L^{-1}(0)=\text {NE}(\mathbf {F})\) is asymptotically stable in the whole state space \(\mathcal {X}\).\(\square \)
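A numerical illustration of this conclusion (a sketch, not the paper's exact Lyapunov construction (4)): simulating the tBRD in standard rock–paper–scissors, a zero-sum and hence contractive game with unique Nash equilibrium \((1/3,1/3,1/3)\), with the illustrative choice Q(d) = min(d, 1), the trajectory approaches the equilibrium.

```python
import numpy as np

# Illustration of part 2: a tBRD trajectory in a contractive game approaches
# the Nash equilibrium set.  Standard rock-paper-scissors (zero-sum, hence
# contractive) with the illustrative revision rate Q(d) = min(d, 1).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
ne = np.ones(3) / 3               # unique Nash equilibrium

def v(x):
    F = A @ x
    deficit = F.max() - F
    rates = np.minimum(deficit, 1.0)
    out = -x * rates
    out[np.argmax(F)] += (x * rates).sum()
    return out

x = np.array([0.8, 0.1, 0.1])
dt = 0.005
dist_start = np.linalg.norm(x - ne)
for _ in range(20000):            # Euler integration up to time t = 100
    x = x + dt * v(x)
dist_end = np.linalg.norm(x - ne)
```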
1.3.3 B.3.3: Regular ESS
Define a function \(L^*:\mathcal {X}\rightarrow \mathbb {R}\) by
where \(L:\mathcal {X}\rightarrow \mathbb {R}\) is the function given by (4), \(C^p\in \mathbb {R}\) is a constant, and \(U^p\) is the set defined in (12). We prove that this function \(L^*\) works as a Lyapunov function for the regular ESS \(\mathbf {x}^*\) when each of \(C^1,\ldots ,C^P\) is a sufficiently large positive constant.
Lemma 1
Suppose that Assumptions 1 and 2(a) hold. Let \(\mathbf {x}^*\in \mathcal {X}\) be a regular ESS. Then there is a neighborhood \(O\subset \mathcal {X}\) of \(\mathbf {x}^*\) with constant \(C^p>0\) for each population \(p\in \mathcal {P}\) such that, for any \(\mathbf {x}\in O,\)
Proof
First, since a regular ESS is a quasi-strict equilibrium, the support of \(\mathbf {x}^*\) coincides with the set of the pure best responses \(b(\mathbf {x}^*)\); namely, \(U^p=\mathcal {A}^p{\setminus } b^p(\mathbf {x}^*)\). Furthermore, by continuity of \(\mathbf {F}^p\), there is a neighborhood \(O^p\subset \mathcal {X}\) of \(\mathbf {x}^*\) where any suboptimal action \(b\in \mathcal {A}^p{\setminus } b^p(\mathbf {x}^*)=U^p\) at state \(\mathbf {x}^*\) remains suboptimal and \(D\mathbf {F}^p\) is negative definite with respect to \(T\mathcal {X}\cap \mathbb {R}^A_{0U}\). As the transition of any suboptimal action b is \(\dot{x}^p_b=-Q(\breve{F}^p_b(\mathbf {x})) x^p_b\), we obtain
It follows that
According to Sandholm (2010a, pp. 43–44), this and the local negative definiteness of \(D\mathbf {F}^p\) jointly imply the existence of a positive constant \(C^p>0\) such that
at any point in the neighborhood \(O^p\) of \(\mathbf {x}^*\). Take the intersection of all \(O^p\) (\(p\in \mathcal {P}\)) as O.\(\square \)
We use this constant \(C^p\) to define Lyapunov function \(L^*\) for regular ESS \(\mathbf {x}^*\) and focus on this neighborhood O as the basin of attraction to \(\mathbf {x}^*\).
Proof of part 3 in Theorem 3
According to the calculation in the proof of part 2 in Theorem 3, the time derivative of function L is
Hence we have
Lemma 1 implies
in the neighborhood O of \(\mathbf {x}^*\). Then, Theorem 7 guarantees asymptotic stability of \(\mathbf {x}^*\).
\(\square \)
Zusai, D. Tempered best response dynamics. Int J Game Theory 47, 1–34 (2018). https://doi.org/10.1007/s00182-017-0575-9
Keywords
- Best response dynamic
- Payoff monotonicity
- Status-quo bias
- Switching costs
- Proper equilibrium
- Piecewise differential equations