Fokker–Planck equation
Consider a group of N individuals acting according to the RPS learning rule described in Sect. 2. Let \(X^t=(X^t_1,\ldots ,X^t_N)\) represent the vector of pairs of stimuli for all the members at epoch t. Each component of this vector is 2-dimensional: \(X^t_i{}={}(S^t_{1,i},S^t_{2,i})\). By \(w_h({\bar{x}},t),\) where \({\bar{x}}\in [0,1]^{2N},\) we denote the PDF of the distribution of \(X^t\). We will write \({\bar{x}}=(x_1,\ldots ,x_N),\) where each \(x_i{}={}(s_{1,i},s_{2,i})\). The probability to play A will be denoted by \(\lambda _i{}={}s_{1,i}/(s_{1,i}+s_{2,i})\).
Suppose that members i and j are selected for the interaction. There will be only one game played during the period from t to \(t+\delta \). The matrix of payoffs is described in Table 1. The range of the parameters \(\delta ,\,h,\,N\) will be restricted later on.
Conditioned on the event
\(X^t={\bar{x}},\) the agents' stimuli for the next period are updated according to the RPS rule (4), which in the notation of the stochastic process reads
$$\begin{aligned} X^{t+\delta }_i{}={}\left\{ \begin{array}{ll} ((1-\mu h)s_{1,i} + r_1\mu h +ah,\, (1-\mu h)s_{2,i}+r_2\mu h) &{} {\mathrm{Prob}}=\lambda _i\lambda _j\\ ((1-\mu h)s_{1,i} +r_1\mu h + dh,\,(1-\mu h)s_{2,i}+r_2\mu h) &{} {\mathrm{Prob}}=\lambda _i(1-\lambda _j)\\ ((1-\mu h)s_{1,i} +r_1\mu h,\,(1-\mu h)s_{2,i}+r_2\mu h + ch) &{} {\mathrm{Prob}}=(1-\lambda _i)\lambda _j\\ ((1-\mu h)s_{1,i} +r_1\mu h,\,(1-\mu h)s_{2,i} +r_2\mu h + bh) &{} {\mathrm{Prob}}=(1-\lambda _i)(1-\lambda _j) \end{array} \right. \end{aligned}$$
and symmetrically for
\(X^{t+\delta }_j\). For all other agents,
\(X^{t+\delta }_k{}={}X^t_k\) for
\(k\not =i,j\). The definition of
\(X^t\) makes it a discrete-time Markov process. We proceed by writing down the integral form of the Chapman–Kolmogorov equations and approximate its solution by a solution of the Fokker–Planck equation (forward Kolmogorov’s equation), for small values of
\(\delta , h\) and large
N.
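For concreteness, the one-step transition above can be sketched in code. This is a schematic sketch, not the authors' implementation: the function name `rps_step`, the payoff-dictionary keys (`a` for A vs. A, `d` for A vs. B, `c` for B vs. A, `b` for B vs. B, matching the stimulus increments in the rule), and all parameter values are our own illustrative conventions.

```python
import numpy as np

def rps_step(S, payoff, h, mu, r, rng):
    """One epoch of the RPS process: a randomly selected pair (i, j) plays
    the 2x2 game once; only the two selected agents update their stimuli.

    S      : (N, 2) array of stimulus pairs (s_1, s_2), one row per agent
    payoff : increments per game: a (A vs A), d (A vs B), c (B vs A), b (B vs B)
    h, mu  : learning step and memory-decay rate; r = (r_1, r_2) residuals
    """
    N = S.shape[0]
    i, j = rng.choice(N, size=2, replace=False)
    lam = S[:, 0] / S.sum(axis=1)            # probability of playing A
    plays_A = rng.random(2) < lam[[i, j]]    # realized actions of i and j
    for k, own, other in ((i, plays_A[0], plays_A[1]), (j, plays_A[1], plays_A[0])):
        # memory decay toward the residual stimuli, then payoff reinforcement
        S[k] = (1.0 - mu * h) * S[k] + mu * h * np.asarray(r)
        if own and other:
            S[k, 0] += payoff["a"] * h       # played A against A
        elif own:
            S[k, 0] += payoff["d"] * h       # played A against B
        elif other:
            S[k, 1] += payoff["c"] * h       # played B against A
        else:
            S[k, 1] += payoff["b"] * h       # played B against B
    return S
```

All agents other than the selected pair are left unchanged, exactly as in the definition of \(X^{t+\delta }_k\) for \(k\not =i,j\).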
The change of \(w_h({\bar{x}},t)\) from t to \(t+\delta \) can be described in the following way:
$$\begin{aligned}&\int \phi ({\bar{x}})w_h({\bar{x}},t+\delta )\,d{\bar{x}}{}={}{\mathbb {E}}[\phi (X^{t+\delta })]\nonumber \\&\quad =\sum _{i\not =j}(N(N-1))^{-1}\int \left( \lambda _i\lambda _j\phi ({\bar{x}})\Big |_{\begin{array}{c} x_i{}={}((1-\mu h)s_{1,i} + r_1\mu h +ah,\, (1-\mu h)s_{2,i}+r_2\mu h) \\ \,\,\, x_j=((1-\mu h)s_{1,j} + r_1\mu h +ah,\, (1-\mu h)s_{2,j}+r_2\mu h) \end{array}}\right. \nonumber \\&\qquad \left. +\,\lambda _i(1-\lambda _j)\phi ({\bar{x}})\Big |_{\begin{array}{c} x_i{}={}((1-\mu h)s_{1,i} +r_1\mu h + dh,\,(1-\mu h)s_{2,i}+r_2\mu h)\\ \,\,\, x_j{}={}((1-\mu h)s_{1,j} +r_1\mu h,\,(1-\mu h)s_{2,j}+r_2\mu h + ch) \end{array}}\right. \nonumber \\&\qquad \left. +\,(1-\lambda _i)\lambda _j\phi ({\bar{x}})\Big |_{\begin{array}{c} x_i=((1-\mu h)s_{1,i} +r_1\mu h,\,(1-\mu h)s_{2,i}+r_2\mu h + ch) \\ \, \,\, x_j=((1-\mu h)s_{1,j} +r_1\mu h + dh,\,(1-\mu h)s_{2,j}+r_2\mu h) \end{array}} \right. \nonumber \\&\qquad \left. +\,(1-\lambda _i)(1-\lambda _j)\phi ({\bar{x}}) \Big |_{\begin{array}{c} x_i=((1-\mu h)s_{1,i} +r_1\mu h,\,(1-\mu h)s_{2,i} +r_2\mu h + bh) \\ \,\,\, x_j=((1-\mu h)s_{1,j} +r_1\mu h,\,(1-\mu h)s_{2,j} +r_2\mu h + bh) \end{array}}\right) w_h({\bar{x}},t)\,d{\bar{x}}.\nonumber \\ \end{aligned}$$
(12)
This equation can be written in a slightly different way:
$$\begin{aligned}&\int \phi ({\bar{x}})w_h({\bar{x}},t+\delta )\,d{\bar{x}}=\int \phi ({\bar{x}})w_h({\bar{x}},t)\,d{\bar{x}}\nonumber \\&\qquad +\,\sum _{i\not =j}(N(N-1))^{-1}\nonumber \\&\qquad \int \left( \lambda _i\lambda _j\left[ \phi ({\bar{x}})\Big |_{\begin{array}{c} x_i{}={}((1-\mu h)s_{1,i} + r_1\mu h +ah,\, (1-\mu h)s_{2,i}+r_2\mu h)\\ \,\,\, x_j=((1-\mu h)s_{1,j} + r_1\mu h +ah,\, (1-\mu h)s_{2,j}+r_2\mu h) \end{array}}-\phi ({\bar{x}})\right] \right. \nonumber \\&\qquad \left. +\,\lambda _i(1-\lambda _j)\left[ \phi ({\bar{x}})\Big |_{\begin{array}{c} x_i{}={}((1-\mu h)s_{1,i} +r_1\mu h + dh,\,(1-\mu h)s_{2,i}+r_2\mu h)\\ \,\,\, x_j{}={}((1-\mu h)s_{1,j} +r_1\mu h,\,(1-\mu h)s_{2,j}+r_2\mu h + ch) \end{array}}-\phi ({\bar{x}})\right] \right. \nonumber \\&\qquad \left. +\,(1-\lambda _i)\lambda _j\left[ \phi ({\bar{x}})\Big |_{\begin{array}{c} x_i=((1-\mu h)s_{1,i} +r_1\mu h,\,(1-\mu h)s_{2,i}+r_2\mu h + ch) \\ \, \,\, x_j=((1-\mu h)s_{1,j} +r_1\mu h + dh,\,(1-\mu h)s_{2,j}+r_2\mu h) \end{array}}-\phi ({\bar{x}})\right] \right. \nonumber \\&\qquad \left. +\,(1-\lambda _i)(1-\lambda _j)\right. \nonumber \\&\qquad \left. \left[ \phi ({\bar{x}})\Big |_{\begin{array}{c} x_i=((1-\mu h)s_{1,i} +r_1\mu h,\,(1-\mu h)s_{2,i} +r_2\mu h + bh) \\ \,\,\, x_j=((1-\mu h)s_{1,j} +r_1\mu h,\,(1-\mu h)s_{2,j} +r_2\mu h + bh) \end{array}}-\phi ({\bar{x}})\right] \right) w_h({\bar{x}},t)\,d{\bar{x}}.\nonumber \\ \end{aligned}$$
(13)
The above equation can be used to obtain a 2N-dimensional ODE approximation of the stochastic process by evaluating \(\lim _{h\rightarrow 0}\left( {\mathbb {E}}[X^{t+\delta }\,|\, X^t]-X^t\right) /h{}={}F(X^t)\). This approach was implemented in the method of stochastic approximation developed by Benaim and Hirsch (1999) and applied to the study of convergence of stochastic fictitious play processes. The method guarantees the convergence of the process \(X^t\) under certain stability conditions for the dynamics of the associated ODE.
The large dimension of that dynamical system is an obstacle to further analysis. In contrast, we would like to obtain an equation for the distribution of a large number N of agents in the 2-dimensional stimuli space. For this, denote the PDF of the distribution by
$$\begin{aligned} f_h(x,t){}={}\sum _k N^{-1}\int w_h({\bar{x}},t)\big |_{x_k=x}\,d{\bar{x}}_k,\quad x\in {\mathbb {R}}^2, \end{aligned}$$
where
\({\bar{x}}_k\) is a
\(2N-2\) dimensional vector of all coordinates, excluding
\(x_k\). In statistical physics this function is also called a one-particle distribution. In the formulas to follow we need to use a two-particle distribution function
$$\begin{aligned} g_h(x,y,t){}={}\sum _{i\not =j} (N(N-1))^{-1}\int w_h({\bar{x}},t)\big |_{x_i=x,\, x_j=y}\,d{\bar{x}}_{ij}, \end{aligned}$$
where
\({\bar{x}}_{ij}\) is the
\(2N-4\) dimensional vector of all coordinates excluding
\(x_i\) and
\(x_j\). The function \(g_h\) is symmetric in (x, y) and is related to
\(f_h\) by the formulas
$$\begin{aligned} f_h(x,t){}={}\int g_h(x,y,t)\,dy{}={}\int g_h(y,x,t)\,dy. \end{aligned}$$
The moments of the functions \(f_h\) and \(g_h\) are computed from the moments of \(w_h:\)$$\begin{aligned} \int \psi (x)f_h(x,t)\,dx{}={}\sum _k N^{-1}\int \psi (x_k)w_h({\bar{x}},t)\,d{\bar{x}}, \end{aligned}$$
and
$$\begin{aligned} \int \omega (x,y)g_h(x,y,t)\,dxdy{}={}\sum _{i\not =j} (N(N-1))^{-1}\int \omega (x_i,x_j)w_h({\bar{x}},t)\,d{\bar{x}}. \end{aligned}$$
This follows from the definitions of these functions.
Now we use (13) to obtain an integral equation for the change of the function \(f_h\). To that end, select \(\phi ({\bar{x}}){}={}\psi (x_k),\) sum over k, and average. We get
$$\begin{aligned}&\int \psi (x)f_h(x,t+\delta )\,dx =\int \psi (x)f_h(x,t)\,dx +N^{-1}\sum _{i\not =j}(N(N-1))^{-1}\nonumber \\&\qquad \int \left( \lambda _i\lambda _j[\psi ((1-\mu h)s_{1,i} + r_1\mu h +ah,\, (1-\mu h)s_{2,i}+r_2\mu h) - \psi (x_i) \right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s_{1,j} + r_1\mu h +ah,\, (1-\mu h)s_{2,j}+r_2\mu h)-\psi (x_j)]\right. \nonumber \\&\qquad \left. +\,\lambda _i(1-\lambda _j)[\psi ((1-\mu h)s_{1,i} +r_1\mu h + dh,\,\right. \nonumber \\&\qquad \left. (1-\mu h)s_{2,i}+r_2\mu h)-\psi (x_i)\right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s_{1,j} +r_1\mu h,\,(1-\mu h)s_{2,j}+r_2\mu h + ch) - \psi (x_j)] \right. \nonumber \\&\qquad \left. +\,(1-\lambda _i)\lambda _j[ \psi ((1-\mu h)s_{1,i} +r_1\mu h,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s_{2,i}+r_2\mu h + ch)-\psi (x_i) \right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s_{1,j} +r_1\mu h + dh,\,\right. \nonumber \\&\qquad \left. (1-\mu h)s_{2,j}+r_2\mu h) - \psi (x_j)] \right. \nonumber \\&\qquad \left. +\,(1-\lambda _i)(1-\lambda _j)[\psi ((1-\mu h)s_{1,i} +r_1\mu h,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s_{2,i} +r_2\mu h + bh)-\psi (x_i) \right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s_{1,j} +r_1\mu h,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s_{2,j} +r_2\mu h + bh) - \psi (x_j)] \right) w_h({\bar{x}},t)\,d{\bar{x}}. \end{aligned}$$
(14)
The right-hand side can be conveniently expressed in terms of the two-particle function
\(g_h:\)$$\begin{aligned}&\int \psi (x)f_h(x,t+\delta )\,dx{}={}\int \psi (x)f_h(x,t)\,dx\nonumber \\&\qquad +\,2N^{-1}\int \left( \lambda (x)\lambda (y)[\psi ((1-\mu h)s^x_1 + r_1\mu h +ah,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^x_2+r_2\mu h) - \psi (x) \right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s^y_1 + r_1\mu h +ah,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^y_2+r_2\mu h)-\psi (y)]\right. \nonumber \\&\qquad \left. +\,\lambda (x)(1-\lambda (y))[\psi ((1-\mu h)s^x_1 +r_1\mu h + dh,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^x_2+r_2\mu h)-\psi (x)\right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s^y_1 +r_1\mu h,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^y_2+r_2\mu h + ch) - \psi (y)] \right. \nonumber \\&\qquad \left. +\,(1-\lambda (x))\lambda (y)[ \psi ((1-\mu h)s^x_1 +r_1\mu h,\,\right. \nonumber \\&\qquad \left. (1-\mu h)s^x_2+r_2\mu h + ch)-\psi (x) \right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s^y_1 +r_1\mu h + dh,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^y_2+r_2\mu h) - \psi (y)] \right. \nonumber \\&\qquad \left. +\,(1-\lambda (x))(1-\lambda (y))[\psi ((1-\mu h)s^x_1 +r_1\mu h,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^x_2 +r_2\mu h + bh)-\psi (x) \right. \nonumber \\&\qquad \left. +\,\psi ((1-\mu h)s^y_1 +r_1\mu h,\, \right. \nonumber \\&\qquad \left. (1-\mu h)s^y_2 +r_2\mu h + bh) - \psi (y)] \right) g_h(x,y,t)\,dxdy. \end{aligned}$$
(15)
where
\(x=(s^x_1,s^x_2),\) \(y=(s^y_1,s^y_2),\) and
\(\lambda (x){}={}s^x_1/(s^x_1+s^x_2),\) and similar for
\(\lambda (y)\). In processes with a large number of agents and random binary interactions, the two-particle distribution function can be approximately factored into the product of two independent one-particle distributions (a mean-field assumption):
$$\begin{aligned} g_h(x,y,t){}={}f_h(x,t)f_h(y,t). \end{aligned}$$
With this relation, (
15) becomes a family of non-linear integral relations for the next time step distribution
\(f_h(x,t+\delta )\). Taking the Taylor expansion of the increment of the test function \(\psi \) up to first order in h, we obtain the integral equation:
$$\begin{aligned}&\frac{N}{2h}\int \psi (x)(f_h(x,t+\delta )-f_h(x,t))\,dx\\&\quad =2{\bar{m}}_h(t)\int \left( \lambda (x)(a + \mu (r_1-s_1))\partial _{s_1}\psi (x)\right. \\&\qquad \left. +\, \lambda (x)\mu (r_2-s_2)\partial _{s_2}\psi (x)\right) f_h(x,t)\,dx \\&\qquad +\,2(1-{\bar{m}}_h(t))\int \left( \lambda (x)(d + \mu (r_1-s_1))\partial _{s_1}\psi (x)\right. \\&\qquad \left. +\, \lambda (x)\mu (r_2-s_2)\partial _{s_2}\psi (x)\right) f_h(x,t)\,dx \\&\qquad +\,2{\bar{m}}_h(t)\int \left( (1-\lambda (x))(\mu (r_1-s_1))\partial _{s_1}\psi (x) \right. \\&\qquad \left. +\, (1-\lambda (x))(c+\mu (r_2-s_2))\partial _{s_2}\psi (x)\right) f_h(x,t)\,dx \\&\qquad +\,2(1-{\bar{m}}_h(t))\int \left( (1-\lambda (x))(\mu (r_1-s_1))\partial _{s_1}\psi (x)\right. \\&\qquad \left. +\, (1-\lambda (x))(b+\mu (r_2-s_2))\partial _{s_2}\psi (x)\right) f_h(x,t)\,dx \end{aligned}$$
where
\({\bar{m}}_h(t){}={}\int \lambda (x)f_h(x,t)\,dx\).
Combining the terms on the right-hand side of the equation, we get
$$\begin{aligned}&\frac{N\delta }{4h}\int \psi (x)\left( \frac{f_h(x,t+\delta )-f_h(x,t)}{\delta }\right) \,dx\\&\quad =\int \left( v_1\partial _{s_1}\psi +v_2\partial _{s_2}\psi \right) f_h(x,t)\,dx, \end{aligned}$$
with
$$\begin{aligned} v_1= & {} \lambda (x)(a{\bar{m}}_h(t)+ (1-{\bar{m}}_h(t))d) + \mu (r_1-s_1),\\ v_2= & {} (1-\lambda (x))(c{\bar{m}}_h(t)+ (1-{\bar{m}}_h(t))b) + \mu (r_2-s_2). \end{aligned}$$
By integrating the right-hand side by parts, and assuming that
\(h,\delta \) are small and
N is large, in such a way that
\(\frac{4h}{\delta N}= 1,\) we obtain the integral relation:
$$\begin{aligned} \int \psi (x)\left( \frac{\partial f}{\partial t}{}+{}\text{ div }(u(x,t)f)\right) \,dx{}={}0. \end{aligned}$$
Since the test function
\(\psi \) is arbitrary, the Fokker–Planck equation follows:
$$\begin{aligned} \frac{\partial f}{\partial t}{}+{}\text{ div }(u(x,t)f){}={}0, \end{aligned}$$
(16)
where
\(x=(s_1,s_2),\,s_1,s_2>0\) and the drift velocity is given by the formula
$$\begin{aligned} u(x,t){}={}\frac{1}{s_1+s_2}\left[ \begin{array}{l} \left( a{\bar{m}}(t){}+{}d(1-{\bar{m}}(t))\right) s_1 + \mu (r_1-s_1) ( s_1+s_2) \\ \left( c{\bar{m}}(t){}+{}b(1-{\bar{m}}(t))\right) s_2 +\mu (r_2-s_2) (s_1 +s_2) \end{array} \right] , \end{aligned}$$
with
$$\begin{aligned} {\bar{m}}(t){}={}\int \frac{s_1}{s_1+s_2}f(x,t)\,dx. \end{aligned}$$
(17)
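Equations (16)–(17) are easy to evaluate numerically on an ensemble of stimulus pairs. The sketch below is our own illustration, not code from the paper; the names `mean_m` and `drift` are hypothetical, and the payoff-dictionary keys a, b, c, d denote the four game increments (A vs. A, B vs. B, B vs. A, A vs. B, respectively).

```python
import numpy as np

def mean_m(S):
    """Population average probability to play A: the integral in (17),
    taken over the empirical distribution of stimulus pairs S."""
    return float(np.mean(S[:, 0] / S.sum(axis=1)))

def drift(S, m_bar, payoff, mu, r):
    """Drift velocity u(x, t) of the Fokker-Planck equation (16),
    evaluated at every stimulus pair in the (N, 2) array S."""
    s1, s2 = S[:, 0], S[:, 1]
    tot = s1 + s2
    u1 = ((payoff["a"] * m_bar + payoff["d"] * (1.0 - m_bar)) * s1
          + mu * (r[0] - s1) * tot) / tot
    u2 = ((payoff["c"] * m_bar + payoff["b"] * (1.0 - m_bar)) * s2
          + mu * (r[1] - s2) * tot) / tot
    return np.stack([u1, u2], axis=1)
```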
Equation (
8) is obtained from (
16) by multiplying it by
\(\lambda (x){}={}s_1/(s_1+s_2),\) and integrating by parts:
$$\begin{aligned}&\frac{d}{dt}\int \frac{s_1}{s_1+s_2} f(x,t)\,dx\\&\quad = \int \left( u_1(x,t)\frac{s_2}{(s_1+s_2)^2}-u_2(x,t)\frac{s_1}{(s_1+s_2)^2}\right) f(x,t)\,dx. \end{aligned}$$
Here we are assuming that the support of
f is contained in the interior of the first quadrant, so that
f is zero on the boundary. That is to say, all agents play mixed strategies. This is a natural hypothesis, as nothing more can be learned once an agent chooses an action with certainty.
Substituting the formulas for \(u_1\) and \(u_2\) above (with \(\mu =0\)), we obtain (8).
Consider now learning from playing a symmetric
\(n\times n\) game. Let the payoff to playing the
ith action against the
jth be equal to
\( a_{ij}h\). Denote the stimulus vector
\(x=(s_1,\ldots ,s_n),\) and the population average probability to play
i by
$$\begin{aligned} {\bar{m}}_i(t){}={}\int \frac{s_i}{\sum _j s_j}f(x,t)\,dx,\quad i=1,\ldots ,n. \end{aligned}$$
The first-order approximation of the RPS learning process is given by the Fokker–Planck equation
$$\begin{aligned} \frac{\partial f}{\partial t} {}+{}\text{ div }(u(x,t)f){}={}0, \end{aligned}$$
on the domain with
\(s_i\ge 0,\) \(i=1,\ldots ,n\). In this equation, the velocity vector
\(u=(u_1,\ldots ,u_n)\) is given by its components
$$\begin{aligned} u_i(x,t){}={}\frac{1}{\sum _j s_j}\left[ \left( \sum _j a_{ij}{\bar{m}}_j(t)\right) s_i + \mu (r_i-s_i) \sum _j s_j\right] ,\quad i=1,\ldots ,n. \end{aligned}$$
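The n-action velocity can be sketched the same way. In this hedged illustration (our own names, not the authors' code), `A` is the n-by-n matrix of payoff increments \(a_{ij}\):

```python
import numpy as np

def drift_n(S, A, mu, r):
    """Velocity u_i for learning in a symmetric n x n game, following the
    formula above: u_i = (sum_j a_ij mbar_j) s_i / sum_j s_j + mu (r_i - s_i).

    S : (N, n) array of stimulus vectors; A : (n, n) payoff increments a_ij
    """
    tot = S.sum(axis=1, keepdims=True)
    m_bar = np.mean(S / tot, axis=0)   # population average mixed strategy
    return S * (A @ m_bar) / tot + mu * (np.asarray(r) - S)
```

For n = 2 with A = [[a, d], [c, b]], this reduces to the two-dimensional drift of (16).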
Now we consider in some detail learning in
\(2 \times 2\) games. Much of the analysis of Eq. (
16) is derived from the behavior of the trajectories of ODE (
7). The solution
f of (
16) is obtained by transporting the support of
\(f_0\) along trajectories of the dynamical system (
7) and changing the values
\(f_0\) so that the “mass” (measured by the density function
f) of any “fluid element” remains constant. In fact one can write the formula for f in terms of u and prove that the solution f of the non-linear problem exists and is unique. This can be done by standard methods of PDEs, but it is outside the scope of the present paper. Here, we will be interested in the long-time qualitative asymptotics of
f(
x,
t).
Equation (16) is considered in the first quadrant \(s_1,s_2>0\). For \(\mu =0,\) the boundary of the domain is invariant under the flow of (7). For the model with fading memory, \(\mu >0,\) the velocity u at the boundary is directed into the interior of the domain. In either case, we will assume that the initial function \(f_0(x)\) is zero on the boundary. Then this property holds for all \(t>0\). Additionally, in all of the analysis below, we assume that \(f_0\) is a continuously differentiable function with compact support (zero outside some bounded set).
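Since (16) is a pure transport equation, its solution can be approximated by moving a cloud of particles (samples of \(f_0\)) along the characteristics \(dx/dt=u(x,t)\), recomputing \({\bar{m}}(t)\) from the empirical distribution at every step. The forward-Euler sketch below is our own illustration under these assumptions; the function name, payoff keys, and step sizes are hypothetical.

```python
import numpy as np

def evolve_particles(S, payoff, mu, r, dt, steps):
    """Mean-field particle approximation of the transport equation (16):
    each particle follows dx/dt = u(x, t), with the population average
    m_bar recomputed from the current empirical distribution."""
    S = S.copy()
    for _ in range(steps):
        m = np.mean(S[:, 0] / S.sum(axis=1))
        s1, s2 = S[:, 0].copy(), S[:, 1].copy()
        tot = s1 + s2
        u1 = (payoff["a"] * m + payoff["d"] * (1 - m)) * s1 / tot + mu * (r[0] - s1)
        u2 = (payoff["c"] * m + payoff["b"] * (1 - m)) * s2 / tot + mu * (r[1] - s2)
        S[:, 0] = s1 + dt * u1
        S[:, 1] = s2 + dt * u2
    return S
```

In the pure-Nash case (a > c, d > b) with mu = 0, the empirical \({\bar{m}}(t)\) computed from the evolved cloud increases, consistent with the phase-portrait argument that follows.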
Consider the case of the pure Nash equilibrium (
\(a>c,d>b\)) and no memory effects,
\(\mu =0\). The velocity is
$$\begin{aligned} u(x,t){}={}\frac{1}{s_1+s_2}M(t)x, \end{aligned}$$
where matrix
$$\begin{aligned} M(t){}={}\left[ \begin{array}{cc} a{\bar{m}}(t){}+{}d(1-{\bar{m}}(t)) &{} 0\\ 0 &{} c{\bar{m}}(t){}+{}b(1-{\bar{m}}(t)) \end{array} \right] . \end{aligned}$$
For any
t, the origin is an unstable node with two positive eigenvalues; the eigenvalue corresponding to the
\(s_1\)–direction is the dominant one. From the phase portrait of the ODE it is clear that the flow transports the support of
f into the region where
\(s_1\gg s_2,\) which corresponds to all agents, asymptotically in t, adopting action A in the game.
In the case of the mixed Nash equilibrium (
\(a<c,d>b\)) and no memory effect,
\(\mu =0,\) the origin is an unstable node. When
\({\bar{m}}(t)=m_*,\) the two positive eigenvalues \(a{\bar{m}}(t)+d(1-{\bar{m}}(t))\) and \(c{\bar{m}}(t)+b(1-{\bar{m}}(t))\) coincide, and all trajectories are straight lines through the origin. In general, however,
\({\bar{m}}(t)\not =m_*,\) if for example they are not equal at time
\(t=0\). In such cases Eq. (
9) can be used to show that
\(\lim _{t\rightarrow \infty }{\bar{m}}(t)=m_*. \) Let
\({\bar{m}}(0) > m_*\). Then, according to (
9),
\({\bar{m}}(0)>{\bar{m}}(t)>m_*\) for all
t, and
\({\bar{m}}(t)\) converges to
\(m_*\) provided that
$$\begin{aligned} \int _0^{\infty }\int \frac{s_1s_2}{(s_1+s_2)^{3}}f(x,t)\,dx dt \end{aligned}$$
(18)
diverges. Notice also that the derivative of the ratio
\(s_2/s_1\) along a flow trajectory equals
$$\begin{aligned} \frac{d}{dt}\left( \frac{s_2}{s_1}\right) {}={}(c-a+d-b)({\bar{m}}(t)-m_*)\frac{s_2}{s_1(s_1+s_2)}>0. \end{aligned}$$
Thus, if at
\(t=0\) the support of
\(f_0(\cdot )\) is strictly inside the first quadrant, then there is a constant
\(c>0\) such that
\(s_2/s_1>c\) for all points in the support of
\(f(\cdot ,t)\) for all later times. In particular, one can estimate
$$\begin{aligned} \int \frac{s_1s_2}{(s_1+s_2)^3}f(x,t)\,dx= & {} \int \frac{s_2/s_1}{(1 + s_2/s_1)(s_1+s_2)}\frac{s_1}{s_1+s_2}f(x,t)\,dx\nonumber \\> & {} \inf _{(s_1,s_2)\in \mathrm{supp} f(\cdot ,t)}\frac{c}{(1+c)(s_1+s_2)}\,{\bar{m}}(t). \end{aligned}$$
(19)
Finally, since \(u(x,t)\) is uniformly bounded, i.e., \(|u(x,t)|< C\) for all \((x,t)\) and some constant C, for any x in the support of \(f(\cdot ,t)\) there is a constant \({\hat{C}}\) such that \(|x|<{\hat{C}}(1+t)\). From this and (
19) it follows that
$$\begin{aligned} \int \frac{s_1s_2}{(s_1+s_2)^3}f(x,t)\,dx>Cm_*(1+t)^{-1}, \end{aligned}$$
for some constant
\(C>0,\) and so, the integral in (
18) is infinite. The case
\({\bar{m}}(0)<m_*\) follows by similar arguments.
Consider now the model with memory decay when
\(\mu >0\) and residuals
\(r_1,r_2>0\). For any value of
\({\bar{m}}(t)\in [0,1]\) and any set of positive parameters of the game
\(a,b,c,d>0\) the right-hand side of (
7) has a steady state
\((s^0_1(t),s^0_2(t))\) in the interior of the first quadrant, with
\(s^0_1>r_1,\) \(s^0_2>r_2,\) and this point is an asymptotically stable node. The other steady state is the origin, which is an unstable node. The support of
f(
x,
t) moves in the direction of the stable interior point, contracting in size. When it becomes sufficiently small, the dynamics can be effectively approximated by the ODE:
\(\frac{d(s_1,s_2)}{dt}{}={}({\bar{u}}_1,\,{\bar{u}}_2)\) where the new velocity is given by
$$\begin{aligned} {\bar{u}}_1= & {} \frac{a(s_1)^2}{(s_1+s_2)^2}{}+{} \frac{ds_1s_2}{(s_1+s_2)^2}{}+{}\mu (r_1-s_1),\\ {\bar{u}}_2= & {} \frac{cs_1s_2}{(s_1+s_2)^2}{}+{} \frac{b(s_2)^2}{(s_1+s_2)^2}{}+{}\mu (r_2-s_2). \end{aligned}$$
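The stable interior state can be located numerically by integrating this reduced ODE forward from any interior starting point. The sketch below is illustrative (the function name, step sizes, and parameter values are our own choices):

```python
def interior_fixed_point(payoff, mu, r, s0=(1.0, 1.0), dt=0.01, steps=20000):
    """Approximate the stable interior steady state (s1*, s2*) of the ODE
    d(s1, s2)/dt = (u1_bar, u2_bar) above by forward Euler integration."""
    s1, s2 = s0
    for _ in range(steps):
        tot2 = (s1 + s2) ** 2
        u1 = (payoff["a"] * s1**2 + payoff["d"] * s1 * s2) / tot2 + mu * (r[0] - s1)
        u2 = (payoff["c"] * s1 * s2 + payoff["b"] * s2**2) / tot2 + mu * (r[1] - s2)
        s1, s2 = s1 + dt * u1, s2 + dt * u2
    return s1, s2
```

Since both payoff terms are positive, the computed state satisfies \(s^*_1>r_1\) and \(s^*_2>r_2\), as claimed above.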
In the long run the fixed point will settle at the stable, interior state
\(x^*=(s^*_1,s^*_2),\) and
f(
x,
t) will be a delta-function supported at that point. In this process the agents learn to play A with probability
\(m_*{}={}s^*_1/(s^*_1+s^*_2)\).
A special case of zero residuals
\(r_1=r_2=0\) deserves separate discussion. In the limit of zero residuals
\(r_1,r_2\rightarrow 0,\) velocity
u (for any fixed
\(t>0\)) has three fixed points:
\(x_0(t)=((a{\bar{m}}(t)+d(1-{\bar{m}}(t)))/\mu ,\,0),\) \(x_1=(0,\,(c{\bar{m}}(t)+b(1-{\bar{m}}(t)))/\mu )\), and the origin (0, 0). When
\(a>c,d>b,\) the first, corresponding to the strategy “always A”, is an asymptotically stable node, the second, corresponding to “always B”, is a saddle, and the origin is an unstable node. One can compute that on any trajectory of the velocity field
u(
x,
t) inside the first quadrant,
$$\begin{aligned} \frac{d}{dt}\left( \frac{s_1}{s_2}\right) >0. \end{aligned}$$
Thus, the population average probability to play A,
\({\bar{m}}(t),\) increases to 1, the stable stationary point
\(x_0(t)\) converges to
\(x_0=(a/\mu ,0),\) and the support of
f(
x,
t) moves toward point
x_0\). The agents with memory decay and zero residual levels do learn the optimal strategy. Moreover, because learning occurs in a bounded region of the stimuli space, convergence to the equilibrium is faster than in the case of learning with perfect memory
\(\mu =0\). Using Eq. (
11) we can also estimate the rate of convergence as a function of
\(\mu \). The rate is proportional to
\(\mu ,\) implying that the characteristic time of convergence is
\(C/\mu \). Figure
1 shows the simulations of the stochastic process in this regime for different values of
\(\mu \). It shows that the prediction based on the model (
16) is in good agreement with the stochastic learning process. We conclude that models with low residuals and high
\(\mu \) perform better in learning the optimal strategy in
\(2 \times 2\) games.
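The scaling of the convergence time with \(\mu \) can be checked against the mean-field characteristics directly. In the sketch below (our own illustration, with hypothetical names and parameter values; payoff keys a, b, c, d are the four game increments), the time for the population average \({\bar{m}}(t)\) to cross a fixed threshold shrinks as \(\mu \) grows, consistent with the \(C/\mu \) estimate:

```python
import numpy as np

def mbar_trajectory(mu, payoff, r, T=60.0, dt=0.01, n=200, seed=1):
    """Trajectory of m_bar(t) for the mean-field transport dynamics with
    memory-decay rate mu and small residuals r (forward Euler particles)."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(0.5, 1.5, size=(n, 2))   # initial cloud of stimuli
    out = []
    for _ in range(int(T / dt)):
        m = float(np.mean(S[:, 0] / S.sum(axis=1)))
        out.append(m)
        s1, s2 = S[:, 0].copy(), S[:, 1].copy()
        tot = s1 + s2
        S[:, 0] = s1 + dt * ((payoff["a"] * m + payoff["d"] * (1 - m)) * s1 / tot
                             + mu * (r[0] - s1))
        S[:, 1] = s2 + dt * ((payoff["c"] * m + payoff["b"] * (1 - m)) * s2 / tot
                             + mu * (r[1] - s2))
    return np.array(out)
```

With a pure Nash equilibrium (a > c, d > b) and residuals near zero, larger mu reaches a threshold such as m_bar = 0.9 sooner in this illustrative run.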