Boltzmann and Fokker–Planck Equations Modelling the Elo Rating System with Learning Effects
Abstract
In this paper, we propose and study a new kinetic rating model for a large number of players, which is motivated by the well-known Elo rating system. Each player is characterised by an intrinsic strength and a rating, which are both updated after each game. We state and analyse the respective Boltzmann-type equation and derive the corresponding nonlinear, nonlocal Fokker–Planck equation. We investigate the existence of solutions to the Fokker–Planck equation and discuss their behaviour in the long time limit. Furthermore, we illustrate the dynamics of the Boltzmann and Fokker–Planck equations with various numerical experiments.
Keywords
Elo rating, Learning, Kinetic model, Fokker–Planck equation, Existence of weak solutions, Asymptotic behaviour
Mathematics Subject Classification
35Q91 35Q20 35Q84 35K65 35B40 91G60
1 Introduction
In this paper, we propose a more general approach to describe how a player’s strength changes in encounters. We assume that individuals benefit from every game and increase their strength because of these interactions. However, the extent of the benefit depends on several factors. First, players with a lower rating benefit more. Second, the stronger the opponent, the more a win pushes the intrinsic strength. Furthermore, the individual performance changes due to small fluctuations, accounting for variations in the mental strength or personal fitness on a given day. Based on the microscopic interaction laws, we derive the corresponding kinetic Boltzmann-type and limiting Fokker–Planck equations and analyse their behaviour. In the case of no diffusion, we can show that the strength and ratings of the appropriately shifted PDE converge, while we observe the formation of non-measure-valued steady states in the case of diffusion. We illustrate our analytic results with numerical simulations of the kinetic as well as the limiting Fokker–Planck equation. The simulations give important insights into the dynamics, especially in situations where we are not able to prove rigorous results. The proposed interaction laws are a first step towards developing and analysing more complicated rating models with dynamic strength. The next developments of the model should include losses in the player’s strength to ensure that the strength stays within certain bounds.
The kinetic description of the Elo rating system allowed Jabin and Junca (2015) to analyse the qualitative behaviour of solutions. In the last decades, kinetic models have been used successfully to describe the behaviour of large multiagent systems in socioeconomic applications. In all these applications, interactions among individuals are modelled as ‘collisions’, in which agents exchange goods (Delitala and Lorenzi 2014; Düring et al. 2017; Burger et al. 2013), wealth (Düring and Toscani 2007; Düring et al. 2008; Bellomo et al. 2013; Degond et al. 2014), opinion (Toscani 2006; Boudin et al. 2010; Düring et al. 2009; Motsch and Tadmor 2014; Albi et al. 2014; Düring and Wolfram 2015) or knowledge (Pareschi and Toscani 2014; Burger et al. 2016). For a general overview of interacting multiagent systems and kinetic equations, we refer to the book of Pareschi and Toscani (2013).
This paper is organised as follows. We introduce a generalisation of the kinetic Elo model with variable intrinsic strength due to learning in Sect. 2. In Sect. 3, we derive the corresponding Fokker–Planck-type equation as the quasi-invariant limit of the Boltzmann-type model. Convergence towards steady states of a suitably shifted Fokker–Planck model is analysed in Sect. 4. We conclude by presenting various numerical simulations of the Boltzmann and the Fokker–Planck-type equation in Sect. 5.
2 An Elo Model with Learning
In this section, we introduce an Elo model in which the rating and the intrinsic strength of the players change in time. The dynamics are driven by microscopic binary interactions similar to those in the original model by Jabin and Junca (2015) and Krupp (2016). We state the specific microscopic interaction rules in each encounter and derive the corresponding limiting Fokker–Planck equation.
2.1 Kinetic model
The interaction rules are motivated by the following considerations: player ratings change with the outcome of each game [as in the original model (1) proposed by Jabin and Junca (2015)]. The random variable \(S_{ij}\) corresponds to the score of the match and depends on the difference in strength of the two players. We assume that \(S_{ij}\) takes the values \(\pm 1\) with an expectation \(\langle S_{ij}\rangle = b(\rho _i - \rho _j)\). Note that one could also assume that \(S_{ij}\) is continuous, for example \(S_{ij}\in [-1,+1]\). The constant parameter \(\gamma >0\) controls the speed of adjustment.
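Since \(S_{ij}\) takes only the values \(\pm 1\), its distribution is determined by its expectation. As a short worked step, using only the assumptions stated above and the shorthand \(b_{ij}\), which is introduced here purely for brevity, one obtains
$$\begin{aligned} \mathbb {P}(S_{ij}=\pm 1) = \frac{1 \pm b_{ij}}{2}, \qquad b_{ij} := b(\rho _i - \rho _j), \qquad \langle S_{ij}^2\rangle - \langle S_{ij}\rangle ^2 = 1 - b_{ij}^2. \end{aligned}$$
These probabilities are well defined provided \(|b| \le 1\), and the variance of the score vanishes only when the outcome is certain, i.e. \(|b_{ij}| = 1\).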
The variables \(\eta \) and \(\tilde{\eta }\) are independent, identically distributed random variables with mean zero and variance \({\sigma ^2}\), which model small fluctuations in the day-linked performance, for example in the mental strength or personal fitness.
The proposed interaction rules are a first step towards a more realistic modelling. Alternative learning mechanisms, such as the one proposed in the context of knowledge exchange in a large society (see Burger et al. 2016), could be considered in the future. There, the individual with the lower knowledge level assumes the higher level after the interaction, while the stronger one does not gain anything in the encounter. Hence, the overall knowledge level is bounded by the maximum initial knowledge level for all times and the distribution of individuals converges to a Dirac delta at that point. We expect similar dynamics if we were to apply that rule instead of (6). Developing learning mechanisms which combine limitations of individual learning with the continuous evolution of the collective knowledge will be an important aspect of future research.
(\({\mathcal {A}}1\)) Let \(\Omega = \mathbb {R}^2\) or a bounded Lipschitz domain \(\Omega \subset \mathbb {R}^2\).
(\({\mathcal {A}}2\)) Let \(f_0 \in H^1(\Omega )\) with \(f_0 \ge 0\) and compact support. Furthermore, we assume that it has mean value zero and bounded moments up to order two. Hence$$\begin{aligned} \int _{\Omega } f_0(\rho ,R) \,\mathrm{d}\rho \,\mathrm{d}R = 1, \quad \int _{\Omega } R f_0(\rho ,R) \,\mathrm{d}\rho \,\mathrm{d}R = 0, \quad \text {and} \quad \int _{\Omega } \rho f_0(\rho ,R) \,\mathrm{d}\rho \,\mathrm{d}R = 0. \end{aligned}$$
(\({\mathcal {A}}3\)) The random variables \(\eta , \tilde{\eta }\) in (6) have the same distribution, zero mean, \(\langle \eta \rangle =0\), and variance \(\sigma _{\eta }^2\).
(\({\mathcal {A}}4\)) Let the interaction rate function \(w\ge 0\) be an even function with \(w \in C^2(\Omega ) \cap L^{\infty }(\Omega )\).
The kinetic Elo model can be formulated on the whole space as well as on a bounded domain. In reality, the Elo ratings of top chess players vary between 2000 and 3000, which provides evidence for the assumption of a bounded domain \(\Omega \). However, sometimes it is easier to study the dynamics of models on the whole space, i.e. without boundary effects. We will generally work on the bounded domain, and clearly state where we deviate from this assumption, e.g. when we study the asymptotic behaviour of moments. The second assumption states the necessary regularity assumptions on the initial data, which we shall use in the analysis of the moments and the existence proof.
2.2 Analysis of the moments
We start by studying basic properties of the Boltzmann-type equation (11) such as mass conservation and the evolution of the first and second moments with respect to the strength and the ratings. Throughout this section we consider the problem in the whole space.
Moments with Respect to the Strength
3 The Fokker–Planck Limit
3.1 Qualitative Properties of the Fokker–Planck Equation
We continue by discussing qualitative properties of the Fokker–Planck equation (16). We shall see that several properties, which we observed for the Boltzmann-type equation (11), can be transferred.
3.2 Analysis of the Fokker–Planck Equation
In this section, we prove existence of weak solutions to (21). The main result reads as follows.
Theorem 1
Let (\({\mathcal {A}}1\)) be satisfied, \(g_0 \in H^1({\Omega })\) and \(0\le g_0 \le M_0\) for some \(M_0>0\) and assume \(h_1\), \(\langle h_2\rangle \), \(b \in L^\infty ({\Omega })\cap C^2({\Omega })\). Then, there exists a weak solution \(g\in L^2(0,T; H^1({\Omega }))\cap H^1(0,T; H^{-1}({\Omega }))\) to (21a)–(21c), satisfying \(0\le g \le M_0 e^{\lambda t}\) for all \((\rho ,R) \in {\Omega }\), \(t>0\), with a constant \(\lambda >0\) depending on the functions \(h_1, \langle h_2\rangle , b\) and \(w\).
The presented existence proof was adapted from a similar argument for a nonlinear Fokker–Planck equation describing the dynamics of agents in an economic market (see Düring et al. 2017). However, Eq. (21a) has an additional nonlinearity in the derivative w.r.t. the rating R. We divide the proof into several steps for ease of presentation. In Step 0, we regularise the nonlinear Fokker–Planck equation (21a) by adding a Laplace operator with small diffusivity \(\mu \ge 0\). We linearise the equation in Step 1 and show existence of a unique solution for this problem. In Step 2, we derive the necessary \(L^{\infty }\) estimates to use Leray–Schauder’s fixed-point theorem and show existence of solutions to the nonlinear regularised problem. In Step 3, we present additional \(H^1\) estimates, which allow us to pass to the limit \(\mu \rightarrow 0\) in Step 4.
Proof
Step 2: uniform \(L^\infty \) bound and existence of a fixed point. We start by proving upper and lower bounds for the function \(g_{\mu }\). Let \(g_{\mu }\) be a fixed point of \(V(\cdot ,\theta )\), i.e. \(g_{\mu }\) solves (26) with \(\tilde{g}=g_{\mu }\), and \(\theta \in [0,1]\).
4 Long-Time Behaviour of Ratings and Strength
In this section, we study possible steady states of the proposed Elo model and discuss the convergence of the ratings to the strength. We recall that Jabin and Junca (2015) showed that the ratings of players converge to their intrinsic strength in the case \(w=1\). This corresponds to the concentration of mass along the diagonal. In our model, the intrinsic strength is continuously increasing in time. Hence, to be able to identify steady states, we consider the shifted Fokker–Planck equation (21a). Throughout this section, we consider the problem in the whole space.
Since the diffusion part in (21a) is singular, the equation is degenerate parabolic. Degenerate Fokker–Planck equations frequently exhibit exponential convergence to equilibrium despite their lack of coercivity, a behaviour which Villani (2009) refers to as hypocoercivity. For subsequent research on hypocoercivity in linear Fokker–Planck equations, see Arnold and Erb (2014) and Achleitner et al. (2015). Since (21a) is a nonlinear, nonlocal Fokker–Planck equation, these results do not apply here. It is conceivable that generalisations of this approach can be used in studying the decay to equilibrium for (21a); this is, however, beyond the scope of the present paper. In the following, we present some results on the long-term behaviour of solutions to (21a).
5 Numerical Simulations
In this section, we discuss the numerical discretisation of the Boltzmann equation (11) and the shifted Fokker–Planck equation (21a). We initialise the distribution of players with respect to their strength and rating with values from the unit interval and consider appropriately shifted interaction rules to ensure that the distribution remains inside the unit square for all times \(t > 0\).
5.1 Monte Carlo Simulations of the Boltzmann Equation
In each simulation, we consider \(N = 5000\) players and compute the steady-state distribution by performing \(10^8\) time steps. The result is then averaged over another \(10^5\) time steps. We perform \(M=10\) realisations and compute the density from the averaged steady states.
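A minimal sketch of a pairwise Monte Carlo loop of this kind is given below. It is not the scheme used for the experiments reported here: it assumes the interaction rate \(w \equiv 1\), the choice \(b = \tanh \), and a classical rating update of the form \(R_i^* = R_i + \gamma (S_{ij} - b(R_i - R_j))\) in the spirit of Jabin and Junca (2015). The strengths are kept fixed, since the learning rule (6) is not reproduced in this sketch. The function name `simulate_ratings` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_ratings(rho, R0, gamma, n_steps, rng, b=np.tanh):
    """Pairwise Monte Carlo sketch of an Elo-type rating update.

    Assumptions (not taken from the paper): w = 1 (any two players may meet),
    b = tanh, and fixed strengths rho (no learning step).
    """
    R = np.array(R0, dtype=float)  # work on a copy, leave the input untouched
    n_players = len(R)
    for _ in range(n_steps):
        # draw two distinct players uniformly at random
        i, j = rng.choice(n_players, size=2, replace=False)
        # S takes values +1/-1 with expectation b(rho_i - rho_j)
        p_win = 0.5 * (1.0 + b(rho[i] - rho[j]))
        S = 1.0 if rng.random() < p_win else -1.0
        # b is odd, so the update of player j is the negative of that of player i
        delta = gamma * (S - b(R[i] - R[j]))
        R[i] += delta
        R[j] -= delta
    return R
```

Because the two updates cancel, the sum of all ratings is conserved in this sketch. A call such as `simulate_ratings(rho, R0, gamma=0.05, n_steps=2000, rng=np.random.default_rng(0))` runs one realisation; averaging over several realisations would follow the procedure described above.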
5.2 Finite Volume Discretisation and Simulations of the Nonlinear Fokker–Planck Equation
(\({\mathcal {S}}_1\)) Interaction step in the strength variable \(\rho \):$$\begin{aligned} \frac{\partial g^*}{\partial t}(\rho , R, t) = \frac{\partial }{\partial \rho } (c[\tilde{g}] g^*(\rho , R, t)) + \frac{\sigma ^2}{2} d[\tilde{g}]\frac{\partial ^2}{\partial \rho ^2} ( g^*(\rho , R, t)) \end{aligned}$$subject to the initial condition \(g^*(\rho ,R,t) = \tilde{g}(\rho ,R,t)\). Note that we compute the interaction integrals using \(\tilde{g}\), which corresponds to the solution at the previous time step in the full splitting scheme.
(\({\mathcal {S}}_2\)) Interaction step in the rating variable \(R\):$$\begin{aligned} \frac{\partial g^\diamond }{\partial t}(\rho , R, t) = \frac{\partial }{\partial R} (a[g^*] g^\diamond (\rho , R, t)) \end{aligned}$$
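The two splitting steps can be illustrated by the following sketch of one explicit time step on a uniform grid. It is an illustration under stated assumptions rather than the discretisation used in this paper: the coefficients \(c\), \(d\) and \(a\) are passed in as precomputed arrays (the interaction integrals are not evaluated), \(d\) is assumed to be independent of \(\rho \), first-order upwind fluxes with zero-flux boundaries are used, and the time step is assumed small enough for stability. All function names are hypothetical.

```python
import numpy as np


def _apply_fluxes(g, flux, dx, dt):
    """Return g + dt * d(flux)/dx along axis 0, with zero flux through the boundary."""
    zero = np.zeros_like(flux[:1])
    flux = np.concatenate([zero, flux, zero], axis=0)
    return g + dt / dx * (flux[1:] - flux[:-1])


def _upwind(g, c):
    """Upwind interface value of g for dg/dt = d/dx(c g), i.e. transport velocity -c."""
    return np.where(c < 0, g[:-1], g[1:])


def step_strength(g, c, d, sigma2, drho, dt):
    """(S1): dg/dt = d/drho(c g) + sigma2/2 * d * d^2 g/drho^2 along axis 0.

    c lives on the interior rho-interfaces, shape (n - 1, m); d has shape (m,).
    """
    flux = c * _upwind(g, c) + 0.5 * sigma2 * d * (g[1:] - g[:-1]) / drho
    return _apply_fluxes(g, flux, drho, dt)


def step_rating(g, a, dR, dt):
    """(S2): dg/dt = d/dR(a g) along axis 1; a has shape (n, m - 1)."""
    gT, aT = g.T, a.T
    return _apply_fluxes(gT, aT * _upwind(gT, aT), dR, dt).T


def splitting_step(g, c, d, a, sigma2, drho, dR, dt):
    """One full splitting step: strength direction first, then rating direction."""
    g_star = step_strength(g, c, d, sigma2, drho, dt)
    return step_rating(g_star, a, dR, dt)
```

Writing both substeps in flux form means that the discrete mass is conserved up to rounding errors, mirroring the mass conservation of the continuous equation. In a full scheme, `c` and `d` would be recomputed from \(\tilde{g}\) and `a` from \(g^*\) in every time step.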
5.3 Computational Experiments
5.3.1 All-Play-All Tournaments
5.3.2 Competitions of Players with Similar Ratings
Assigning initial ratings to players in the Elo rating is a delicate issue, since inaccurate initial ratings may influence the ability of the rating to converge to a ‘good’ rating of players reflecting their intrinsic strengths. We show the difficulties in this case by studying the dynamics if players with close ratings compete.
We set the interaction rate function to (45); hence, individuals only play against each other if the difference between their ratings is small. We consider two groups of players with different strength and rating levels as initial distribution. The first group is underrated, that is, all players have rating \(R = 0.2\), but their strength is distributed as \(\rho \sim {\mathcal {N}}(0.75, 0.1)\). The second group is overrated, with rating \(R=0.9\) and a uniform distribution in strength. We use this initial configuration in two computational experiments.
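For a particle simulation, an initial ensemble of this type could be sampled as in the following sketch. Several details are assumptions made only for illustration and are not specified above: both groups have equal size, the second parameter of \({\mathcal {N}}(0.75, 0.1)\) is read as a standard deviation, the uniform distribution is taken on the unit interval, and normal samples are clipped to the unit interval. The function name `initial_players` is hypothetical.

```python
import numpy as np


def initial_players(n_per_group, rng):
    """Sample strengths rho and ratings R for an underrated and an overrated group."""
    # Underrated group: rating 0.2, strength around 0.75
    # (0.1 interpreted as standard deviation, samples clipped to [0, 1]).
    rho_under = np.clip(rng.normal(0.75, 0.1, n_per_group), 0.0, 1.0)
    R_under = np.full(n_per_group, 0.2)
    # Overrated group: rating 0.9, strength uniform on [0, 1] (assumed range).
    rho_over = rng.uniform(0.0, 1.0, n_per_group)
    R_over = np.full(n_per_group, 0.9)
    rho = np.concatenate([rho_under, rho_over])
    R = np.concatenate([R_under, R_over])
    return rho, R
```

For example, `rho, R = initial_players(2500, np.random.default_rng(0))` yields an ensemble of 5000 players whose first half is underrated and whose second half is overrated.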
In the first, we choose the learning parameters \(\alpha =0.1\) and \(\beta =0\). We see that the two groups remain separated due to their different ratings in this case (see Fig. 4). However, players compete within their own group and since \(\beta =0\) the overall rating improves. In the overrated group the strongest players accumulate at the highest possible rating, while the underrated group forms a diagonal pattern. Here, the underrated players evolve to the maximum possible rating level.
In the second experiment, using the same initial configuration but \(\alpha =0.1\) and \(\beta = 0.05\), the steady-state profile looks totally different. In this setting, stronger players lose strength when losing against a weaker opponent. Therefore, the ratings of the overrated group decrease, while the ratings of the underrated group increase. After a while the two groups merge, accumulating on a diagonal which underestimates the intrinsic strength of players by approximately 0.1 (see Fig. 5).
5.3.3 Foul Play
Notes
Acknowledgements
The authors thank Martin Burger for the useful discussion during the Warwick EPSRC symposium workshop on ‘Emerging PDE models in socioeconomic sciences’. The authors are grateful to two anonymous referees for constructive comments and remarks.
Funding
BD has been supported by the Leverhulme Trust research project grant ‘Novel discretisations for higher-order nonlinear PDE’ (RPG201569). Part of this research was carried out during a three-month visit of the second author to the University of Sussex, enabled through financial support by the University of Pavia. MTW acknowledges partial support from the Austrian Academy of Sciences via the New Frontier’s Grant NST 0001 and the EPSRC by the Grant EP/P01240X/1.
Compliance with Ethical Standards
Conflict of interest
The authors declare that they have no conflict of interest.
References
Albi, G., Pareschi, L., Zanella, M.: Boltzmann-type control of opinion consensus through leaders. Phil. Trans. R. Soc. A 372, 20140138 (2014)
Arnold, A., Erb, J.: Sharp entropy decay for hypocoercive and non-symmetric Fokker–Planck equations with linear drift. arXiv preprint arXiv:1409.5425 (2014)
Achleitner, F., Arnold, A., Stürzer, D.: Large-time behavior in non-symmetric Fokker–Planck equations. Riv. Mat. Univ. Parma 6, 1–68 (2015)
Bellomo, N., Herrero, M.A., Tosin, A.: On the dynamics of social conflicts: looking for the Black Swan. Kinet. Relat. Models 6(3), 459–479 (2013)
Boudin, L., Monaco, R., Salvarani, F.: Kinetic model for multidimensional opinion formation. Phys. Rev. E 81, 036109 (2010)
Burger, M., Caffarelli, L., Markowich, P.A., Wolfram, M.T.: On a Boltzmann-type price formation model. Proc. R. Soc. A 469(2157), 20130126 (2013)
Burger, M., Lorz, A., Wolfram, M.T.: On a Boltzmann mean field model for knowledge growth. SIAM J. Appl. Math. 76(5), 1799–1818 (2016)
Cercignani, C.: The Boltzmann Equation and Its Applications. Springer Series in Applied Mathematical Sciences, vol. 67. Springer, New York (1988)
Cercignani, C., Illner, R., Pulvirenti, M.: The Mathematical Theory of Dilute Gases. Springer Series in Applied Mathematical Sciences, vol. 106. Springer, New York (1994)
Cordier, S., Pareschi, L., Piatecki, C.: Mesoscopic modelling of financial markets. J. Stat. Phys. 134(1), 161–184 (2009)
Degond, P., Liu, J.G., Ringhofer, C.: Evolution of wealth in a nonconservative economy driven by local Nash equilibria. Phil. Trans. R. Soc. A 372, 20130394 (2014)
Delitala, M., Lorenzi, T.: A mathematical model for value estimation with public information and herding. Kinet. Relat. Models 7, 29–44 (2014)
Düring, B., Toscani, G.: Hydrodynamics from kinetic models of conservative economies. Phys. A Stat. Mech. Appl. 384(2), 493–506 (2007)
Düring, B., Wolfram, M.T.: Opinion dynamics: inhomogeneous Boltzmann-type equations modelling opinion leadership and political segregation. Proc. R. Soc. Lond. A 471, 20150345 (2015)
Düring, B., Matthes, D., Toscani, G.: Kinetic equations modelling wealth redistribution: a comparison of approaches. Phys. Rev. E 78(5), 056103 (2008)
Düring, B., Markowich, P.A., Pietschmann, J.F., Wolfram, M.T.: Boltzmann and Fokker–Planck equations modelling opinion formation in the presence of strong leaders. Proc. R. Soc. A 465(2112), 3687–3708 (2009)
Düring, B., Jüngel, A., Trussardi, L.: A kinetic equation for economic value estimation with irrationality and herding. Kinet. Relat. Models 10(1), 239–261 (2017)
Elo, A.E.: The Rating of Chess Players, Past and Present. ISHI Press International, San Jose (1978)
Glickman, M.E., Jones, A.C.: Rating the chess rating system. Chance 12(2), 21–28 (1999)
Jabin, P.E., Junca, S.: A continuous model for ratings. SIAM J. Appl. Math. 75(2), 420–442 (2015)
Krupp, K.: Kinetische Modelle für die Rangeinstufung von Spielern, Master thesis, WWU Münster (2016)
Motsch, S., Tadmor, E.: Heterophilious dynamics enhances consensus. SIAM Rev. 56, 577–621 (2014)
Pareschi, L., Toscani, G.: Interacting Multiagent Systems: Kinetic Equations and Monte Carlo Methods. OUP, Oxford (2013)
Pareschi, L., Toscani, G.: Wealth distribution and collective knowledge: a Boltzmann approach. Phil. Trans. R. Soc. A 372, 20130396 (2014)
Simon, J.: Compact sets in the space \(L^p(0, T;B)\). Ann. Mat. Pura Appl. 146, 65–96 (1986)
Torregrossa, M., Toscani, G.: Wealth distribution in presence of debts. A Fokker–Planck description. Commun. Math. Sci. 16(2), 537–560 (2018)
Toscani, G.: Kinetic models of opinion formation. Commun. Math. Sci. 4(3), 481–496 (2006)
Villani, C.: Hypocoercivity. Memoirs of the American Mathematical Society, vol. 202(950). American Mathematical Society, Providence (2009)
Zeidler, E.: Nonlinear Functional Analysis and Its Applications, vol. II/A. Springer, New York (1990)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.