Encyclopedia of Systems and Control

Living Edition
| Editors: John Baillieul, Tariq Samad

Mean Field Games

  • Peter E. Caines
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4471-5102-9_30-1

Keywords

Nash equilibrium · Stochastic differential equation · Finite population · Infinite population · Infinite time horizon

Definition

Mean Field Game (MFG) theory studies the existence of Nash equilibria, together with the individual strategies which generate them, in games involving a large number of agents modeled by controlled stochastic dynamical systems. This is achieved by exploiting the relationship between the finite and corresponding infinite limit population problems. The solution of the infinite population problem is given by the fundamental MFG Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations which are linked by the state distribution of a generic agent, otherwise known as the system’s mean field.

Introduction

Large-population, dynamical, multi-agent, competitive, and cooperative phenomena occur in a wide range of designed and natural settings such as communication, environmental, epidemiological, transportation, and energy systems, and they underlie much economic and financial behavior. Analysis of such systems is intractable using the finite population game theoretic methods which have been developed for multi-agent control systems (see, e.g., Basar and Ho 1974; Ho 1980; Basar and Olsder 1999; and Bensoussan and Frehse 1984). The continuum population game theoretic models of economics (Aumann and Shapley 1974; Neyman 2002) are static, as, in general, are the large-population models employed in network games (Altman et al. 2002) and transportation analysis (Wardrop 1952; Haurie and Marcotte 1985; Correa and Stier-Moses 2010). However, dynamical (or sequential) stochastic games were analyzed in the continuum limit in the work of Jovanovic and Rosenthal (1988) and Bergin and Bernhardt (1992), where the fundamental mean field equations appear in the form of a discrete time dynamic programming equation and an evolution equation for the population state distribution.

The mean field equations for dynamical games with large but finite populations of asymptotically negligible agents originated in the work of Huang et al. (2003, 2006, 2007) (where the framework was called the Nash Certainty Equivalence Principle) and independently in that of Lasry and Lions (2006a, b, 2007), where the now standard terminology of Mean Field Games (MFGs) was introduced. Independently of both of these, the closely related notion of Oblivious Equilibria for large-population dynamic games was introduced by Weintraub et al. (2005) in the framework of Markov Decision Processes (MDPs).

One of the main results of MFG theory is that in large-population stochastic dynamic games individual feedback strategies exist for which any given agent will be in a Nash equilibrium with respect to the pre-computable behavior of the mass of the other agents; this holds exactly in the asymptotic limit of an infinite population and with increasing accuracy for a finite population of agents using the infinite population feedback laws as the finite population size tends to infinity, a situation which is termed an ε-Nash equilibrium. This behavior is described by the solution to the infinite population MFG equations which are fundamental to the theory; they consist of (i) a parameterized family of HJB equations (in the nonuniform parameterized agent case) and (ii) a corresponding family of McKean-Vlasov (MV) FPK PDEs, where (i) and (ii) are linked by the probability distribution of the state of a generic agent, that is to say, the mean field. For each agent, these yield (i) a Nash value of the game, (ii) the best response strategy for the agent, (iii) the agent’s stochastic differential equation (SDE) (i.e., the MV-SDE pathwise description), and (iv) the state distribution of such an agent (via the MV FPK for the parameterized individual).

Dynamical Agents

In the diffusion-based models of large-population games, the state evolution of a collection of N agents \(A_{i},1 \leq i \leq N < \infty,\) is specified by a set of N controlled stochastic differential equations (SDEs) which in the important linear case take the form
$$dx_{i}(t) = [F_{i}x_{i}(t) + G_{i}u_{i}(t)]dt + D_{i}dw_{i}(t),\quad 1 \leq i \leq N,$$
where \(x_{i} \in {\mathbb{R}}^{n}\) is the state, \(u_{i} \in {\mathbb{R}}^{m}\) the control input, and \(w_{i}\) the state Wiener process of the ith agent \(A_{i}\), where \(\{w_{i},1 \leq i \leq N\}\) is a collection of N independent standard Wiener processes in \({\mathbb{R}}^{r}\) independent of all the mutually independent initial conditions. For simplicity, throughout this entry, all collections of system initial conditions are taken to be independent and zero mean with finite second moments.
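As a purely illustrative numerical sketch (not part of the formal development above), such linear dynamics may be simulated by an Euler-Maruyama discretization; all parameter values and the placeholder feedback law below are assumptions chosen for the example.

```python
import numpy as np

# Euler-Maruyama simulation of N scalar linear agent SDEs
#   dx_i = (F_i x_i + G_i u_i) dt + D_i dw_i.
# All parameters and the placeholder feedback gain K are illustrative choices.
rng = np.random.default_rng(0)
N, T, dt = 100, 1.0, 1e-3
steps = int(T / dt)
F = rng.uniform(-1.0, -0.5, N)   # hypothetical stable drift parameters F_i
G = np.ones(N)                   # hypothetical input gains G_i
D = 0.3 * np.ones(N)             # hypothetical noise intensities D_i
K = 0.5                          # placeholder feedback gain: u_i = -K x_i
x = rng.normal(0.0, 1.0, N)      # independent zero-mean initial conditions

for _ in range(steps):
    u = -K * x                               # simple stand-in control law
    dw = rng.normal(0.0, np.sqrt(dt), N)     # independent Wiener increments
    x = x + (F * x + G * u) * dt + D * dw

print("population average state at T:", x.mean())
```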
A simplified form of the general case treated in Huang et al. (2007) and Nourian and Caines (2013) is given by the following set of controlled SDEs which for each agent \(A_{i}\) includes state coupling with all other agents:
$$dx_{i}(t) = \frac{1} {N}\displaystyle\sum _{j=1}^{N}f(t,x_{ i}(t),u_{i}(t),x_{j}(t))dt +\sigma dw_{i}(t),\quad 1 \leq i \leq N,$$
where here, for the sake of simplicity, only the uniform (non-parameterized) generic agent case is presented. The dynamics of a generic agent in the infinite population limit of this system is then described by the following controlled MV stochastic differential equation:
$$dx(t) = f[t,x(t),u(t),\mu _{t}]dt +\sigma dw(t),$$
where \(f[t,x,u,\mu _{t}] =\int _{{\mathbb{R}}^{n}}f(t,x,u,y)\mu _{t}(dy)\), with the initial condition measure \(\mu _{0}\) specified, and where \(\mu _{t}(\cdot )\) denotes the state distribution of the population at t ∈ [0, T]. The dynamics used in the analysis in Lasry and Lions (2006a, b, 2007) and Cardaliaguet (2012) are of the form \(dx_{i}(t) = u_{i}(t)dt +\sigma dw_{i}(t)\).
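To make the passage to the mean field limit concrete, the sketch below simulates the state-coupled finite population dynamics next to a particle approximation of the MV-SDE, for the hypothetical pairwise coupling \(f(t,x_{i},u_{i},x_{j}) =\kappa (x_{j} - x_{i})\) with zero control, under which the averaged drift reduces to mean reversion toward the population average. The coupling and all parameters are assumptions made for illustration.

```python
import numpy as np

# Finite-population state-coupled dynamics vs. the McKean-Vlasov limit,
# for the hypothetical coupling f(t, x_i, u_i, x_j) = kappa * (x_j - x_i)
# with u = 0, so that (1/N) sum_j f(...) = kappa * (mean(x) - x_i).
rng = np.random.default_rng(1)
N, T, dt, kappa, sigma = 500, 2.0, 1e-3, 1.0, 0.4
steps = int(T / dt)
x = rng.normal(1.0, 1.0, N)      # finite population of coupled agents
y = rng.normal(1.0, 1.0, N)      # independent particles approximating the MV-SDE

for _ in range(steps):
    # finite population: drift is the empirical average of pairwise couplings
    x += kappa * (x.mean() - x) * dt + sigma * rng.normal(0, np.sqrt(dt), N)
    # MV limit: the drift integrates f against mu_t, i.e. kappa*(E[x_t] - y);
    # the expectation is estimated by the sample mean of the particles.
    y += kappa * (y.mean() - y) * dt + sigma * rng.normal(0, np.sqrt(dt), N)

print("finite-N mean:", x.mean(), " MV-particle mean:", y.mean())
```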

The dynamical evolution of the state \(x_{i}\) of the ith agent \(A_{i}\) in the discrete time Markov Decision Process (MDP)-based formulation of the so-called anonymous sequential games (Jovanovic and Rosenthal 1988; Bergin and Bernhardt 1992; Weintraub et al. 2005) is described by a Markov state transition function, or kernel, of the form \(P_{t+1} := P(x_{i}(t + 1)\vert x_{i}(t),x_{-i}(t),u_{i}(t),P_{t})\).

Agent Performance Functions

In the basic finite population linear-quadratic diffusion case, the agent \(A_{i},1 \leq i \leq N\), possesses a performance, or loss, function of the form
$$J_{i}^{N}(u_{ i},u_{-i}) = E\displaystyle\int _{0}^{T}\{\Vert x_{ i}(t) - m_{N}(t)\Vert _{Q}^{2} +\Vert u_{ i}(t)\Vert _{R}^{2}\}dt,$$
where the cost coupling is assumed to be of the form \(m_{N}(t) := (\overline{x_{N}(t)}+\eta ),\,\eta \in {\mathbb{R}}^{n}\), where \(u_{-i}\) denotes the control laws of all agents except the ith, \(\overline{x_{N}}\) denotes the population average state \((1/N)\sum _{i=1}^{N}x_{i}\), and where here and below the expectation is taken over an underlying sample space which carries all initial conditions and Wiener processes.
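Such a performance function can be estimated by Monte Carlo under any fixed set of feedback laws; the sketch below is a minimal example with illustrative scalar toy dynamics and all controls set to zero, so only the tracking term contributes.

```python
import numpy as np

# Monte Carlo estimate of J_i^N = E int_0^T { Q (x_i - m_N)^2 + R u_i^2 } dt
# for agent i = 0, with illustrative scalar parameters and u = 0 throughout
# (so the R-weighted control term vanishes). Dynamics below are a toy choice.
rng = np.random.default_rng(2)
N, T, dt, runs = 50, 1.0, 1e-3, 200
Q, eta = 1.0, 0.25
steps = int(T / dt)
costs = np.empty(runs)

for r in range(runs):
    x = rng.normal(0.0, 1.0, N)
    J = 0.0
    for _ in range(steps):
        m_N = x.mean() + eta                # cost coupling m_N(t) = xbar_N(t) + eta
        J += Q * (x[0] - m_N) ** 2 * dt     # running cost of agent 0
        x += -0.5 * x * dt + 0.3 * rng.normal(0, np.sqrt(dt), N)  # toy dynamics
    costs[r] = J

print("estimated J_0^N:", costs.mean())
```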
For the nonlinear case introduced in the previous section, a corresponding finite population mean field loss function is
$$J_{i}^{N}(u_{ i};u_{-i}) := E\displaystyle\int _{0}^{T}\left ((1/N)\displaystyle\sum _{ j=1}^{N}L(t,x_{ i}(t),u_{i}(t),x_{j}(t))\right )dt,\quad 1 \leq i \leq N,$$
where L is the nonlinear state cost-coupling function. Setting, by possible abuse of notation, \(L[t,x,u,\mu _{t}] =\int _{{\mathbb{R}}^{n}}L(t,x,u,y)\mu _{t}(dy)\), the infinite population limit of this cost function for a generic individual agent A is given by
$$J(u,\mu ) := E\displaystyle\int _{0}^{T}L[t,x(t),u(t),\mu _{ t}]dt,$$
which is the general expression for the infinite population individual performance functions appearing in Huang et al. (2006) and Nourian and Caines (2013) and which includes those of Lasry and Lions (2006a, b, 2007) and Cardaliaguet (2012). Exponentially discounted costs with discount rate parameter ρ are employed for infinite time horizon performance functions in Huang et al. (2003, 2007), while the sample path limit of the long-range average is used for ergodic MFG problems in Lasry and Lions (2006a, 2007) and Li and Zhang (2008) and in the analysis of adaptive MFG systems (Kizilkale and Caines 2013).
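Numerically, the averaged loss \(L[t,x,u,\mu _{t}]\) is simply an integral of \(L(t,x,u,\cdot )\) against \(\mu _{t}\) and can be approximated by sampling from the measure; a minimal sketch, assuming a hypothetical quadratic coupling \(L(t,x,u,y) = (x - y)^{2} + u^{2}\) and a Gaussian stand-in for \(\mu _{t}\):

```python
import numpy as np

# Approximating L[t, x, u, mu_t] = int L(t, x, u, y) mu_t(dy) by sampling,
# for the hypothetical coupling L(t, x, u, y) = (x - y)^2 + u^2.
rng = np.random.default_rng(3)

def L(t, x, u, y):
    return (x - y) ** 2 + u ** 2

# Suppose mu_t is Gaussian N(1, 0.5^2); draw samples to stand in for mu_t(dy).
y_samples = rng.normal(1.0, 0.5, 100_000)
x, u, t = 0.0, 0.2, 0.0
L_bar = L(t, x, u, y_samples).mean()
print("L[t,x,u,mu_t] approx:", L_bar)   # exact value here: (x-1)^2 + 0.25 + u^2
```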

The Existence of Equilibria

The objective of each agent is to find strategies (i.e., control laws) which are admissible with respect to information and other constraints and which minimize its performance function. The resulting problem is necessarily game theoretic and consequently central results of the topic concern the existence of Nash Equilibria and their properties.

The basic linear-quadratic mean field problem has an explicit solution characterizing a Nash equilibrium (see Huang et al. 2003, 2007). Consider the scalar infinite time horizon discounted case, with nonuniform parameterized agents \(A_{\theta }\) with parameter distribution \(F(\theta ),\theta \in \mathcal{A}\), and dynamical parameters identified as \(a_{\theta } := F_{\theta },b_{\theta } := G_{\theta },Q := 1,r := R\); then the so-called Nash Certainty Equivalence (NCE) equation scheme generating the equilibrium solution takes the form

$$\displaystyle\begin{array}{rcl} \rho s_{\theta }& =& \frac{ds_{\theta }} {dt} + a_{\theta }s_{\theta } -\frac{b_{\theta }^{2}} {r} \Pi _{\theta }s_{\theta } - {x}^{{\ast}}, \\ \frac{d\overline{x}_{\theta }} {dt} & =& \left (a_{\theta } -\frac{b_{\theta }^{2}} {r} \Pi _{\theta }\right )\overline{x}_{\theta } -\frac{b_{\theta }^{2}} {r} s_{\theta },\qquad 0 \leq t < \infty, \\ \overline{x}(t)& =& \displaystyle\int _{\mathcal{A}}\overline{x}_{\theta }(t)\,dF(\theta ), \\ {x}^{{\ast}}(t)& =& \gamma (\overline{x}(t)+\eta ), \\ \rho \Pi _{\theta }& =& 2a_{\theta }\Pi _{\theta } -\frac{b_{\theta }^{2}} {r} \Pi _{\theta }^{2} + 1,\quad \Pi _{\theta } > 0,\quad \mbox{ (Riccati equation)} \\ \end{array}$$
where the control action of the generic parameterized agent \(A_{\theta }\) is given by \(u_{\theta }^{0}(t) = -\frac{b_{\theta }} {r}(\Pi _{\theta }x_{\theta }(t) + s_{\theta }(t)),0 \leq t < \infty.\) Here \(u_{\theta }^{0}\) is the optimal tracking feedback law with respect to \({x}^{{\ast}}(t)\), which is an affine function of the mean field term \(\overline{x}(t)\), the mean with respect to the parameter distribution F of the \(\theta \in \mathcal{A}\) parameterized state means of the agents. Subject to the conditions for the NCE scheme to have a solution, each agent is necessarily in a Nash equilibrium within the class of full information causal (i.e., non-anticipative) feedback laws with respect to the remainder of the agents whenever these are employing the law \(u^{0}\).

It is an important feature of the best response control law \(u_{\theta }^{0}\) that its form depends only on the parametric data of the entire set of agents, and at any instant it is a feedback function of only the state of the agent \(A_{\theta }\) itself and the deterministic mean field-dependent offset \(s_{\theta }\).
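For the uniform-agent specialization of the scheme above (a single θ, so the integral over F collapses), the NCE equations can be solved numerically by a fixed-point iteration: solve the Riccati equation, integrate the s equation backward from a large truncation horizon, propagate \(\overline{x}\) forward, update \({x}^{{\ast}}\), and repeat. The sketch below is such an iteration with illustrative parameter values; convergence requires contraction conditions (e.g., |γ| sufficiently small) of the kind given in Huang et al. (2003, 2007).

```python
import numpy as np

# Fixed-point (NCE) iteration for the scalar uniform-agent LQ case.
# All parameter values are illustrative; the infinite horizon is truncated at T.
a, b, r, rho, gamma, eta = -0.5, 1.0, 1.0, 0.9, 0.6, 0.25
xbar0 = 1.0                      # initial population mean
T, dt = 20.0, 1e-2
n = int(T / dt)

# Positive root of rho*Pi = 2a*Pi - (b^2/r)*Pi^2 + 1 (scalar Riccati equation).
c = b * b / r
Pi = ((2 * a - rho) + np.sqrt((rho - 2 * a) ** 2 + 4 * c)) / (2 * c)

xbar = np.full(n + 1, xbar0)     # initial guess for the mean trajectory
for _ in range(200):
    xstar = gamma * (xbar + eta)
    # Backward integration of ds/dt = (rho - a + c*Pi) s + x*, with s(T) = 0
    # approximating the bounded infinite-horizon solution.
    s = np.zeros(n + 1)
    for k in range(n - 1, -1, -1):
        s[k] = s[k + 1] - dt * ((rho - a + c * Pi) * s[k + 1] + xstar[k + 1])
    # Forward integration of dxbar/dt = (a - c*Pi) xbar - c*s.
    xnew = np.empty(n + 1)
    xnew[0] = xbar0
    for k in range(n):
        xnew[k + 1] = xnew[k] + dt * ((a - c * Pi) * xnew[k] - c * s[k])
    if np.max(np.abs(xnew - xbar)) < 1e-8:
        xbar = xnew
        break
    xbar = 0.5 * xbar + 0.5 * xnew   # damped update toward the fixed point

# Equilibrium feedback: u0(t) = -(b/r) * (Pi * x(t) + s(t)).
print("Pi =", Pi, " stationary mean approx:", xbar[-1])
```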

For the general nonlinear case, the MFG equations on [0, T] are given by the linked equations for (i) the performance function V for each agent in the continuum, (ii) the FPK for the MV-SDE for that agent, and (iii) the specification of the best response feedback law depending on the mean field measure μ t and the agent’s state x(t). In the uniform agents case, these take the following form.

The Mean Field Game HJB-(MV) FPK Equations
$$\displaystyle\begin{array}{rcl} \text{[MV-HJB]}& & \quad -\frac{\partial V (t,x)} {\partial t} =\inf _{u\in U}\left \{f[t,x,u,\mu _{t}]\frac{\partial V (t,x)} {\partial x} + L[t,x,u,\mu _{t}]\right \} + \frac{{\sigma }^{2}} {2} \frac{{\partial }^{2}V (t,x)} {\partial {x}^{2}}, \\ & & \quad V (T,x) = 0,\qquad (t,x) \in [0,T] \times \mathbb{R}, \\ \text{[MV-FPK]}& & \quad \frac{\partial \mu (t,x)} {\partial t} = -\frac{\partial \{f[t,x,u(t),\mu _{t}]\mu (t,x)\}} {\partial x} + \frac{{\sigma }^{2}} {2} \frac{{\partial }^{2}\mu (t,x)} {\partial {x}^{2}}, \\ \text{[MV-BR]}& & \quad u(t) =\varphi (t,x(t)\vert \mu _{t}),\qquad (t,x) \in [0,T] \times \mathbb{R}. \end{array}$$
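As a numerical illustration of how the linked equations are used, the sketch below solves a simple instance on a grid by Picard iteration: taking the Lasry-Lions dynamics \(dx = u\,dt +\sigma dw\) mentioned earlier and a hypothetical running cost \(L = \frac{1}{2}u^{2} + \frac{1}{2}(x - m(t))^{2}\) with m(t) the mean of \(\mu _{t}\), the Hamiltonian minimization gives u = -∂V∕∂x, the HJB equation is stepped backward given m(t), and the FPK equation forward given the resulting feedback. The cost, parameters, and the crude explicit scheme are all assumptions for demonstration, not a production solver.

```python
import numpy as np

# Picard iteration for a simple MFG with dx = u dt + sigma dw and the
# hypothetical cost L = 0.5*u^2 + 0.5*(x - m(t))^2, m(t) = mean of mu_t.
# Hamiltonian: inf_u { u Vx + 0.5 u^2 } = -0.5 Vx^2, attained at u = -Vx.
sigma, T = 0.5, 1.0
x = np.linspace(-3.0, 3.0, 121)
dx = x[1] - x[0]
nt = 400
dt = T / nt                            # kept well under dx^2 / sigma^2 for stability

m = np.zeros(nt + 1)                   # initial guess for the mean field's mean
for it in range(100):
    # --- HJB backward: V_t = 0.5 Vx^2 - 0.5 (x - m)^2 - 0.5 sigma^2 Vxx
    V = np.zeros((nt + 1, x.size))     # terminal condition V(T, .) = 0
    for n in range(nt - 1, -1, -1):
        Vx = np.gradient(V[n + 1], dx)
        Vxx = np.gradient(Vx, dx)
        V[n] = V[n + 1] + dt * (-0.5 * Vx**2
                                + 0.5 * (x - m[n + 1]) ** 2
                                + 0.5 * sigma**2 * Vxx)
    # --- FPK forward: mu_t = -(u mu)_x + 0.5 sigma^2 mu_xx, with u = -Vx
    mu = np.exp(-0.5 * (x - 1.0) ** 2)
    mu /= mu.sum() * dx                # normalized initial law on the grid
    m_new = np.empty(nt + 1)
    m_new[0] = (x * mu).sum() * dx
    for n in range(nt):
        u = -np.gradient(V[n], dx)     # best response feedback at time step n
        flux = np.gradient(u * mu, dx)
        diff = np.gradient(np.gradient(mu, dx), dx)
        mu = mu + dt * (-flux + 0.5 * sigma**2 * diff)
        mu = np.clip(mu, 0.0, None)
        mu /= mu.sum() * dx            # crude renormalization for stability
        m_new[n + 1] = (x * mu).sum() * dx
    if np.max(np.abs(m_new - m)) < 1e-6:
        m = m_new
        break
    m = 0.5 * m + 0.5 * m_new          # damped mean field update

print("mean field mean at t=0 and t=T:", m[0], m[-1])
```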

The general nonlinear MFG problem is approached by different routes in Huang et al. (2006) and Nourian and Caines (2013), and Lasry and Lions (2006a, b, 2007) and Cardaliaguet (2012), respectively. In the former, the so-called probabilistic method solves the MFG equations directly. Subject to technical conditions, an iterated contraction argument establishes the existence of a solution to the HJB-(MV) FPK equations; the best response control laws are obtained from these MFG equations, and these necessarily generate Nash equilibria within the class of all causal feedback laws for the infinite population problem. In Lasry and Lions (2006a, 2007) the MFG equations on the infinite time interval (i.e., the ergodic case) are obtained as the limit of Nash equilibria for increasing finite populations, while in the expository notes of Cardaliaguet (2012) the analytic properties of solutions to the HJB-FPK equations on the finite interval are analyzed using PDE methods including the theory of viscosity solutions.

In Huang et al. (2003, 2006, 2007), Nourian and Caines (2013), and Cardaliaguet (2012), it is shown that, subject to technical conditions, the solutions to the HJB-FPK scheme yield ε-Nash equilibria for finite population MFGs, in that for any ε > 0, there exists a population size \(N_{\epsilon }\) such that for all larger populations the use of the feedback law given by the infinite population MFG scheme gives each agent a value of its performance function within ε of the infinite population Nash value.

A counterintuitive feature of these results is that, asymptotically in population size, observations of the states of rival agents are of no value to any given agent; this is in contrast to the situation in single-agent optimal control theory where the value of observations on an agent’s environment is in general positive.

Current Developments and Open Problems

There is now an extensive literature on Mean Field Games, of which the following is a sample: the mathematical literature has focused on the study of general classes of solutions to the fundamental HJB-FPK equations (see, e.g., Cardaliaguet 2013), while in systems and control, the theory of major-minor agent MFG problems (in economics terminology, atoms and continua) is being developed (Huang 2010; Nguyen and Huang 2012; Nourian and Caines 2013), adaptive control extensions of the LQG theory have been carried out (Kizilkale and Caines 2013), and the risk-sensitive case has been analyzed (Tembine et al. 2012). Much work is now under way on applications of MFG theory to economics, finance, distributed energy systems, and electrical power markets. Each of these areas has significant open problems, including the application of mathematical transport theory to the HJB-FPK equations, the role of MFG theory in portfolio optimization, and the analysis of systems where the presence of partially observed major and minor agent states entails mean field and agent state estimation.

Bibliography

  1. Altman E, Basar T, Srikant R (2002) Nash equilibria for combined flow control and routing in networks: asymptotic behavior for a large number of users. IEEE Trans Autom Control 47(6):917–930. Special issue on Control Issues in Telecommunication Networks
  2. Aumann RJ, Shapley LS (1974) Values of non-atomic games. Princeton University Press, Princeton
  3. Basar T, Ho YC (1974) Informational properties of the Nash solutions of two stochastic nonzero-sum games. J Econ Theory 7:370–387
  4. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory. SIAM, Philadelphia
  5. Bensoussan A, Frehse J (1984) Nonlinear elliptic systems in stochastic game theory. J Reine Angew Math 350:23–67
  6. Bergin J, Bernhardt D (1992) Anonymous sequential games with aggregate uncertainty. J Math Econ 21:543–562
  7. Cardaliaguet P (2012) Notes on mean field games. Collège de France
  8. Cardaliaguet P (2013) Long time average of first order mean field games and weak KAM theory. Dyn Games Appl 3:473–488
  9. Correa JR, Stier-Moses NE (2010) Wardrop equilibria. In: Cochran JJ (ed) Wiley encyclopedia of operations research and management science. John Wiley & Sons, Chichester, UK
  10. Haurie A, Marcotte P (1985) On the relationship between Nash-Cournot and Wardrop equilibria. Networks 15(3):295–308
  11. Ho YC (1980) Team decision theory and information structures. Proc IEEE 68(6):15–22
  12. Huang MY (2010) Large-population LQG games involving a major player: the Nash certainty equivalence principle. SIAM J Control Optim 48(5):3318–3353
  13. Huang MY, Caines PE, Malhamé RP (2003) Individual and mass behaviour in large population stochastic wireless power control problems: centralized and Nash equilibrium solutions. In: IEEE conference on decision and control, Maui, pp 98–103
  14. Huang MY, Malhamé RP, Caines PE (2006) Large population stochastic dynamic games: closed loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun Inf Syst 6(3):221–252
  15. Huang MY, Caines PE, Malhamé RP (2007) Large population cost-coupled LQG problems with non-uniform agents: individual-mass behaviour and decentralized ε-Nash equilibria. IEEE Trans Autom Control 52(9):1560–1571
  16. Jovanovic B, Rosenthal RW (1988) Anonymous sequential games. J Math Econ 17(1):77–87
  17. Kizilkale AC, Caines PE (2013) Mean field stochastic adaptive control. IEEE Trans Autom Control 58(4):905–920
  18. Lasry JM, Lions PL (2006a) Jeux à champ moyen. I – Le cas stationnaire. Comptes Rendus Math 343(9):619–625
  19. Lasry JM, Lions PL (2006b) Jeux à champ moyen. II – Horizon fini et contrôle optimal. Comptes Rendus Math 343(10):679–684
  20. Lasry JM, Lions PL (2007) Mean field games. Jpn J Math 2:229–260
  21. Li T, Zhang JF (2008) Asymptotically optimal decentralized control for large population stochastic multiagent systems. IEEE Trans Autom Control 53(7):1643–1660
  22. Neyman A (2002) Values of games with infinitely many players. In: Aumann RJ, Hart S (eds) Handbook of game theory, vol 3. North-Holland, Amsterdam, pp 2121–2167
  23. Nourian M, Caines PE (2013) ε-Nash mean field game theory for nonlinear stochastic dynamical systems with major and minor agents. SIAM J Control Optim 51(4):3302–3331
  24. Nguyen SL, Huang M (2012) Linear-quadratic-Gaussian mixed games with continuum-parametrized minor players. SIAM J Control Optim 50(5):2907–2937
  25. Tembine H, Zhu Q, Basar T (2012) Risk-sensitive mean field games. arXiv:1210.2806
  26. Wardrop JG (1952) Some theoretical aspects of road traffic research. In: Proceedings of the Institute of Civil Engineers, London, Part II, vol 1, pp 325–378
  27. Weintraub GY, Benkard C, Van Roy B (2005) Oblivious equilibrium: a mean field approximation for large-scale dynamic games. In: Advances in neural information processing systems. MIT Press, Cambridge

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. McGill University, Montreal, Canada