Encyclopedia of Systems and Control

Living Edition
| Editors: John Baillieul, Tariq Samad

Linear Quadratic Zero-Sum Two-Person Differential Games

  • Pierre Bernhard
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4471-5102-9_29-1

Abstract

As in optimal control theory, linear quadratic (LQ) differential games (DG) can be solved, even in high dimension, via a Riccati equation. However, contrary to the control case, existence of the solution of the Riccati equation is not necessary for the existence of a closed-loop saddle point. One may “survive” a particular, nongeneric, type of conjugate point. An important application of LQDGs is the so-called \(H_{\infty}\)-optimal control, appearing in the theory of robust control.

Perfect State Measurement

Linear quadratic differential games are a special case of differential games (DG); see the articles Pursuit-Evasion Games and Nonlinear Zero-Sum, Two-Person Differential Games. They were first investigated by Ho et al. (1965), in the context of a linearized pursuit-evasion game. This subsection is based upon Bernhard (1979, 1980). A linear quadratic DG is defined as
$$\dot{x} = Ax + Bu + Dv\,,\quad x(t_{0}) = x_{0}\,,$$
with \(x \in \mathbb{R}^{n}\), \(u \in \mathbb{R}^{m}\), \(v \in \mathbb{R}^{\ell}\), \(u(\cdot) \in L^{2}([t_{0},T],\mathbb{R}^{m})\), \(v(\cdot) \in L^{2}([t_{0},T],\mathbb{R}^{\ell})\). The final time T is given, there is no terminal constraint, and, using the notation \(x^{t}Kx = \|x\|_{K}^{2}\),
$$J(t_{0},x_{0};u(\cdot),v(\cdot)) = \|x(T)\|_{K}^{2} + \int_{t_{0}}^{T}\begin{pmatrix} x^{t} & u^{t} & v^{t}\end{pmatrix}\begin{pmatrix} Q & S_{1} & S_{2}\\ S_{1}^{t} & R & 0\\ S_{2}^{t} & 0 & -\Gamma \end{pmatrix}\begin{pmatrix} x\\ u\\ v\end{pmatrix}\mathrm{d}t\,.$$
The matrices of appropriate dimensions A, B, D, Q, \(S_{i}\), R, and \(\Gamma\) may all be measurable functions of time. R and \(\Gamma\) must be positive definite, with inverses bounded. To get the most complete results available, we also assume that K and Q are nonnegative definite, although this is only necessary for some of the following results. Detailed results without that assumption were obtained by Zhang (2005) and Delfour (2005). We choose to set the cross term in uv in the criterion to zero; this simplifies the results and is not necessary. The problem satisfies Isaacs’ condition (see the article on DGs) even with nonzero such cross terms.
Using the change of control variables
$$u = \tilde{u} - R^{-1}S_{1}^{t}x\,,\qquad v = \tilde{v} + \Gamma^{-1}S_{2}^{t}x\,,$$
yields a DG with the same structure, with modified matrices A and Q, but without the cross terms in xu and xv. (This extends to the case with nonzero cross terms in uv.) Thus, without loss of generality, we proceed with \((S_{1}, S_{2}) = (0, 0)\).
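As a concrete check of this decoupling, here is a minimal NumPy sketch with randomly generated matrices (purely illustrative). The closed-form expressions \(\tilde{A} = A - BR^{-1}S_{1}^{t} + D\Gamma^{-1}S_{2}^{t}\) and \(\tilde{Q} = Q - S_{1}R^{-1}S_{1}^{t} + S_{2}\Gamma^{-1}S_{2}^{t}\) are our reconstruction, obtained by expanding the squares; they are not stated in the original:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l = 4, 2, 3  # illustrative dimensions

# Random problem data (Q nonnegative definite; R, Gamma positive definite).
A  = rng.standard_normal((n, n))
B  = rng.standard_normal((n, m))
D  = rng.standard_normal((n, l))
W  = rng.standard_normal((n, n)); Qm  = W @ W.T
S1 = rng.standard_normal((n, m))
S2 = rng.standard_normal((n, l))
W  = rng.standard_normal((m, m)); R   = W @ W.T + m * np.eye(m)
W  = rng.standard_normal((l, l)); Gam = W @ W.T + l * np.eye(l)
Ri, Gi = np.linalg.inv(R), np.linalg.inv(Gam)

def integrand(x, u, v):
    """x^t Q x + 2 x^t S1 u + 2 x^t S2 v + u^t R u - v^t Gamma v."""
    return (x @ Qm @ x + 2 * x @ S1 @ u + 2 * x @ S2 @ v
            + u @ R @ u - v @ Gam @ v)

# Transformed data: same R and Gamma, no xu or xv cross terms.
At = A - B @ Ri @ S1.T + D @ Gi @ S2.T
Qt = Qm - S1 @ Ri @ S1.T + S2 @ Gi @ S2.T

x, ut, vt = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal(l)
u = ut - Ri @ S1.T @ x   # original controls recovered from the tilde ones
v = vt + Gi @ S2.T @ x

# The integrand and the dynamics' right-hand side are unchanged.
assert np.isclose(integrand(x, u, v), x @ Qt @ x + ut @ R @ ut - vt @ Gam @ vt)
assert np.allclose(A @ x + B @ u + D @ v, At @ x + B @ ut + D @ vt)
print("cross terms eliminated")
```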
The existence of open-loop and closed-loop solutions to that game is governed by two Riccati equations, for symmetric matrices P and \(P^{\star}\), respectively, and by a pair of canonical equations that we shall see later:
$$\dot{P} + PA + A^{t}P - PBR^{-1}B^{t}P + PD\Gamma^{-1}D^{t}P + Q = 0\,,\qquad P(T) = K\,,$$
(1)
$$\dot{P}^{\star} + P^{\star}A + A^{t}P^{\star} + P^{\star}D\Gamma^{-1}D^{t}P^{\star} + Q = 0\,,\qquad P^{\star}(T) = K\,.$$
(2)
When both Riccati equations have a solution over [t, T], it holds that in the partial ordering of definiteness,
$$0 \leq P(t) \leq {P}^{\star }(t)\,.$$
When the saddle point exists, it is represented by the state feedback strategies
$$u = \varphi^{\star}(t,x) = -R^{-1}B^{t}P(t)x\,,\qquad v = \psi^{\star}(t,x) = \Gamma^{-1}D^{t}P(t)x\,.$$
(3)
The control functions generated by this pair of feedbacks will be denoted \(\hat{u}(\cdot)\) and \(\hat{v}(\cdot)\).
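As an illustration, the following sketch (hypothetical numerical data, not from the original) integrates the Riccati equation (1) backward from P(T) = K with SciPy and evaluates the saddle-point feedbacks (3):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical time-invariant data with S1 = S2 = 0, as in the text.
n, T = 2, 1.0
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
D = np.array([[0.], [1.]])
Q, K = np.eye(n), np.zeros((n, n))
R, Gam = np.array([[1.]]), np.array([[4.]])

def riccati_rhs(t, p):
    """Right-hand side of (1), rearranged as an ODE for P."""
    P = p.reshape(n, n)
    dP = -(P @ A + A.T @ P
           - P @ B @ np.linalg.solve(R, B.T) @ P
           + P @ D @ np.linalg.solve(Gam, D.T) @ P + Q)
    return dP.ravel()

# Integrate backward from the terminal condition P(T) = K.
sol = solve_ivp(riccati_rhs, [T, 0.0], K.ravel(), dense_output=True,
                rtol=1e-9, atol=1e-9)

def saddle_feedback(t, x):
    """Saddle-point strategies (3)."""
    P = sol.sol(t).reshape(n, n)
    u = -np.linalg.solve(R, B.T) @ P @ x
    v = np.linalg.solve(Gam, D.T) @ P @ x
    return u, v

print("P(0) =\n", sol.sol(0.0).reshape(n, n))
print("feedbacks at t=0, x=(1,0):", saddle_feedback(0.0, np.array([1.0, 0.0])))
```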

Theorem 1

  • A sufficient condition for the existence of a closed-loop saddle point, then given by \((\varphi^{\star},\psi^{\star})\) in (3), is that Eq. (1) has a solution P(t) defined over \([t_{0},T]\).

  • A necessary and sufficient condition for the existence of an open-loop saddle point is that Eq. (2) has a solution over \([t_{0},T]\) (and then so does (1)). In that case, the pairs \((\hat{u}(\cdot),\hat{v}(\cdot))\), \((\hat{u}(\cdot),\psi^{\star})\), and \((\varphi^{\star},\psi^{\star})\) are saddle points.

  • A necessary and sufficient condition for \((\varphi^{\star},\hat{v}(\cdot))\) to be a saddle point is that Eq. (1) has a solution over \([t_{0},T]\).

  • In all cases where a saddle point exists, the Value function is \(V(t,x) = \|x\|_{P(t)}^{2}\).

However, Eq. (1) may fail to have a solution while a closed-loop saddle point still exists. The precise necessary and sufficient condition is as follows: let X(⋅) and Y(⋅) be two square matrix function solutions of the canonical equations
$$\begin{pmatrix} \dot{X}\\ \dot{Y}\end{pmatrix} = \begin{pmatrix} A & -BR^{-1}B^{t} + D\Gamma^{-1}D^{t}\\ -Q & -A^{t}\end{pmatrix}\begin{pmatrix} X\\ Y\end{pmatrix}\,,\qquad \begin{pmatrix} X(T)\\ Y(T)\end{pmatrix} = \begin{pmatrix} I\\ K\end{pmatrix}\,.$$
The matrix P(t) exists for \(t \in [t_{0},T]\) if and only if X(t) is invertible over that range, and then \(P(t) = Y(t)X^{-1}(t)\). Assume that the rank of X(t) is piecewise constant, and let \(X^{\dag}(t)\) denote the pseudo-inverse of X(t) and \(\mathcal{R}(X(t))\) its range.
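The canonical equations lend themselves to the same kind of numerical sketch (same hypothetical data as above): integrate backward from (I, K) and recover \(P(t) = Y(t)X^{-1}(t)\), here through the pseudo-inverse:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Same hypothetical data as in the previous sketch.
n, T = 2, 1.0
A = np.array([[0., 1.], [0., 0.]])
BRB = np.array([[0., 0.], [0., 1.]])    # B R^{-1} B^t
DGD = np.array([[0., 0.], [0., 0.25]])  # D Gamma^{-1} D^t
Q, K = np.eye(n), np.zeros((n, n))

# Canonical (Hamiltonian) matrix of the linear system for (X, Y).
Ham = np.block([[A, -BRB + DGD], [-Q, -A.T]])

def rhs(t, xy):
    return (Ham @ xy.reshape(2 * n, n)).ravel()

XY_T = np.vstack([np.eye(n), K])  # terminal condition (X(T), Y(T)) = (I, K)
sol = solve_ivp(rhs, [T, 0.0], XY_T.ravel(), dense_output=True,
                rtol=1e-9, atol=1e-9)

def P_of(t):
    XY = sol.sol(t).reshape(2 * n, n)
    X, Y = XY[:n], XY[n:]
    # P(t) = Y(t) X(t)^{-1} where X(t) is invertible; Y(t) X(t)^+ is the
    # object appearing in Theorem 2 below.
    return Y @ np.linalg.pinv(X)

print("P(0) via the canonical equations:\n", P_of(0.0))  # matches the Riccati run
```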

Theorem 2

A necessary and sufficient condition for a closed-loop saddle point to exist, which is then given by (3) with \(P(t) = Y(t)X^{\dag}(t)\), is that
  1. \(x_{0} \in \mathcal{R}(X(t_{0}))\);
  2. for almost all \(t \in [t_{0},T]\), \(\mathcal{R}(D(t)) \subset \mathcal{R}(X(t))\);
  3. for all \(t \in [t_{0},T]\), \(Y(t)X^{\dag}(t) \geq 0\).

In a case where X(t) is singular only at an isolated instant \(t^{\star}\) (conditions 1 and 2 above are then automatically satisfied), called a conjugate point, but where \(YX^{-1}\) remains positive definite on both sides of it, the conjugate point is called even. The feedback gain \(F = -R^{-1}B^{t}P\) diverges upon reaching \(t^{\star}\), but on a trajectory generated by this feedback, the control u(t) = F(t)x(t) remains finite. (See an example in Bernhard 1979.)

If \(T = \infty\), with all system and payoff matrices constant and Q > 0, Mageirou (1976) has shown that if the algebraic Riccati equation obtained by setting \(\dot{P} = 0\) in (1) admits a positive definite solution P, the game has a Value \(\|x\|_{P}^{2}\), but (3) may not be a saddle point. (\(\psi^{\star}\) may not be an equilibrium strategy.)
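For constant data, one classical way to obtain such a solution of the algebraic Riccati equation is through the stable invariant subspace of the canonical matrix introduced above. A sketch (hypothetical data, and assuming that matrix has no purely imaginary eigenvalues), using SciPy's ordered Schur decomposition:

```python
import numpy as np
from scipy.linalg import schur

# Hypothetical constant data with Q > 0 (same plant as above).
n = 2
A = np.array([[0., 1.], [0., 0.]])
BRB = np.array([[0., 0.], [0., 1.]])    # B R^{-1} B^t
DGD = np.array([[0., 0.], [0., 0.25]])  # D Gamma^{-1} D^t
Q = np.eye(n)

# Algebraic version of (1): PA + A^t P - P (BRB - DGD) P + Q = 0.
# Order the real Schur form so the stable eigenvalues come first, and read
# P = Y X^{-1} off the stable invariant subspace of the canonical matrix.
Ham = np.block([[A, -(BRB - DGD)], [-Q, -A.T]])
_, Z, sdim = schur(Ham, sort='lhp')
X, Y = Z[:n, :n], Z[n:, :n]
P = Y @ np.linalg.inv(X)
P = (P + P.T) / 2  # symmetrize against round-off

residual = P @ A + A.T @ P - P @ (BRB - DGD) @ P + Q
print("P =\n", P)
print("residual norm:", np.linalg.norm(residual))
print("P > 0:", np.all(np.linalg.eigvalsh(P) > 0))
```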

\(H_{\infty}\)-Optimal Control

This subsection is entirely based upon Başar and Bernhard (1995). It deals with imperfect state measurement, using Bernhard’s nonlinear minimax certainty equivalence principle (Bernhard and Rapaport 1996).

Several problems of robust control may be brought to the following one: a linear, time-invariant system with two inputs (control input \(u \in \mathbb{R}^{m}\) and disturbance input \(w \in \mathbb{R}^{\ell}\)) and two outputs (measured output \(y \in \mathbb{R}^{p}\) and controlled output \(z \in \mathbb{R}^{q}\)) is given. One wishes to control the system with a nonanticipative controller u(⋅) = ϕ(y(⋅)) so as to minimize the norm of the resulting operator w(⋅) ↦ z(⋅), induced between spaces of square-integrable functions.

It turns out that the problem which has a tractable solution is a kind of dual one: given a positive number \(\gamma\), is it possible to make this norm no larger than \(\gamma\)? The answer to this question is yes if and only if
$$\inf_{\phi \in \Phi}\ \sup_{w(\cdot)\in L^{2}}\int_{-\infty}^{\infty}\left(\|z(t)\|^{2} - \gamma^{2}\|w(t)\|^{2}\right)\mathrm{d}t \leq 0\,.$$

We shall somewhat extend this classical problem by allowing either a time-varying system with a finite horizon T, or a time-invariant system with an infinite horizon.

The dynamical system is
$$\dot{x} = Ax + Bu + Dw\,,$$
(4)
$$y = Cx + Ew\,,$$
(5)
$$z = Hx + Gu\,.$$
(6)
We let
$$\begin{pmatrix} D\\ E\end{pmatrix}\begin{pmatrix} D^{t} & E^{t}\end{pmatrix} = \begin{pmatrix} M & L\\ L^{t} & N\end{pmatrix}\,,\qquad \begin{pmatrix} H^{t}\\ G^{t}\end{pmatrix}\begin{pmatrix} H & G\end{pmatrix} = \begin{pmatrix} Q & S\\ S^{t} & R\end{pmatrix}\,,$$
and we assume that E is onto (equivalently, N > 0) and that G is one-to-one (equivalently, R > 0).
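These derived matrices are immediate to form in code; the snippet below (purely illustrative dimensions and data) also checks the two standing assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, l, p, q = 4, 2, 3, 2, 3  # hypothetical dimensions (l >= p, q >= m)

D = rng.standard_normal((n, l))
E = rng.standard_normal((p, l))  # onto (full row rank)    <=>  N > 0
H = rng.standard_normal((q, n))
G = rng.standard_normal((q, m))  # one-to-one (full column rank)  <=>  R > 0

M, L, N = D @ D.T, D @ E.T, E @ E.T
Q, S, R = H.T @ H, H.T @ G, G.T @ G

assert np.all(np.linalg.eigvalsh(N) > 0), "E must be onto"
assert np.all(np.linalg.eigvalsh(R) > 0), "G must be one-to-one"
```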

Finite Horizon

In this part, we consider a time-varying system, with all matrix functions measurable. Since the state is not known exactly, we assume that the initial state is not known either. The issue is therefore to decide whether the criterion
$$J_{\gamma} = \|x(T)\|_{K}^{2} + \int_{t_{0}}^{T}\left(\|z(t)\|^{2} - \gamma^{2}\|w(t)\|^{2}\right)\mathrm{d}t - \gamma^{2}\|x_{0}\|_{\Sigma_{0}}^{2}$$
(7)
may be kept finite and with which strategy. Let
$$\gamma^{\star} = \inf\{\gamma \mid \inf_{\phi \in \Phi}\ \sup_{x_{0}\in\mathbb{R}^{n},\,w(\cdot)\in L^{2}} J_{\gamma} < \infty\}\,.$$

Theorem 3

\(\gamma > \gamma^{\star}\) if and only if the following three conditions are satisfied:
  1. The following Riccati equation has a solution over \([t_{0},T]\):
    $$-\dot{P} = PA + A^{t}P - (PB + S)R^{-1}(B^{t}P + S^{t}) + \gamma^{-2}PMP + Q\,,\qquad P(T) = K\,.$$
    (8)
  2. The following Riccati equation has a solution over \([t_{0},T]\):
    $$\dot{\Sigma} = A\Sigma + \Sigma A^{t} - (\Sigma C^{t} + L)N^{-1}(C\Sigma + L^{t}) + \gamma^{-2}\Sigma Q\Sigma + M\,,\qquad \Sigma(t_{0}) = \Sigma_{0}\,.$$
    (9)
  3. The following spectral radius condition is satisfied:
    $$\forall t \in [t_{0},T]\,,\quad \rho(\Sigma(t)P(t)) < \gamma^{2}\,.$$
    (10)
In that case, the optimal controller, achieving \(\inf_{\phi}\sup_{x_{0},w}J_{\gamma}\), is given in terms of a “worst case state” \(\hat{x}(\cdot)\) satisfying \(\hat{x}(t_{0}) = 0\) and
$$\dot{\hat{x}} = \left[A - BR^{-1}(B^{t}P + S^{t}) + \gamma^{-2}D(D^{t}P + L^{t})\right]\hat{x} + \left(I - \gamma^{-2}\Sigma P\right)^{-1}(\Sigma C^{t} + L)(y - C\hat{x})\,,$$
(11)
and the certainty equivalent controller
$$\phi^{\star}(y(\cdot))(t) = -R^{-1}(B^{t}P + S^{t})\hat{x}(t)\,.$$
(12)
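As a numerical illustration of Theorem 3 (all data hypothetical, our own construction), the following sketch integrates (8) backward and (9) forward, checks the spectral radius condition (10) on a time grid, and forms the feedback gain of the certainty equivalent controller (12):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-state plant; matrices as in (4)-(6).
n, T, gamma = 2, 1.0, 5.0
A = np.array([[0., 1.], [0., -1.]])
B = np.array([[0.], [1.]])
D = 0.1 * np.eye(2)                       # w has 2 components here
C = np.array([[1., 0.]]); E = np.array([[0., 1.]])
Hm = np.array([[1., 0.], [0., 0.]]); G = np.array([[0.], [1.]])
K, Sig0 = np.zeros((n, n)), np.eye(n)

M, L, N = D @ D.T, D @ E.T, E @ E.T
Q, S, R = Hm.T @ Hm, Hm.T @ G, G.T @ G
g2 = gamma ** 2

def P_rhs(t, p):      # Riccati equation (8), integrated backward from K
    P = p.reshape(n, n)
    W = P @ B + S
    return -(P @ A + A.T @ P - W @ np.linalg.solve(R, W.T)
             + P @ M @ P / g2 + Q).ravel()

def Sig_rhs(t, s):    # Riccati equation (9), integrated forward from Sig0
    Sg = s.reshape(n, n)
    W = Sg @ C.T + L
    return (A @ Sg + Sg @ A.T - W @ np.linalg.solve(N, W.T)
            + Sg @ Q @ Sg / g2 + M).ravel()

Psol = solve_ivp(P_rhs, [T, 0.0], K.ravel(), dense_output=True, rtol=1e-8)
Ssol = solve_ivp(Sig_rhs, [0.0, T], Sig0.ravel(), dense_output=True, rtol=1e-8)

# Spectral radius condition (10), checked on a grid.
ts = np.linspace(0.0, T, 101)
rho = [max(abs(np.linalg.eigvals(Ssol.sol(t).reshape(n, n)
                                 @ Psol.sol(t).reshape(n, n)))) for t in ts]
print("gamma =", gamma, "certified:", max(rho) < g2)

def controller_gain(t):
    """Feedback gain of (12): u = -R^{-1} (B^t P + S^t) xhat."""
    P = Psol.sol(t).reshape(n, n)
    return -np.linalg.solve(R, B.T @ P + S.T)

print("u-gain at t = 0:", controller_gain(0.0))
```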

Infinite Horizon

The infinite horizon case is the traditional \(H_{\infty}\)-optimal control problem reformulated in a state space setting. We let all matrices defining the system be constant. We take the integral in (7) from \(-\infty\) to \(+\infty\), with no initial or terminal term, of course. We add the hypothesis that the pairs (A,B) and (A,D) are stabilizable and the pairs (C,A) and (H,A) are detectable. Then, the theorem is as follows:

Theorem 4

\(\gamma > \gamma^{\star}\) if and only if the following conditions are satisfied: the algebraic Riccati equations obtained by setting \(\dot{P} = 0\) and \(\dot{\Sigma} = 0\) in (8) and (9) have positive definite solutions that satisfy the spectral radius condition (10). The optimal controller is then given by Eqs. (11) and (12), where P and Σ are the minimal positive definite solutions of the algebraic Riccati equations, which can be obtained as the limits of the solutions of the differential equations as t →−∞ for P and t →+∞ for Σ.
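Given this characterization, \(\gamma^{\star}\) can be estimated numerically by checking the conditions at a fixed γ and bisecting on γ. The sketch below is our own construction under the theorem's standing hypotheses (all plant data hypothetical): the helper care solves each algebraic Riccati equation through the stable invariant subspace of its Hamiltonian matrix, and routine completions of squares bring the algebraic versions of (8) and (9) to the standard form it expects.

```python
import numpy as np
from scipy.linalg import schur

def care(A, W, Q):
    """Stabilizing solution of P A + A^t P - P W P + Q = 0 via the stable
    invariant subspace of the Hamiltonian matrix; returns None if it fails."""
    n = A.shape[0]
    Ham = np.block([[A, -W], [-Q, -A.T]])
    if np.any(np.abs(np.linalg.eigvals(Ham).real) < 1e-9):
        return None  # eigenvalues on the imaginary axis
    _, Z, _ = schur(Ham, sort='lhp')
    X, Y = Z[:n, :n], Z[n:, :n]
    if abs(np.linalg.det(X)) < 1e-12:
        return None
    P = Y @ np.linalg.inv(X)
    return (P + P.T) / 2

def conditions_hold(gamma, A, B, C, M, L, N, Q, S, R):
    """Conditions of Theorem 4 (nonnegativity checked with a tolerance)."""
    g2 = gamma ** 2
    Ri, Ni = np.linalg.inv(R), np.linalg.inv(N)
    P  = care(A - B @ Ri @ S.T, B @ Ri @ B.T - M / g2, Q - S @ Ri @ S.T)
    Sg = care((A - L @ Ni @ C).T, C.T @ Ni @ C - Q / g2, M - L @ Ni @ L.T)
    if P is None or Sg is None:
        return False
    if (np.any(np.linalg.eigvalsh(P) < -1e-9)
            or np.any(np.linalg.eigvalsh(Sg) < -1e-9)):
        return False
    return max(abs(np.linalg.eigvals(Sg @ P))) < g2  # spectral radius (10)

def gamma_star(args, lo=1e-3, hi=1e3, iters=60):
    """Geometric bisection for the optimal attenuation level; assumes the
    conditions fail at lo and hold at hi."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        lo, hi = (lo, mid) if conditions_hold(mid, *args) else (mid, hi)
    return hi

# Hypothetical plant (the same data as in the finite-horizon sketch).
A = np.array([[0., 1.], [0., -1.]])
B = np.array([[0.], [1.]]); C = np.array([[1., 0.]])
M = 0.01 * np.eye(2); L = np.array([[0.], [0.1]]); N = np.array([[1.]])
Q = np.diag([1., 0.]); S = np.zeros((2, 1)); R = np.array([[1.]])
print("estimated gamma* ~", gamma_star((A, B, C, M, L, N, Q, S, R)))
```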

Conclusion

The similarity of the \(H_{\infty}\)-optimal control theory with the LQG, stochastic, theory is in many respects striking, as is the duality between observation and control. Yet, the “observer” of \(H_{\infty}\)-optimal control does not arise from some estimation theory but from the analysis of a “worst case.” The best explanation might lie in the duality of the ordinary, or (+, ×), algebra with the idempotent (max, +) algebra (see Bernhard 1996). The complete theory of \(H_{\infty}\)-optimal control in that perspective has yet to be written.

Cross-References

  • Pursuit-Evasion Games
  • Nonlinear Zero-Sum, Two-Person Differential Games

Bibliography

  1. Başar T, Bernhard P (1995) H∞-optimal control and related minimax design problems: a dynamic game approach, 2nd edn. Birkhäuser, Boston
  2. Bernhard P (1979) Linear quadratic zero-sum two-person differential games: necessary and sufficient condition. J Optim Theory Appl 27(1):51–69
  3. Bernhard P (1980) Linear quadratic zero-sum two-person differential games: necessary and sufficient condition: comment. J Optim Theory Appl 31(2):283–284
  4. Bernhard P (1996) A separation theorem for expected value and feared value discrete time control. ESAIM Control Optim Calc Var 1:191–206
  5. Bernhard P, Rapaport A (1996) Min-max certainty equivalence principle and differential games. Int J Robust Nonlinear Control 6:825–842
  6. Delfour M (2005) Linear quadratic differential games: saddle point and Riccati equation. SIAM J Control Optim 46:750–774
  7. Ho Y-C, Bryson AE, Baron S (1965) Differential games and optimal pursuit-evasion strategies. IEEE Trans Autom Control AC-10:385–389
  8. Mageirou EF (1976) Values and strategies for infinite time linear quadratic games. IEEE Trans Autom Control AC-21:547–550
  9. Zhang P (2005) Some results on zero-sum two-person linear quadratic differential games. SIAM J Control Optim 43:2147–2165

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. INRIA Sophia Antipolis-Méditerranée, Sophia Antipolis, France