In this chapter we give an introduction to nonlinear infinite horizon optimal control. The dynamic programming principle as well as several consequences of this principle are proved. One of the main results of this chapter is that the infinite horizon optimal feedback law asymptotically stabilizes the system and that the infinite horizon optimal value function is a Lyapunov function for the closed-loop system. Motivated by this property we formulate a relaxed version of the dynamic programming principle, which allows us to prove stability and suboptimality results for nonoptimal feedback laws without using the optimal value function. A practical version of this principle is provided, too. These results will be central in the following chapters for the stability and performance analysis of NMPC algorithms. For the special case of sampled data systems we finally show that for suitable integral costs asymptotic stability of the continuous time sampled data closed-loop system follows from the asymptotic stability of the associated discrete time system.

4.1 Definition and Well Posedness of the Problem

For the finite horizon optimal control problems from the previous chapter we can define infinite horizon counterparts by replacing the upper limits \(N-1\) in the respective sums by \(\infty \). Since for this infinite horizon formulation the terminal state \(x_u(N)\) vanishes from the problem, it is not reasonable to consider terminal conditions. Furthermore, we will not consider any weights in the infinite horizon case. Hence, the most general infinite horizon problem we consider is the following.

$$\begin{aligned} \mathop {\mathrm {minimize}}\;\; J_\infty (n,x_0,u(\cdot )) := \sum _{k=0}^{\infty } \ell (n+k,x_u(k,x_0),u(k)) \;\; \text{ with respect to } \;\; u(\cdot )\in \mathbb {U}^\infty (x_0) \end{aligned}$$
(OCP\(_\infty ^\text {n}\))

We optimize over the set of admissible control sequences \(\mathbb {U}^\infty (x_0)\) defined in Definition 3.2 and assume that this set is nonempty for all \(x_0\in \mathbb {X}\), which is equivalent to the viability of \(\mathbb {X}\) according to Assumption 3.3. In order to keep the presentation self-contained, all subsequent statements are formulated for general time varying stage cost \(\ell \). For some of the results we assume that \(\ell \) is as in (3.8), i.e., it penalizes the distance to a (possibly time varying) reference trajectory \(x^{\mathrm{ref}}\). In the special case of a constant reference \(x^{\mathrm{ref}}\equiv x_*\) the stage cost \(\ell \) and the functional \(J_\infty \) in (OCP\(_\infty ^\text {n}\)) do not depend on the time n. In this case, we denote the problem by (OCP\(_\infty \)).

Similar to Definition 3.14 we define the optimal value function and optimal trajectories.

Definition 4.1

Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) with initial value \(x_0\in \mathbb {X}\) and time instant \(n\in \mathbb {N}_0\).

(i)

    The function

    $$\begin{aligned} V_\infty (n,x_0) := \inf _{u(\cdot )\in \mathbb {U}^\infty (x_0)} J_\infty (n,x_0,u(\cdot )) \end{aligned}$$

    is called the optimal value function.

(ii)

    A control sequence \(u^\star (\cdot )\in \mathbb {U}^\infty (x_0)\) is called optimal control sequence for \(x_0\) if

    $$\begin{aligned} V_\infty (n,x_0) = J_\infty (n,x_0,u^\star (\cdot )) \end{aligned}$$

    holds. The corresponding trajectory \(x_{u^\star }(\cdot ,x_0)\) is called optimal trajectory.

Since now—in contrast to the finite horizon problem—an infinite sum appears in the definition of \(J_\infty \), it is no longer obvious that \(V_\infty \) is finite. In order to ensure that this is the case, the following definition is helpful.

Definition 4.2

Consider the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb {U}^\infty (x^{\mathrm{ref}}(0))\). We say that the system is (uniformly) asymptotically controllable to \(x^{\mathrm{ref}}\) if there exists a function \(\beta \in \mathscr {KL}\) such that for each initial time \(n_0\in \mathbb {N}_0\) and for each admissible initial value \(x_0\in \mathbb {X}\) there exists an admissible control sequence \(u\in \mathbb {U}^\infty (x_0)\) such that the inequality

$$\begin{aligned} |x_u(n,x_0)|_{x^{\mathrm{ref}}(n+n_0)} \le \beta (|x_0|_{x^{\mathrm{ref}}(n_0)},n) \end{aligned}$$
(4.1)

holds for all \(n\in \mathbb {N}_0\). We say that this asymptotic controllability has the small control property if \(u\in \mathbb {U}^\infty (x_0)\) can be chosen such that the inequality

$$\begin{aligned} |x_u(n,x_0)|_{x^{\mathrm{ref}}(n+n_0)} + |u(n)|_{u^{\mathrm{ref}}(n+n_0)} \le \beta (|x_0|_{x^{\mathrm{ref}}(n_0)},n) \end{aligned}$$
(4.2)

holds for all \(n\in \mathbb {N}_0\). Here, as in Sect. 2.3 we write \(|x_1|_{x_2}= d_{X}(x_1,x_2)\) and \(|u_1|_{u_2}= d_{U}(u_1,u_2)\).

Observe that uniform asymptotic controllability is a necessary condition for uniform feedback stabilization. Indeed, if we assume asymptotic stability of the closed-loop system \(x^+=g(n,x)=f(x,\mu (n,x))\), then we immediately get asymptotic controllability with control \(u(n)=\mu (n,x(n,n_0,x_0))\). The small control property, however, is not satisfied in general.

In order to use Definition 4.2 for deriving bounds on the optimal value function, we need a result known as Sontag’s \(\mathscr {KL}\)-Lemma [24, Proposition 7]. This proposition states that for each \(\mathscr {KL}\)-function \(\beta \) there exist functions \(\gamma _1,\gamma _2\in \mathscr {K}_\infty \) such that the inequality

$$\begin{aligned} \beta (r,n)\le \gamma _1(e^{-n}\gamma _2(r)) \end{aligned}$$

holds for all \(r,n\ge 0\) (in fact, the result holds for real \(n\ge 0\) but we only need it for integers here). Using the functions \(\gamma _1\) and \(\gamma _2\) we can define stage cost functions

$$\begin{aligned} \ell (n,x,u) := \gamma _1^{-1}(|x|_{x^{\mathrm{ref}}(n)}) + \lambda \gamma _1^{-1}(|u|_{u^{\mathrm{ref}}(n)}) \end{aligned}$$
(4.3)

for \(\lambda \ge 0\). The following theorem states that under the asymptotic controllability property from Definition 4.2 this stage cost ensures (uniformly) finite upper and positive lower bounds on \(V_\infty \).

Theorem 4.3

(Bounds on \(V_\infty \)) Consider the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb {U}^\infty (x^{\mathrm{ref}}(0))\). If the system is asymptotically controllable to \(x^{\mathrm{ref}}\), then there exist \(\alpha _1,\alpha _2\in \mathscr {K}_\infty \) such that the optimal value function \(V_\infty \) corresponding to the cost function \(\ell :\mathbb {N}_0\times X\times U\rightarrow \mathbb {R}_0^+\) from (4.3) with \(\lambda =0\) satisfies

$$\begin{aligned} \alpha _1(|x_0|_{x^{\mathrm{ref}}(n_0)}) \le V_\infty (n_0,x_0)\le \alpha _2(|x_0|_{x^{\mathrm{ref}}(n_0)}) \end{aligned}$$
(4.4)

for all \(n_0\in \mathbb {N}_0\) and all \(x_0\in \mathbb {X}\).

If, in addition, the asymptotic controllability has the small control property then the statement also holds for \(\ell \) from (4.3) with arbitrary \(\lambda \ge 0\).

Proof

For each \(x_0\), \(n_0\), and \(u\in \mathbb {U}^\infty (x_0)\) we get

$$\begin{aligned} J_\infty (n_0,x_0,u) = \sum _{k=0}^{\infty } \ell (n_0+k,x_u(k,x_0),u(k)) \ge \ell (n_0,x_u(0,x_0),u(0)) \ge \gamma _1^{-1}(|x_0|_{x^{\mathrm{ref}}(n_0)}) \end{aligned}$$

for each \(\lambda \ge 0\). Hence, from the definition of \(V_\infty \) we get

$$\begin{aligned} V_\infty (n_0,x_0) = \inf _{u(\cdot )\in \mathbb {U}^\infty (x_0)} J_\infty (n_0,x_0,u(\cdot ))\ge \gamma _1^{-1}(|x_0|_{x^{\mathrm{ref}}(n_0)}). \end{aligned}$$

This proves the lower bound in (4.4) for \(\alpha _1 = \gamma _1^{-1}\).

For proving the upper bound, we first consider the case \(\lambda =0\). For all \(n_0\) and \(x_0\) the control \(u\in \mathbb {U}^\infty (x_0)\) from Definition 4.2 yields

$$\begin{aligned} V_\infty (n_0,x_0)&\le J_\infty (n_0,x_0,u) \\&= \sum _{k=0}^{\infty } \ell (n_0+k,x_u(k,x_0),u(k)) \\&= \sum _{k=0}^{\infty } \gamma _1^{-1}(|x_u(k,x_0)|_{x^{\mathrm{ref}}(n_0+k)}) \\&\le \sum _{k=0}^{\infty } \gamma _1^{-1}(\beta (|x_0|_{x^{\mathrm{ref}}(n_0)},k))\;\; \le \;\; \sum _{k=0}^{\infty } e^{-k}\gamma _2(|x_0|_{x^{\mathrm{ref}}(n_0)}) \\&= \frac{e}{e-1}\gamma _2(|x_0|_{x^{\mathrm{ref}}(n_0)}), \end{aligned}$$

i.e., the upper inequality from (4.4) with \(\alpha _2(r) = e\gamma _2(r)/(e-1)\). If the small control property holds, then the upper bound for \(\lambda >0\) follows similarly with \(\alpha _2(r) = (1+\lambda )e\gamma _2(r)/(e-1)\).    \(\square \)

In fact, the specific form (4.3) is just one possible choice of \(\ell \) for which this theorem holds. It is rather easy to extend the result to any \(\ell \) which is bounded from below by some \(\mathscr {K}_\infty \)-function in x (uniformly for all u and n) and bounded from above by \(\ell \) from (4.3) in balls \(\mathscr {B}_\varepsilon (x^{\mathrm{ref}}(n))\). Since, however, the choice of appropriate cost functions \(\ell \) for infinite horizon optimal control problems is not a central topic of this book, we leave this extension to the interested reader.
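To make the construction in the proof of Theorem 4.3 concrete, the following sketch builds \(\gamma _1,\gamma _2\) for an exponential \(\mathscr {KL}\)-function and evaluates the stage cost (4.3) with \(\lambda =0\) and the resulting upper bound \(\alpha _2\). The function \(\beta \) and all numbers are illustrative choices of ours, not taken from the text; for this particular \(\beta \) the factorization from Sontag’s \(\mathscr {KL}\)-Lemma can even be chosen with equality.

```python
import numpy as np

# Exponential KL function beta(r, n) = C * r * exp(-sigma * n); the values of
# C and sigma are illustrative choices.
C, sigma = 2.0, 0.5

# Sontag's KL-Lemma gives beta(r, n) <= gamma1(exp(-n) * gamma2(r)). For this
# particular beta we can choose (here even with equality)
#   gamma1(s) = C * s**sigma,   gamma2(r) = r**(1/sigma),
# since C * (exp(-n) * r**(1/sigma))**sigma = C * r * exp(-sigma * n).
gamma1 = lambda s: C * s**sigma
gamma2 = lambda r: r**(1.0 / sigma)
gamma1_inv = lambda s: (s / C)**(1.0 / sigma)
beta = lambda r, n: C * r * np.exp(-sigma * n)

# Check the factorization on a grid of (r, n) values
for r in (0.1, 1.0, 10.0):
    for n in range(10):
        assert beta(r, n) <= gamma1(np.exp(-n) * gamma2(r)) + 1e-12

# Stage cost (4.3) with lambda = 0 (for x_ref = 0, so |x|_{x_ref} = |x|) and
# the upper bound alpha_2 from the proof of Theorem 4.3
ell = lambda x: gamma1_inv(abs(x))
alpha2 = lambda r: np.e / (np.e - 1.0) * gamma2(r)
print(ell(0.5), alpha2(0.5))        # V_infty(x) <= alpha2(|x|) by (4.4)
```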

4.2 The Dynamic Programming Principle

In this section we essentially restate and reprove the results from Sect. 3.4 for the infinite horizon case. We begin with the dynamic programming principle for the infinite horizon problem (OCP\(_\infty ^\text {n}\)). Throughout this section we assume that \(V_\infty (n,x)\) is finite for all \(n\in \mathbb {N}_0\) and all \(x\in \mathbb {X}\), as ensured, e.g., by Theorem 4.3.

Theorem 4.4

(Dynamic programming principle) Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) with \(x_0\in \mathbb {X}\) and \(n\in \mathbb {N}_0\). Then for all \(K\in \mathbb {N}\) the equation

$$\begin{aligned} V_\infty (n,x_0) = \inf _{u(\cdot )\in \mathbb {U}^{K}(x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + V_\infty (n+K,x_u(K,x_0))\right\} \end{aligned}$$
(4.5)

holds. If, in addition, an optimal control sequence \(u^\star (\cdot )\) exists for \(x_0\), then we get the equation

$$\begin{aligned} V_\infty (n,x_0) = \sum _{k=0}^{K-1} \ell (n+k,x_{u^\star }(k,x_0),u^\star (k)) + V_\infty (n+K,x_{u^\star }(K,x_0)). \end{aligned}$$
(4.6)

In particular, in this case the “\(\inf \)” in (4.5) is a “\(\min \)”.

Proof

From the definition of \(J_\infty \) for \(u(\cdot )\in \mathbb {U}^\infty (x_0)\) we immediately obtain

$$\begin{aligned} J_\infty (n,x_0,u(\cdot )) = \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + J_\infty (n+K,x_u(K,x_0),u(\cdot +K)), \end{aligned}$$
(4.7)

where \(u(\cdot +K)\) denotes the shifted control sequence defined by \(u(\cdot +K)(k) = u(k+K)\), which is admissible for \(x_u(K,x_0)\).

We now prove (4.5) by showing “\(\ge \)” and “\(\le \)” separately: From (4.7) we obtain

$$\begin{aligned} J_\infty (n,x_0,u(\cdot ))&= \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + J_\infty (n+K,x_u(K,x_0),u(\cdot +K)) \\&\ge \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + V_\infty (n+K,x_u(K,x_0)). \end{aligned}$$

Since this inequality holds for all \(u(\cdot )\in \mathbb {U}^\infty (x_0)\), it also holds when taking the infimum on both sides. Hence we get

$$\begin{aligned} V_\infty (n,x_0)&= \inf _{u(\cdot )\in \mathbb {U}^\infty (x_0)} J_\infty (n,x_0,u(\cdot )) \\&\ge \inf _{u(\cdot )\in \mathbb {U}^K(x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + V_\infty (n+K,x_u(K,x_0))\right\} , \end{aligned}$$

i.e., (4.5) with “\(\ge \)”.

In order to prove “\(\le \)”, fix \(\varepsilon >0\) and let \(u^\varepsilon (\cdot )\) be an approximately optimal control sequence for the right hand side of (4.7), i.e.,

$$\begin{aligned}&\sum _{k=0}^{K-1} \ell (n+k,x_{u^\varepsilon }(k,x_0),u^\varepsilon (k)) + J_{\infty }(n+K,x_{u^\varepsilon }(K,x_0),u^\varepsilon (\cdot +K)) \\&\le \;\; \inf _{u(\cdot )\in \mathbb {U}^\infty (x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + J_{\infty }(n+K,x_u(K,x_0),u(\cdot +K))\right\} + \varepsilon . \end{aligned}$$

Now we decompose \(u(\cdot )\in \mathbb {U}^\infty (x_0)\) analogously to Lemma 3.12(ii) and (iii) into \(u_1\in \mathbb {U}^K(x_0)\) and \(u_2\in \mathbb {U}^{\infty }(x_{u_1}(K,x_0))\) via

$$ u(k) = \left\{ \begin{array}{ll} u_1(k), &{} k=0,\ldots ,K-1\\ u_2(k-K), \;\; &{} k\ge K \end{array}\right. $$

This implies

$$\begin{aligned}&\inf _{u(\cdot )\in \mathbb {U}^\infty (x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + J_{\infty }(n+K,x_u(K,x_0),u(\cdot +K))\right\} \\&= \,\, \inf _{u_1(\cdot )\in \mathbb {U}^K(x_0) \atop u_2(\cdot )\in \mathbb {U}^{\infty }(x_{u_1}(K,x_0))}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_{u_1}(k,x_0),u_1(k)) + J_{\infty }(n+K,x_{u_1}(K,x_0),u_2(\cdot ))\right\} \\&= \,\, \inf _{u_1(\cdot )\in \mathbb {U}^K(x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_{u_1}(k,x_0),u_1(k)) + V_{\infty }(n+K,x_{u_1}(K,x_0))\right\} \end{aligned}$$

Now (4.7) yields

$$\begin{aligned}&V_\infty (n,x_0) \;\; \le \;\; J_\infty (n,x_0,u^\varepsilon (\cdot )) \\&= \;\; \sum _{k=0}^{K-1} \ell (n+k,x_{u^\varepsilon }(k,x_0),u^\varepsilon (k)) + J_\infty (n+K,x_{u^\varepsilon }(K,x_0),u^\varepsilon (\cdot +K)) \\&\le \;\; \inf _{u(\cdot )\in \mathbb {U}^K(x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + V_\infty (n+K,x_u(K,x_0))\right\} + \varepsilon , \end{aligned}$$

i.e.,

$$\begin{aligned}&V_\infty (n,x_0)\\&\le \;\; \inf _{u(\cdot )\in \mathbb {U}^K(x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + V_\infty (n+K,x_u(K,x_0))\right\} + \varepsilon . \end{aligned}$$

Since \(\varepsilon >0\) was arbitrary and the expressions in this inequality are independent of \(\varepsilon \), this inequality also holds for \(\varepsilon =0\), which shows (4.5) with “\(\le \)” and thus (4.5).

In order to prove (4.6) we use (4.7) with \(u(\cdot )=u^\star (\cdot )\). This yields

$$\begin{aligned}&V_\infty (n,x_0) \;\; = \;\; J_\infty (n,x_0,u^\star (\cdot )) \\&= \;\; \sum _{k=0}^{K-1} \ell (n+k,x_{u^\star }(k,x_0),u^\star (k)) + J_\infty (n+K,x_{u^\star }(K,x_0),u^\star (\cdot +K)) \\&\ge \;\; \sum _{k=0}^{K-1} \ell (n+k,x_{u^\star }(k,x_0),u^\star (k)) + V_\infty (n+K,x_{u^\star }(K,x_0)) \\&\ge \;\; \inf _{u(\cdot )\in \mathbb {U}^K(x_0)}\left\{ \sum _{k=0}^{K-1} \ell (n+k,x_u(k,x_0),u(k)) + V_\infty (n+K,x_u(K,x_0))\right\} \\&= \;\; V_\infty (n,x_0), \end{aligned}$$

where we used the (already proved) equality (4.5) in the last step. Hence, the two “\(\ge \)” in this chain are actually “\(=\)” which implies (4.6).    \(\square \)
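As a numerical illustration of (4.5), the following sketch checks the dynamic programming principle with \(K=1\) for a scalar linear–quadratic problem, where \(V_\infty (x)=px^2\) is available from the scalar discrete time Riccati equation. This is a sketch under our own illustrative choice of system and cost data; none of the numbers come from the text.

```python
import numpy as np

# Scalar linear system x+ = a*x + b*u with quadratic stage cost
# ell(x,u) = q*x**2 + r*u**2; all numbers are illustrative choices.
a, b, q, r = 1.2, 1.0, 1.0, 0.1

# For this time invariant problem V_infty(x) = p*x**2, where p solves the
# scalar discrete time Riccati equation, computed by fixed-point iteration.
p = q
for _ in range(1000):
    p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)

V = lambda x: p * x**2
ell = lambda x, u: q * x**2 + r * u**2
f = lambda x, u: a * x + b * u

# Dynamic programming principle (4.5) with K = 1:
#   V(x) = min_u { ell(x,u) + V(f(x,u)) },
# with the minimization approximated over a fine control grid.
x0 = 3.0
us = np.linspace(-10.0, 10.0, 200001)
rhs = np.min(ell(x0, us) + V(f(x0, us)))
print(V(x0), rhs)   # the two values agree up to the grid resolution
```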

The following corollary states an immediate consequence of the dynamic programming principle. It shows that tails of optimal control sequences are again optimal control sequences for suitably adjusted initial value and time.

Corollary 4.5

If \(u^\star (\cdot )\) is an optimal control sequence for (OCP\(_\infty ^\text {n}\)) with initial value \(x_0\) and initial time n, then for each \(K\in \mathbb {N}\) the sequence \(u^\star _K(\cdot )=u^\star (\cdot +K)\), i.e.,

$$\begin{aligned} u^\star _K(k) = u^\star (K+k), \quad k=0,1,\ldots \end{aligned}$$

is an optimal control sequence for initial value \(x_{u^\star }(K,x_0)\) and initial time \(n+K\).

Proof

Inserting \(V_\infty (n,x_0)=J_\infty (n,x_0,u^\star (\cdot ))\) and the definition of \(u_K^\star (\cdot )\) into (4.7) we obtain

$$ V_\infty (n,x_0) = \sum _{k=0}^{K-1} \ell (n+k,x_{u^\star }(k,x_0),u^\star (k)) + J_\infty (n+K,x_{u^\star }(K,x_0),u^\star _K(\cdot ))$$

Subtracting (4.6) from this equation yields

$$ 0 = J_\infty (n+K,x_{u^\star }(K,x_0),u^\star _K(\cdot )) - V_\infty (n+K,x_{u^\star }(K,x_0))$$

which shows the assertion.    \(\square \)

The next two results are the analogues of Theorem 3.17 and Corollary 3.18 in the infinite horizon setting.

Theorem 4.6

Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) with \(x_0\in \mathbb {X}\) and \(n\in \mathbb {N}_0\) and assume that an optimal control sequence \(u^\star (\cdot )\) exists. Then the feedback law \(\mu _\infty (n,x_0)=u^\star (0)\) satisfies

$$\begin{aligned} \mu _\infty (n,x_0) \in \mathop {\mathrm {argmin}}\limits _{u\in \mathbb {U}^1(x_0)} \left\{ \ell (n,x_0,u) + V_\infty (n+1,f(x_0,u))\right\} \end{aligned}$$
(4.8)

and

$$\begin{aligned} V_\infty (n,x_0)= \ell (n,x_0,\mu _\infty (n,x_0)) + V_\infty (n+1,f(x_0,\mu _\infty (n,x_0))) \end{aligned}$$
(4.9)

where in (4.8)—as usual—we interpret \(\mathbb {U}^1(x_0)\) as a subset of \(U\), i.e., we identify the one element sequence \(u=u(\cdot )\) with its only element \(u=u(0)\).

Proof

The proof is identical to the finite horizon counterpart Theorem 3.17.    \(\square \)
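Continuing the scalar linear–quadratic example from the previous sketch, the following code evaluates the feedback formula (4.8) by a grid search over the control set and compares it with the known closed-form LQR feedback; again all numbers are illustrative choices of ours.

```python
import numpy as np

# Scalar linear-quadratic example as above (illustrative numbers), with
# V_infty(x) = p*x**2 from the scalar discrete time Riccati equation.
a, b, q, r = 1.2, 1.0, 1.0, 0.1
p = q
for _ in range(1000):
    p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)

def mu_inf(x, us=np.linspace(-10.0, 10.0, 200001)):
    """Feedback (4.8): argmin over u of ell(x,u) + V_infty(f(x,u))."""
    return us[np.argmin(q * x**2 + r * us**2 + p * (a * x + b * us)**2)]

# For LQR the argmin is known in closed form, u = -K*x, so the grid search
# should reproduce the LQR gain K up to the grid resolution.
K = a * b * p / (r + b**2 * p)
x = 2.0
print(mu_inf(x), -K * x)
```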

As in the finite horizon case, the following corollary shows that the feedback law (4.8) can be used in order to construct the optimal control sequence.

Corollary 4.7

Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) with \(x_0\in \mathbb {X}\) and \(n\in \mathbb {N}_0\) and consider an admissible feedback law \(\mu _{\infty }:\mathbb {N}_0\times \mathbb {X}\rightarrow U\) in the sense of Definition 3.2(iv). Denote the solution of the closed-loop system

$$\begin{aligned} x(0) = x_0, \quad x(k+1) = f(x(k),\mu _{\infty }(n+k,x(k))),\; k=0,1,\ldots \end{aligned}$$
(4.10)

by \(x_{\mu _\infty }\) and assume that \(\mu _{\infty }\) satisfies (4.8) for time instants \(n+k\) and initial values \(x_0=x_{\mu _\infty }(k)\) for all \(k=0,1,\ldots \). Then

$$\begin{aligned} u^\star (k) = \mu _{\infty }(n+k,x_{u^\star }(k,x_0)), \quad k=0,1,\ldots \end{aligned}$$
(4.11)

is an optimal control sequence for initial time n and initial value \(x_0\) and the solution of the closed-loop system (4.10) is a corresponding optimal trajectory.

Proof

Inserting (4.11) into (4.10) we immediately obtain

$$\begin{aligned} x_{u^\star }(k,x_0) = x(k), \quad k=0,1,\ldots . \end{aligned}$$

Hence, we need to show that

$$\begin{aligned} V_\infty (n,x_0) = J_\infty (n,x_0,u^\star ), \end{aligned}$$

where it is enough to show “\(\ge \)” because the opposite inequality follows by definition of \(V_\infty \). Using (4.11) and (4.9) we get

$$ V_{\infty }(n+k,x(k))= \ell (n+k,x(k),u^\star (k)) + V_{\infty }(n+k+1,x(k+1)) $$

for \(k=0,1,\ldots \). Summing these equalities over \(k=0,\ldots ,K-1\) for arbitrary \(K\in \mathbb {N}\) and eliminating the identical terms \(V_{\infty }(n+k,x(k))\), \(k=1,\ldots ,K-1\), on the left and on the right we obtain

$$ V_\infty (n,x_0)= \sum _{k=0}^{K-1} \ell (n+k,x(k),u^\star (k)) + V_\infty (n+K,x(K)) \ge \sum _{k=0}^{K-1} \ell (n+k,x(k),u^\star (k)). $$

Since the sum is monotone increasing in K and bounded from above, for \(K\rightarrow \infty \) the right hand side converges to \(J_\infty (n,x_0,u^\star )\) showing the assertion.    \(\square \)

Corollary 4.7 implies that infinite horizon optimal control is nothing but NMPC with \(N=\infty \): Formula (4.11) for \(k=0\) yields that if we replace the optimization problem (OCP\(_\text {N}^\text {n}\)) in Algorithm 3.7 by (OCP\(_\infty ^\text {n}\)), then the feedback law resulting from this algorithm equals \(\mu _\infty \). The following theorem shows that for stage cost of the form (3.8) this infinite horizon NMPC feedback law yields an asymptotically stable closed loop and thus solves the stabilization and tracking problem.

Theorem 4.8

(Asymptotic stability) Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb {U}^\infty (x^{\mathrm{ref}}(0))\). Assume that there exist \(\alpha _1,\alpha _2,\alpha _3\in \mathscr {K}_\infty \) such that the inequalities

$$\begin{aligned} \alpha _1(|x|_{x^{\mathrm{ref}}(n)}) \le V_\infty (n,x) \le \alpha _2(|x|_{x^{\mathrm{ref}}(n)}) \quad \text{ and } \quad \ell (n,x,u) \ge \alpha _3(|x|_{x^{\mathrm{ref}}(n)}) \end{aligned}$$
(4.12)

hold for all \(x\in \mathbb {X}\), \(n\in \mathbb {N}_0\), and \(u\in U\). Assume furthermore that an optimal feedback \(\mu _\infty \) exists, i.e., an admissible feedback law \(\mu _{\infty }:\mathbb {N}_0\times \mathbb {X}\rightarrow U\) satisfying (4.8) for all \(n\in \mathbb {N}_0\) and all \(x\in \mathbb {X}\). Then this optimal feedback asymptotically stabilizes the closed-loop system

$$\begin{aligned} x^+ = g(n,x) = f(x,\mu _\infty (n,x)) \end{aligned}$$

on \(\mathbb {X}\) in the sense of Definition 2.16.

Proof

For the closed-loop system, (4.9) and the last inequality in (4.12) yield

$$\begin{aligned} V_\infty (n,x)&= \ell (n,x,\mu _\infty (n,x)) + V_\infty (n+1,f(x,\mu _\infty (n,x)))\\&\ge \alpha _3(|x|_{x^{\mathrm{ref}}(n)}) + V_\infty (n+1,f(x,\mu _\infty (n,x))). \end{aligned}$$

Together with the first two inequalities in (4.12) this shows that \(V_\infty \) is a Lyapunov function on \(\mathbb {X}\) in the sense of Definition 2.21 with \(\alpha _V=\alpha _3\). Thus, Theorem 2.22 yields asymptotic stability on \(\mathbb {X}\).    \(\square \)

By Theorem 4.3 we can replace (4.12) by the asymptotic controllability condition from Definition 4.2 if \(\ell \) is of the form (4.3). This is used in the following corollary in order to give a stability result without explicitly assuming (4.12).

Corollary 4.9

Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb {U}^\infty (x^{\mathrm{ref}}(0))\). Assume that the system is asymptotically controllable to \(x^{\mathrm{ref}}\) and that an optimal feedback \(\mu _\infty \), i.e., a feedback satisfying (4.8), exists for the cost function \(\ell :\mathbb {N}_0\times X\times U\rightarrow \mathbb {R}_0^+\) from (4.3) with \(\lambda =0\). Then this optimal feedback asymptotically stabilizes the closed-loop system

$$\begin{aligned} x^+ = g(n,x) = f(x,\mu _\infty (n,x)) \end{aligned}$$

on \(\mathbb {X}\) in the sense of Definition 2.16.

If, in addition, the asymptotic controllability has the small control property then the statement also holds for \(\ell \) from (4.3) with arbitrary \(\lambda \ge 0\).

Proof

Theorem 4.3 yields

$$\begin{aligned} \alpha _1(|x_0|_{x^{\mathrm{ref}}(n_0)}) \le V_\infty (n_0,x_0)\le \alpha _2(|x_0|_{x^{\mathrm{ref}}(n_0)}) \end{aligned}$$

for suitable \(\alpha _1,\alpha _2\in \mathscr {K}_\infty \). Furthermore, by (4.3) the third inequality in (4.12) holds with \(\alpha _3=\gamma _1^{-1}\). Hence, (4.12) holds and Theorem 4.8 yields asymptotic stability on \(\mathbb {X}\).    \(\square \)

4.3 Relaxed Dynamic Programming

The last results of the previous section show that infinite horizon optimal control can be used in order to derive a stabilizing feedback law. Unfortunately, a direct solution of the infinite horizon optimal control problem is in general impossible, both analytically and numerically. Still, infinite horizon optimal control plays an important role in our analysis since we will interpret the model predictive control algorithm as an approximation of the infinite horizon optimal control problem. Here the term “approximation” is not necessarily to be understood in the sense of “being close to” (although this aspect is not excluded) but rather in the sense of “sharing the important structural properties”.

Looking at the proof of Theorem 4.8 we see that the important property for stability is the inequality

$$\begin{aligned} V_\infty (n,x) \ge \ell (n,x,\mu _\infty (n,x)) + V_\infty (n+1,f(x,\mu _\infty (n,x))) \end{aligned}$$

which follows from the feedback version (4.9) of the dynamic programming principle. Observe that although (4.9) yields equality, only this inequality is needed in the proof of Theorem 4.8.

This observation motivates a relaxed version of this dynamic programming inequality, which on the one hand yields asymptotic stability and on the other hand provides a quantitative measure of the closed-loop performance of the system. This relaxed version will be formulated in Theorem 4.11, below. In order to quantitatively measure the closed-loop performance, we use the infinite horizon cost functional evaluated along the closed-loop trajectory which we define as follows.

Definition 4.10

(Infinite horizon cost) Let \(\mu :\mathbb {N}_0\times \mathbb {X}\rightarrow U\) be an admissible feedback law. For the trajectories \(x_\mu (n)\) of the closed-loop system \(x^+=f(x,\mu (n,x))\) with initial value \(x_\mu (n_0)=x_0\in \mathbb {X}\) we define the infinite horizon cost as

$$\begin{aligned} J^{cl}_\infty (n_0,x_0,\mu ) := \sum _{k=0}^\infty \ell (n_0+k, x_\mu (n_0+k),\mu (n_0+k,x_\mu (n_0+k))). \end{aligned}$$

For stage costs of the form (3.8), \(\ell \) is always nonnegative, hence the infinite sum either has a well defined finite value or it diverges to infinity, in which case we write \(J^{cl}_\infty (n_0,x_0, \mu )=\infty \). More general stage costs will be discussed in Chap. 8.

By Corollary 4.7 for the infinite horizon optimal feedback law \(\mu _\infty \) we obtain

$$\begin{aligned} J^{cl}_\infty (n_0,x_0,\mu _\infty ) = V_\infty (n_0,x_0) \end{aligned}$$

while for all other admissible feedback laws \(\mu \) we get

$$\begin{aligned} J^{cl}_\infty (n_0,x_0,\mu ) \ge V_\infty (n_0,x_0). \end{aligned}$$

In other words, \(V_\infty \) is a lower bound for \(J^{cl}_\infty (n_0,x_0,\mu )\) over all admissible feedback laws \(\mu \), which is attained for \(\mu =\mu _\infty \).

The following theorem now gives a relaxed dynamic programming condition from which we can derive both asymptotic stability and an upper bound on the infinite horizon cost \(J^{cl}_\infty (n_0,x_0,\mu )\) for an arbitrary admissible feedback law \(\mu \).

Theorem 4.11

(Asymptotic stability and suboptimality estimate) Consider a stage cost \(\ell :\mathbb {N}_0\times X\times U\rightarrow \mathbb {R}_0^+\) and a function \(V:\mathbb {N}_0\times X\rightarrow \mathbb {R}_0^+\). Let \(\mu :\mathbb {N}_0\times \mathbb {X}\rightarrow U\) be an admissible feedback law and let \(S(n)\subseteq \mathbb {X}\), \(n\in \mathbb {N}_0\) be a family of forward invariant sets for the closed-loop system

$$\begin{aligned} x^+ = g(n,x) = f(x,\mu (n,x)). \end{aligned}$$
(4.13)

Assume there exists \(\alpha \in (0,1]\) such that the relaxed dynamic programming inequality

$$\begin{aligned} V(n,x) \ge \alpha \ell (n,x,\mu (n,x)) + V(n+1,f(x,\mu (n,x)))\end{aligned}$$
(4.14)

holds for all \(n\in \mathbb {N}_0\) and all \(x\in S(n)\). Then the suboptimality estimate

$$\begin{aligned} J^{cl}_\infty (n,x,\mu ) \le V(n,x)/\alpha \end{aligned}$$
(4.15)

holds for all \(n\in \mathbb {N}_0\) and all \(x\in S(n)\).

If, in addition, there exist \(\alpha _1,\alpha _2,\alpha _3\in \mathscr {K}_\infty \) such that the inequalities

$$\begin{aligned} \alpha _1(|x|_{x^{\mathrm{ref}}(n)}) \le V(n,x) \le \alpha _2(|x|_{x^{\mathrm{ref}}(n)}) \quad \text{ and } \quad \ell (n,x,u) \ge \alpha _3(|x|_{x^{\mathrm{ref}}(n)}) \end{aligned}$$

hold for all \(x\in \mathbb {X}\), \(n\in \mathbb {N}_0\), \(u\in U\) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0 \rightarrow \mathbb {X}\), then the closed-loop system (4.13) is asymptotically stable on S(n) in the sense of Definition 2.16.

Proof

In order to prove (4.15) consider \(n\in \mathbb {N}_0\), \(x\in S(n)\) and the trajectory \(x_{\mu }(\cdot )\) of (4.13) with \(x_\mu (n)=x\). By forward invariance of the sets S(n) this trajectory satisfies \(x_\mu (n+k)\in S(n+k)\). Hence from (4.14) for all \(k\in \mathbb {N}_0\) we obtain

$$\begin{aligned}&\alpha \ell (n+k,x_\mu (n+k),\mu (n+k,x_\mu (n+k))) \\&\;\; \le \;\; V(n+k,x_\mu (n+k)) - V(n+k+1,x_\mu (n+k+1)). \end{aligned}$$

Summing over k yields for all \(K\in \mathbb {N}\)

$$ \alpha \sum _{k=0}^{K-1} \ell (n+k,x_{\mu }(n+k),\mu (n+k,x_{\mu }(n+k))) \le V(n,x_\mu (n)) - V(n+K,x_\mu (n+K)) \le V(n,x)$$

since \(V(n+K,x_\mu (n+K))\ge 0\) and \(x_\mu (n)=x\). Since the stage cost \(\ell \) is nonnegative, the term on the left is monotone increasing and bounded, hence for \(K\rightarrow \infty \) it converges to \(\alpha J^{cl}_\infty (n,x,\mu )\). Since the right hand side is independent of K, this yields (4.15).

The stability assertion now immediately follows by observing that V satisfies all assumptions of Theorem 2.22 with \(\alpha _V=\alpha \,\alpha _3\).    \(\square \)
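The following sketch illustrates Theorem 4.11 for the scalar linear–quadratic example used above, now with a deliberately detuned, hence nonoptimal, linear feedback: it estimates the largest \(\alpha \) satisfying (4.14) with \(V=V_\infty \) from state samples and then verifies the suboptimality estimate (4.15) along a simulated closed-loop trajectory. The gain k_hat and all numbers are illustrative choices of ours.

```python
import numpy as np

# Scalar linear-quadratic example (illustrative numbers) with V = V_infty and
# a deliberately detuned, hence nonoptimal, linear feedback mu(x) = -k_hat*x.
a, b, q, r = 1.2, 1.0, 1.0, 0.1
p = q
for _ in range(1000):
    p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)

V = lambda x: p * x**2
ell = lambda x, u: q * x**2 + r * u**2
f = lambda x, u: a * x + b * u
k_hat = 0.5                       # closed loop x+ = (a - b*k_hat)*x, stable
mu = lambda x: -k_hat * x

# Estimate the largest alpha satisfying (4.14) from state samples
xs = np.linspace(-5.0, 5.0, 101)
xs = xs[xs != 0.0]
alpha = np.min((V(xs) - V(f(xs, mu(xs)))) / ell(xs, mu(xs)))
assert 0.0 < alpha <= 1.0

# Verify the suboptimality estimate (4.15) along a simulated trajectory
x0 = x = 3.0
J_cl = 0.0
for _ in range(200):              # the neglected tail is tiny after 200 steps
    J_cl += ell(x, mu(x))
    x = f(x, mu(x))
print(J_cl, V(x0) / alpha)        # J_cl <= V(x0)/alpha, here almost equality
```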

Remark 4.12

An inspection of the proof of Theorems 2.19 and 2.22 reveals that for fixed \(\alpha _1,\alpha _2\in \mathscr {K}_\infty \) and \(\alpha _V=\alpha \,\alpha _3\) with fixed \(\alpha _3\in \mathscr {K}_\infty \) and varying \(\alpha \in (0,1]\) the attraction rate \(\beta \in \mathscr {KL}\) constructed in this proof depends on \(\alpha \) in the following way: if \(\beta _{\alpha }\) and \(\beta _{\alpha '}\) are the attraction rates from Theorem 2.22 for \(\alpha _V=\alpha \,\alpha _3\) and \(\alpha _V=\alpha '\alpha _3\), respectively, with \(\alpha '\ge \alpha \), then \(\beta _{\alpha '}(r,t)\le \beta _{\alpha }(r,t)\) holds for all \(r,t\ge 0\). This in particular implies that for every \(\bar{\alpha }\in (0,1)\) the attraction rate \(\beta _{\bar{\alpha }}\) is also an attraction rate for all \(\alpha \in [\bar{\alpha },1]\), i.e., we can find an attraction rate \(\beta \in \mathscr {KL}\) which is independent of \(\alpha \in [\bar{\alpha },1]\).

Remark 4.13

Theorem 4.11 proves asymptotic stability of the discrete time closed-loop system (4.13) or (2.5). For a sampled data system (2.8) with sampling period \(T>0\) this implies the discrete time stability estimate (2.47) for the sampled data closed-loop system (2.30). For sampled data systems we may define the stage cost \(\ell \) as an integral over a running cost function L according to (3.4), i.e.,

$$\begin{aligned} \ell (x,u) := \int _0^T L(\varphi (t,0,x,u),u(t))dt. \end{aligned}$$

We show that for this choice of \(\ell \) a mild condition on L ensures that the sampled data closed-loop system (2.30) is also asymptotically stable in the continuous time sense, i.e., that (2.48) holds. For simplicity, we restrict ourselves to time invariant reference \(x^{\mathrm{ref}}\equiv x_*\).

The condition we use is that there exists \(\delta \in \mathscr {K}_\infty \) such that the vector field \(f_c\) in (2.6) satisfies

$$\begin{aligned} \Vert f_c(x,u)\Vert \le \max \{ \varepsilon , \delta (1/\varepsilon ) L(x,u) \}\end{aligned}$$
(4.16)

for all \(x\in X\), all \(u\in U\) and all \(\varepsilon >0\). For instance, in a linear–quadratic problem with \(X=\mathbb {R}^d\), \(U=\mathbb {R}^m\), and \(x_*=0\) we have \(\Vert f_c(x,u)\Vert =\Vert Ax+Bu\Vert \le C_1(\Vert x\Vert +\Vert u\Vert )\) and \(L(x,u) = x^\top Qx + u^\top Ru \ge C_2 (\Vert x\Vert + \Vert u\Vert )^2\) for suitable constants \(C_1\), \(C_2>0\) provided Q and R are positive definite. In this case, (4.16) holds with \(\delta (r) = (C_1^2/C_2)\,r\), since \(\Vert f_c(x,u)\Vert > \varepsilon \) implies \(C_1(\Vert x\Vert +\Vert u\Vert )>\varepsilon \) and thus

$$ C_1(\Vert x\Vert +\Vert u\Vert ) \le \frac{C_1^2}{\varepsilon }(\Vert x\Vert +\Vert u\Vert )^2 \le \frac{C_1^2}{C_2 \varepsilon }C_2(\Vert x\Vert +\Vert u\Vert )^2 = \delta (1/\varepsilon )L(x,u).$$

In the general nonlinear case, (4.16) holds if \(f_c\) is continuous with \(f_c(x_*,u_*)=0\), \(L(x,u)\) is positive definite and the inequality \(\Vert f_c(x,u)\Vert \le C L(x,u)\) holds for some constant \(C>0\) whenever \(\Vert f_c(x,u)\Vert \) is sufficiently large.
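The following sketch checks inequality (4.16) numerically for a linear–quadratic example with the choice \(\delta (r)=(C_1^2/C_2)\,r\) derived above; the matrices \(A\), \(B\), \(Q\), \(R\) and the constants are illustrative choices of ours.

```python
import numpy as np

# Check of (4.16) for a linear-quadratic example with illustrative data:
# ||f_c(x,u)|| = ||A x + B u|| <= max{eps, delta(1/eps) * L(x,u)} for all eps.
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

C1 = max(np.linalg.norm(A, 2), np.linalg.norm(B, 2))  # ||Ax+Bu|| <= C1*(||x||+||u||)
# L(x,u) >= C2*(||x||+||u||)**2 since (s+t)**2 <= 2*(s**2+t**2):
C2 = 0.5 * min(np.linalg.eigvalsh(Q).min(), np.linalg.eigvalsh(R).min())
delta = lambda rr: C1**2 / C2 * rr

rng = np.random.default_rng(0)
for _ in range(1000):
    x, u = rng.normal(size=2), rng.normal(size=1)
    lhs = np.linalg.norm(A @ x + B @ u)
    L = x @ Q @ x + u @ R @ u
    for eps in (0.01, 0.1, 1.0, 10.0):
        assert lhs <= max(eps, delta(1.0 / eps) * L) + 1e-12
```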

We now show that (4.16) together with Theorem 4.11 implies the continuous time stability estimate (2.48). If the assumptions of Theorem 4.11 hold, then (4.15) implies \(\ell (x,\mu (x)) \le V(x)/\alpha \le \alpha _2(|x|_{x_*})/\alpha \). Thus, for \(t\in [0,T]\) inequality (4.16) yields

$$\begin{aligned} |\varphi (t,0,x,\mu )|_{x_*}&\le |x|_{x_*} + \int _0^t \Vert f_c(\varphi (\tau ,0,x,\mu ),\mu (x)(\tau ))\Vert d\tau \\&\le |x|_{x_*} + \max \left\{ t\varepsilon ,\,\delta (1/\varepsilon ) \int _0^t L(\varphi (\tau ,0,x,\mu ),\mu (x)(\tau )) d\tau \right\} \\&\le |x|_{x_*} + \max \left\{ T\varepsilon , \,\delta (1/\varepsilon ) \ell (x,\mu (x)) \right\} \\&\le |x|_{x_*} + \max \left\{ T\varepsilon , \,\delta (1/\varepsilon ) \alpha _2(|x|_{x_*})/\alpha \right\} . \end{aligned}$$

Setting \(\varepsilon =\tilde{\gamma }(|x|_{x_*})\) with

$$ \tilde{\gamma }(r)=\frac{1}{\delta ^{-1} \left( \frac{1}{\sqrt{\alpha _2(r)}}\right) } $$

for \(r>0\) and \(\tilde{\gamma }(0)=0\) yields \(\tilde{\gamma }\in \mathscr {K}_\infty \) and

$$\begin{aligned} \delta (1/\varepsilon )\alpha _2(|x|_{x_*}) = \sqrt{\alpha _2(|x|_{x_*})}. \end{aligned}$$

Hence, defining

$$\begin{aligned} \gamma (r) = r + \max \{ T\tilde{\gamma }(r), \sqrt{\alpha _2(r)}/\alpha \} \end{aligned}$$

we finally obtain

$$\begin{aligned} |\varphi (t,0,x,\mu )|_{x_*} \le \gamma (|x|_{x_*}) \end{aligned}$$

for all \(t\in [0,T]\) with \(\gamma \in \mathscr {K}_\infty \).

Hence, if (4.16) and the assumptions of Theorem 4.11 hold, then the sampled data closed-loop system (2.30) fulfills the uniform boundedness over T property from Definition 2.24 and consequently by Theorem 2.27 the sampled data closed-loop system (2.30) is asymptotically stable.
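To illustrate the sampled data setting of this remark, the following sketch computes the integral stage cost (3.4) and the intermediate continuous time states \(\varphi (t,0,x,u)\) on a sampling interval for a double integrator with zero order hold, using a Runge–Kutta discretization. The system, running cost and sampling period are illustrative choices of ours; the returned quantities are exactly the objects entering the discrete time analysis and the bound on \(|\varphi (t,0,x,\mu )|_{x_*}\) above.

```python
import numpy as np

# Double integrator with zero order hold (illustrative data): the stage cost
# ell(x,u) from (3.4) and the intermediate states phi(t,0,x,u) on [0,T] are
# obtained from a Runge-Kutta discretization of the continuous time dynamics.
T = 0.5
f_c = lambda x, u: np.array([x[1], u])      # continuous time vector field
L = lambda x, u: x @ x + u**2               # running cost

def stage_cost_and_sup(x, u, steps=1000):
    """Return ell(x,u) = int_0^T L(phi(t),u) dt and sup_{t in [0,T]} ||phi(t)||."""
    h, ell, sup = T / steps, 0.0, np.linalg.norm(x)
    for _ in range(steps):
        ell += h * L(x, u)                  # rectangle rule for the integral
        k1 = f_c(x, u)                      # classical RK4 step
        k2 = f_c(x + h / 2 * k1, u)
        k3 = f_c(x + h / 2 * k2, u)
        k4 = f_c(x + h * k3, u)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        sup = max(sup, np.linalg.norm(x))
    return ell, sup

x0, u = np.array([1.0, -0.5]), 0.2
print(stage_cost_and_sup(x0, u))   # discrete stage cost and intermediate bound
```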

We now turn to investigating practical stability. Recalling the Definitions 2.15 and 2.17 of P-practical asymptotic stability and their Lyapunov function characterizations in Theorems 2.20 and 2.23 we can formulate the following practical version of Theorem 4.11.

Theorem 4.14

(Asymptotic stability and suboptimality estimate) Consider a stage cost \(\ell :\mathbb {N}_0\times X\times U\rightarrow \mathbb {R}_0^+\) and a function \(V:\mathbb {N}_0\times X\rightarrow \mathbb {R}_0^+\). Let \(\mu :\mathbb {N}_0\times \mathbb {X}\rightarrow U\) be an admissible feedback law and let \(S(n)\subseteq \mathbb {X}\) and \(P(n)\subset S(n)\), \(n\in \mathbb {N}_0\), be families of forward invariant sets for the closed-loop system (4.13).

Assume there exists \(\alpha \in (0,1]\) such that the relaxed dynamic programming inequality (4.14) holds for all \(n\in \mathbb {N}_0\) and all \(x\in S(n)\setminus P(n)\). Then the suboptimality estimate

$$\begin{aligned} J_{k^*}^\mathrm{cl}(n,x,\mu ) \le V(n,x)/\alpha \end{aligned}$$
(4.17)

holds for all \(n\in \mathbb {N}_0\) and all \(x\in S(n)\), where \(k^*\in \mathbb {N}_0\) is the minimal time with \(x_\mu (k^*+n,n,x)\in P(k^*+n)\) and

$$ J_{k^*}^\mathrm{cl}(n,x,\mu ) := \sum _{k=0}^{k^*-1} \ell (n+k, x_\mu (n+k,n,x),\mu (n+k,x_\mu (n+k,n,x))) $$

is the truncated version of the closed-loop performance functional from Definition 4.10.

If, in addition, there exist \(\alpha _1,\alpha _2,\alpha _3\in \mathscr {K}_\infty \) such that the inequalities

$$\begin{aligned} \alpha _1(|x|_{x^{\mathrm{ref}}(n)}) \le V(n,x) \le \alpha _2(|x|_{x^{\mathrm{ref}}(n)}) \quad \text{ and } \quad \ell (n,x,u) \ge \alpha _3(|x|_{x^{\mathrm{ref}}(n)}) \end{aligned}$$

hold for all \(x\in \mathbb {X}\), \(n\in \mathbb {N}_0\), and \(u\in U\) and a reference \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\), then the closed-loop system (4.13) is P-practically asymptotically stable on S(n) in the sense of Definition 2.17.

Proof

The proof follows with analogous arguments as the proof of Theorem 4.11 by only considering \(k < k^*\) in the first part and using Theorem 2.23 with \(Y(n)=S(n)\) instead of Theorem 2.22 in the second part.    \(\square \)

Remark 4.15

(i) Note that Remark 4.12 holds accordingly for Theorem 4.14. Furthermore, it is easily seen that both Theorems 4.11 and 4.14 remain valid if f in (4.13) depends on n.

(ii) The suboptimality estimate (4.17) states that the closed-loop trajectories \(x_\mu (\cdot ,x)\) from (4.13) behave like suboptimal trajectories until they reach the sets \(P(\cdot )\).

As a consequence of Theorem 4.11, we can show the existence of a stabilizing and almost optimal feedback for the infinite horizon problem even if no infinite horizon optimal feedback exists. The assumptions of the following Theorem 4.16 are identical with the assumptions of Theorem 4.8 except that we do not assume the existence of an infinite horizon optimal feedback law \(\mu _\infty \).

Theorem 4.16

Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) with stage cost \(\ell \) of the form (3.8) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb {U}^\infty (x^{\mathrm{ref}}(0))\). Assume that there exist \(\alpha _1,\alpha _2,\alpha _3\in \mathscr {K}_\infty \) such that the inequalities (4.12) hold for all \(x\in \mathbb {X}\), \(n\in \mathbb {N}_0\), and \(u\in U\).

Then for each \(\alpha \in (0,1)\) there exists an admissible feedback \(\mu _\alpha :\mathbb {N}_0\times \mathbb {X}\rightarrow U\) which asymptotically stabilizes the closed-loop system

$$\begin{aligned} x^+ = g(n,x) = f(x,\mu _\alpha (n,x)) \end{aligned}$$

on \(\mathbb {X}\) in the sense of Definition 2.16 and satisfies

$$\begin{aligned} J^{cl}_\infty (n,x,\mu _\alpha ) \le V_\infty (n,x)/\alpha \end{aligned}$$

for all \(x\in \mathbb {X}\) and \(n\in \mathbb {N}_0\).

Proof

Fix \(\alpha \in (0,1)\). By (4.5) with \(K=1\), for each \(x\in \mathbb {X}\), each \(n\in \mathbb {N}_0\), and each \(\varepsilon >0\) there exists \(u_x^\varepsilon \in \mathbb {U}^1(x)\) with

$$\begin{aligned} V_\infty (n,x) \ge \ell (n,x,u_x^\varepsilon ) + V_\infty (n+1,f(x,u_x^\varepsilon ))-\varepsilon . \end{aligned}$$

If \(V_\infty (n,x)>0\), then (4.12) implies \(x\ne x^{\mathrm{ref}}(n)\) and thus again (4.12) yields the inequality \(\inf _{u\in U}\ell (n,x,u)>0\). Hence, choosing \(\varepsilon =(1-\alpha )\inf _{u\in U}\ell (n,x,u)\) and setting \(\mu _\alpha (n,x) = u_x^\varepsilon \) yields

$$\begin{aligned} V_\infty (n,x) \ge \alpha \ell (n,x,\mu _\alpha (n,x)) + V_\infty (n+1,f(x,\mu _\alpha (n,x))).\end{aligned}$$
(4.18)

If \(V_\infty (n,x)=0\), then (4.12) implies \(x=x^{\mathrm{ref}}(n)\) and thus from the definition of \(u^{\mathrm{ref}}\) we get \(f(x,u^{\mathrm{ref}}(n))=x^{\mathrm{ref}}(n+1)\). Using (4.12) once again gives us \(V_\infty (n+1,f(x,u^{\mathrm{ref}}(n)))=0\) and from (3.8) we get \(\ell (n,x,u^{\mathrm{ref}}(n))=0\). Thus, \(\mu _\alpha (n,x)=u^{\mathrm{ref}}(n)\) satisfies (4.18). Hence, we obtain (4.14) with \(V=V_\infty \) for all \(x\in \mathbb {X}\). In conjunction with (4.12) this implies that all assumptions of Theorem 4.11 are satisfied for \(V=V_\infty \) with \(S(n)=\mathbb {X}\). Thus, the assertion follows.    \(\square \)
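The construction of \(\mu _\alpha \) in this proof can be mimicked numerically: for the scalar linear–quadratic example used above, the following sketch selects an \(\varepsilon \)-optimal control with \(\varepsilon =(1-\alpha )\inf _{u}\ell (n,x,u)\) and verifies (4.18). The minimization uses a finite control grid; all numbers are illustrative choices of ours.

```python
import numpy as np

# Scalar linear-quadratic example (illustrative numbers): mu_alpha(x) is any
# control that is eps-optimal in (4.5) with K = 1, for the particular choice
# eps = (1 - alpha) * inf_u ell(x,u) used in the proof of Theorem 4.16.
a, b, q, r = 1.2, 1.0, 1.0, 0.1
p = q
for _ in range(1000):
    p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)
V = lambda x: p * x**2
ell = lambda x, u: q * x**2 + r * u**2

def mu_alpha(x, alpha, us=np.linspace(-10.0, 10.0, 2001)):
    vals = ell(x, us) + V(a * x + b * us)
    eps = (1.0 - alpha) * q * x**2      # inf_u ell(x,u) = q*x**2 here
    ok = us[vals <= vals.min() + eps]   # all eps-optimal controls on the grid
    return ok[0]                        # any of them satisfies (4.18)

x, alpha = 4.0, 0.9
u = mu_alpha(x, alpha)
assert V(x) >= alpha * ell(x, u) + V(a * x + b * u)   # inequality (4.18)
print(u)
```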

Again we can replace (4.12) by the asymptotic controllability condition from Definition 4.2.

Corollary 4.17

Consider the optimal control problem (OCP\(_\infty ^\text {n}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb {N}_0\rightarrow \mathbb {X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb {U}^\infty (x^{\mathrm{ref}}(0))\). Assume that the system is asymptotically controllable to \(x^{\mathrm{ref}}\) and that the cost function \(\ell :\mathbb {N}_0\times X\times U\rightarrow \mathbb {R}_0^+\) is of the form (4.3) with \(\lambda =0\). Then for each \(\alpha \in (0,1)\) there exists an admissible feedback \(\mu _\alpha :\mathbb {N}_0\times \mathbb {X}\rightarrow U\) which asymptotically stabilizes the closed-loop system

$$\begin{aligned} x^+ = g(n,x) = f(x,\mu _\alpha (n,x)) \end{aligned}$$

on \(\mathbb {X}\) in the sense of Definition 2.16 and satisfies

$$\begin{aligned} J^{cl}_\infty (n,x,\mu _\alpha ) \le V_\infty (n,x)/\alpha \end{aligned}$$

for all \(x\in \mathbb {X}\) and \(n\in \mathbb {N}_0\).

If, in addition, the asymptotic controllability has the small control property then the statement also holds for \(\ell \) from (4.3) with arbitrary \(\lambda \ge 0\).

Proof

Theorem 4.3 yields

$$\begin{aligned} \alpha _1(|x_0|_{x^{\mathrm{ref}}(n_0)}) \le V_\infty (n_0,x_0)\le \alpha _2(|x_0|_{x^{\mathrm{ref}}(n_0)}) \end{aligned}$$

for suitable \(\alpha _1,\alpha _2\in \mathscr {K}_\infty \). Furthermore, by (4.3) the third inequality in (4.12) holds with \(\alpha _3=\gamma _1^{-1}\). Hence, (4.12) holds and Theorem 4.16 yields the assertion.    \(\square \)

While Theorem 4.16 and Corollary 4.17 are nicer than Theorem 4.8 and Corollary 4.9, respectively, in the sense that the existence of an optimal feedback law is not needed, for practical applications both results still require the (at least approximate) solution of an infinite horizon optimal control problem. This is in general a hard, often infeasible computational task, see also the discussion in Sect. 4.4, below.

Hence, in the following chapters we are going to use Theorems 4.11 and 4.14 in a different way: we will derive conditions under which (4.14) is satisfied by the finite horizon optimal value function \(V=V_N\) and the corresponding NMPC feedback law \(\mu =\mu _N\). The advantage of this approach lies in the fact that in order to compute \(\mu _N(x_0)\) it is sufficient to know the finite horizon optimal control sequence \(u^\star \) for the initial value \(x_0\). This is a much easier computational task, at least if the optimization horizon N is not too large.

4.4 Notes and Extensions

Infinite horizon optimal control is a classical topic in control theory. The version presented in Sect. 4.1 can be seen as a nonlinear generalization of the classical (discrete time) linear-quadratic regulator (LQR) problem, see, e.g., Dorato and Levis [6]. A rather general existence result for optimal control sequences and trajectories in the metric space setting considered here was given by Keerthi and Gilbert [15]. Note, however, that by Theorem 4.16 we do not need the existence of optimal controls for the existence of almost optimal stabilizing feedback controls.

Dynamic programming as introduced in Sect. 4.2 is a very common approach also for infinite horizon optimal control and we refer to the discussion in Sect. 3.5 for some background information. As in the finite horizon case, the monographs of Bertsekas [2, 3] provide a good source for more information on this method.

The connection between infinite horizon optimal control and stabilization problems for nonlinear systems has been recognized for quite a while. Indeed, the well known construction of control Lyapunov functions in continuous time by Sontag [23] is based on techniques from infinite horizon optimal control. As already observed after Corollary 4.7, discrete time infinite horizon optimal control is nothing but NMPC with \(N=\infty \). This has led to the investigation of infinite horizon NMPC algorithms, e.g., by Keerthi and Gilbert [16], Meadows and Rawlings [19], and Alamir and Bornard [1]. For linear systems, this approach was also considered in the monograph of Bitmead, Gevers, and Wertz [4].

The stability results in this chapter are easily generalized to the stability of sets when \(\ell \) is of the form (3.24). In this case, it suffices to replace the bounds \(\alpha _j(|x|_{x^{\mathrm{ref}}(n)})\), \(j=1,2,3\), in, e.g., Theorem 4.11 by bounds of the form

$$\begin{aligned} \alpha _j\left( \min _{y\in X^{\mathrm{ref}}(n)} |x|_{y}\right) . \end{aligned}$$
(4.19)

Alternatively, one could formulate these bounds via so called proper indicator functions as used, e.g., by Grimm et al. in [8].

By Formula (4.8) the optimal—and stabilizing—feedback law \(\mu _\infty \) can be computed by solving a rather simple optimization problem once the optimal value function \(V_\infty \) is known. This has motivated a variety of approaches for solving the dynamic programming equation (4.5) (usually for \(K=1\)) numerically in order to obtain an approximation of \(\mu _\infty \) from a numerical approximation of \(V_\infty \). Approximation techniques like linear and multilinear approximations are proposed, e.g., in Kreisselmeier and Birkhölzer [17], Camilli, Grüne, and Wirth [5], or by Falcone [7]. A set oriented approach was developed in Junge and Osinga [14] and used for computing stabilizing feedback laws in Grüne and Junge [10] (see also [11, 12] for further improvements of this method). All such methods, however, suffer from the so called curse of dimensionality, which means that the numerical effort grows exponentially with the dimension of the state space \(X\). In practice, this means that these approaches can only be applied for low-dimensional systems, typically not higher than 4–5. For homogeneous systems, Tuna [25] (see also Grüne [9]) observed that it is sufficient to compute \(V_\infty \) on a sphere, which reduces the dimension of the problem by one. Still, this only slightly reduces the computational burden. In contrast to this, a numerical approximation of the optimal control sequence \(u^\star \) for finite horizon optimal control problems like (OCP\(_\text {N}\)) and its variants is possible also in rather high space dimensions, at least when the optimization horizon N is not too large. This makes the NMPC approach computationally attractive.
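For illustration, the following sketch implements the grid based numerical dynamic programming approach described above, i.e., value iteration for (4.5) with \(K=1\) on a one dimensional example with linear interpolation between grid points. All data are illustrative choices of ours, and truncating the state space to the grid interval introduces an additional approximation error at the boundary.

```python
import numpy as np

# Grid based value iteration for (4.5) with K = 1 on a one dimensional
# example (illustrative data); V is stored on a state grid and evaluated at
# successor states by linear interpolation.
a, b, q, r = 1.2, 1.0, 1.0, 0.1
xs = np.linspace(-5.0, 5.0, 201)            # state grid
us = np.linspace(-10.0, 10.0, 201)          # control grid
X, U = np.meshgrid(xs, us, indexing="ij")
Xn = np.clip(a * X + b * U, xs[0], xs[-1])  # successor states, clipped to grid
cost = q * X**2 + r * U**2

V = np.zeros_like(xs)
for _ in range(200):                        # value iteration V <- min_u {l + V o f}
    V = np.min(cost + np.interp(Xn, xs, V), axis=1)

# The grid size grows exponentially with the state dimension ("curse of
# dimensionality"), which confines this approach to low dimensional systems.
print(np.interp(1.0, xs, V))                # approximates V_infty at x = 1.0
```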

Relaxed dynamic programming in the form introduced in Sect. 4.3 was originally developed by Lincoln and Rantzer [18] and Rantzer [20] in order to lower the computational complexity of numerical dynamic programming approaches. Instead of trying to solve the dynamic programming equation (4.5) exactly, it is only solved approximately using numerical approximations for \(V_\infty \) from a suitable class of functions, e.g., polynomials. The idea of using such relaxations is classical and can be realized in various other ways, too, see, e.g., [2, Chapter 6]. Here we use relaxed dynamic programming not for solving (4.5) but rather for proving properties of closed-loop solutions, cf. Theorems 4.11 and 4.14. While the specific form of the assumptions in these theorems was first used in an NMPC context in Grüne and Rantzer [13], the conceptual idea is actually older and can be found, e.g., in Shamma and Xiong [22] or in Scokaert, Mayne and Rawlings [21]. The fact that stability of the sampled data closed-loop can be derived from the stability of the associated discrete time system for integral costs (3.4), cf. Remark 4.13, was, to the best of our knowledge, not observed before.