On Generalized Bolza Problem and Its Application to Dynamic Optimization
Abstract
We consider two classes of problems: unconstrained variational problems of Bolza type and optimal control problems with state constraints for systems governed by differential inclusions, both under fairly general assumptions, and prove necessary optimality conditions for both of them. The proofs, which use techniques of variational analysis, are rather short compared to the existing ones, and the results seem to cover and extend those currently available. The key step in the proof of the necessary conditions for the second problem is an equivalent reduction to one, or a sequence, of reasonably simple versions of the first.
Keywords
Optimal control · Differential inclusion · Maximum principle · Variational analysis · Subdifferential calculus
Mathematics Subject Classification
49K21 · 49J53 · 49J52 · 58C06
1 Introduction
The ultimate purpose of the paper is to study optimal control problems with state constraints for systems governed by differential inclusions. This is a fairly nontrivial class of optimization problems. The result we prove seems to offer the sharpest and most general necessary optimality conditions. It strengthens Clarke’s recent theorem [1] (so far the strongest for problems without state constraints), in particular, by extending it to problems involving state constraints, and accordingly weakens the assumptions on which the principal results presented in the monograph by Vinter [2] (for problems with state constraints) are based.
But I believe the way the result has been obtained deserves special, if not the main, attention. The key element of the proof is the reduction of the original problem to unconstrained minimization of a functional that looks similar to the classical Bolza problem of the calculus of variations with a Lipschitz integrand and off-integral term. (Actually, in the absence of state constraints it is a standard Bolza problem with nonsmooth integrand and off-integral term. In the presence of state constraints, the latter has a more complicated structure.) The proof of necessary optimality conditions for a strong minimum in this “generalized Bolza problem” is a combination of a simple relaxation theorem, which allows us to reduce the task to finding necessary conditions for a weak minimum in the relaxed problem, with subsequent application of standard rules of the nonconvex subdifferential calculus, which is a well-developed chapter of local variational analysis.
The idea behind the reduction is actually very simple: if in an optimization problem the cost function is Lipschitz near a solution and constraints are subregular at the point, then the problem admits exact penalization with a Lipschitz (but necessarily nonsmooth) penalty function giving a certain measure of violation of constraints. This fact was first mentioned and used in [3], but the very idea to use one or another type of exact penalization appeared even earlier, at the very dawn of variational analysis. Ioffe and Tikhomirov in [4] and Rockafellar in [5] used such reduction to prove the existence of solutions in optimal control problems. Clarke in [6] applied similar reduction to prove the maximum principle for optimal control problems under the calmness assumption. This is a fairly weak assumption (implied, in particular, by metric regularity of constraints), although difficult to verify directly. However, the techniques used in [6] did not work in the absence of calmness and a different approach (associated with controllability) was chosen in the accompanying paper [7] to prove a full version of the maximum principle.
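To make the idea of exact penalization concrete, here is an elementary numerical sketch; the example is ours, a toy illustration not taken from the cited works.

```python
# Toy illustration (not from the cited works) of exact penalization with a
# Lipschitz nonsmooth penalty.  Minimize f(x) = x over M = [0, +inf).
# The distance d(x, M) = max(0, -x) measures constraint violation; since f
# is 1-Lipschitz, any K > 1 makes the penalization exact, while K < 1 is
# too weak and the unconstrained minimum escapes the feasible set.
def f(x):
    return x

def dist_to_M(x):
    return max(0.0, -x)

def penalized(x, K):
    return f(x) + K * dist_to_M(x)

grid = [i / 100.0 for i in range(-300, 301)]
argmin_exact = min(grid, key=lambda x: penalized(x, K=5.0))
argmin_weak = min(grid, key=lambda x: penalized(x, K=0.5))
print(argmin_exact)  # 0.0: the constrained solution is recovered
print(argmin_weak)   # -3.0: the minimum runs off to the grid boundary
```

Note that the penalty max(0, −x) is Lipschitz but nonsmooth at the solution, exactly as described above.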
In the three mentioned works, the reduction was based on the standard trick in which a mapping is replaced by the indicator function of its graph, so the penalty functions turned out to be extended-real-valued. Loewen [8] was the first to use a Lipschitz penalty to reduce an optimal control problem with free end points (automatically calm) to a Bolza problem with an everywhere finite integrand, even having certain Lipschitz properties. His proof of the maximum principle for such problems was amazingly simple. But again, the techniques could not be applied to problems with fixed end points and, like Clarke, Loewen had to use another technique for the proof of a general maximum principle.
A proof fully based on reduction to an unconstrained Bolza problem was given by Ioffe in [9] with the help of a different exact penalization technique associated with the subregularity property of constraints. This technique was later formalized in [17] in terms of a certain “optimality alternative” (see the last section). The other key factor that made the proof in [9] possible was the necessary condition for Bolza problems obtained by Ioffe and Rockafellar in [10]. This paper was a crucial step in another (but related) line of developments concerning the very structure of necessary optimality conditions in optimal control problems with differential inclusions.
Here is a brief account of the developments up to the mentioned work of Clarke [1]. Starting with his early papers [6, 7] and up to Loewen’s book [8], the adjoint Euler–Lagrange inclusions were stated in terms of Clarke’s normal cones, which are convex closures of limiting normal cones. In the pioneering papers by Smirnov [11] and Loewen and Rockafellar [12], a new and substantially more precise version of the Euler–Lagrange inclusion was introduced that involved only partial convexification of the limiting normal cones. The problem was that in both cases the proof needed convexity of the values of the set-valued mapping in the right-hand side of the inclusion. Mordukhovich in [13] showed that a modification of Smirnov’s proof (for bounded-valued and Hausdorff-Lipschitz set-valued maps) allows one to establish the same type of Euler–Lagrange inclusion without the convexity assumption. However, the maximum principle was lost in this proof. In 1997, Ioffe in the mentioned paper [9] and Vinter and Zheng [14], using a different approach, did prove that both the partially convexified Euler–Lagrange inclusion and the maximum principle are necessary for a strong local minimum in optimal control problems with differential inclusions under fairly general conditions, basically the same as in [12] but without the convexity assumption on the set-valued map. A detailed proof of the result, combining the approaches of [9, 14], is presented in Vinter’s monograph [2].
Another line of developments concerns Lipschitz properties of the set-valued mapping in the right-hand side of the differential inclusion near the optimal trajectory (see the comments after the statement of Theorem 3.3).
The plan of the paper is the following. The next section contains information, mainly from variational analysis, which is necessary for the statements and proofs of the main results. Section 3 is central. It contains the statements of the problems and the main theorems. We also give in this section some applications and versions of the theorems. The subsequent two sections contain the proofs of the main theorems for the generalized Bolza problem (Sect. 4) and the optimal control problem (Sect. 5), and in a short last section we return to a question that has remained open for quite a while. To conclude, we mention that for optimal control problems in the classical Pontryagin form the proofs can be noticeably simplified. We hope to discuss this in detail elsewhere.
2 Preliminaries
2.1 Subdifferentials
We shall need several types of subdifferentials: the proximal and limiting subdifferentials in \(\mathbb {R}^n\) [2, 15] and in general Hilbert spaces [16], the Dini–Hadamard subdifferential in separable Banach spaces, and the G-subdifferential [17, 18] and Clarke’s generalized gradients in Banach spaces [19], all coinciding with the subdifferential in the sense of convex analysis for convex functions. We shall denote by \(\partial _p\) the proximal subdifferential, by \(\partial _C\) the generalized gradient of Clarke and by \(\partial \) both the limiting subdifferential in \(\mathbb {R}^n\) and the subdifferential in the sense of convex analysis, by \(\partial ^-\) the Dini–Hadamard subdifferential and by \(\partial _G\) the G-subdifferential in the general Banach space.
Here are some basic facts about these subdifferentials we shall need in proofs.
Proposition 2.1
The uniform lower semicontinuity property is satisfied, in particular, when all functions but possibly one of them are Lipschitz near x.
Proposition 2.2
Note that it is assumed in [16] that \(\varphi \) does not depend on t and is globally Lipschitz as a function of x. But neither of the assumptions actually plays any role in the proof.
Proposition 2.3
([17], Corollary 7.15) Let X be a Banach space, and let the functions \(f_i\), \(i=1,\ldots ,k\), be Lipschitz continuous in a neighborhood of a certain \({\overline{x}}\). Set \(f(x)=\max _if_i(x)\) and let \(I=\{i:\; f_i({\overline{x}})=f({\overline{x}}) \}\). Then \(\partial _Gf({\overline{x}})\) is contained in the union of the sets \(\sum _i\alpha _i \partial _Gf_i({\overline{x}})\) over all \(\alpha _i\ge 0\) such that \(\alpha _i= 0\) for \(i\not \in I\) and \( \sum \alpha _i=1\).
Proposition 2.4
Surprisingly, the next result seems to be absent from the literature, so we supply it with a complete proof (in which we use, without explanation, the definition of the G-subdifferential in a separable Banach space).
Proposition 2.5
Proof
Remark 2.1
Remark 2.2
The previous results remain valid for a function \(g=f\circ F\) with a continuously differentiable \(F: X\rightarrow Y\) if we take as A the derivative of F at the point of interest.
2.2 Lipschitz Properties of SetValued Mappings
We denote by \(F: Q\rightrightarrows P\) a set-valued mapping from Q to P. If both Q and P are topological spaces, then F is said to be a closed mapping if its graph is closed in the product topology. Expressions like “closed-valued mapping,” “convex-valued mapping,” etc., need no explanation.
Speaking about Lipschitz properties of maps, we of course assume that both the domain and range spaces are at least metric. Here we mainly deal with mappings between Euclidean spaces and shall use the corresponding notation and language from this point on. The simplest and most convenient Lipschitz property of a set-valued map to work with occurs when the mapping is Lipschitz with respect to the Hausdorff metric in the range space. This, however, is a very restrictive property, especially when the values of the mapping are unbounded sets.
Proposition 2.6
 (i)
F has the Aubin property at \((\bar{x},\bar{y})\);
 (ii)
there are \(r>0\), \(R>0\) and \(\varepsilon >0\) such that the function \(x\rightarrow d(y,F(x))\) is R-Lipschitz in the \(\varepsilon \)-ball around \({\overline{x}}\) for any y with \(\Vert y-{\overline{y}}\Vert \le r\).
The implication \((ii)\rightarrow (i)\) is straightforward. The opposite implication is well known (see, e.g., [15], Exercise 9.37). Note also that the \(\varepsilon \) and R in both properties coincide, while the r in (ii) may be smaller.
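Characterization (ii) can be checked directly on a simple example. The following numerical sketch uses an assumed toy map, not one from the paper: \(F(x)=[x^2,+\infty [\) with \((\bar{x},\bar{y})=(1,1)\) on its graph.

```python
# Numerical sketch of characterization (ii) on an assumed toy example (not
# from the paper): for F(x) = [x**2, +inf) and (xbar, ybar) = (1, 1) on
# Graph F, the function x -> d(y, F(x)) = max(0, x**2 - y) is R-Lipschitz
# on an eps-ball around xbar for every y in an r-ball around ybar.
def dist(y, x):
    return max(0.0, x * x - y)  # d(y, F(x)) for F(x) = [x**2, +inf)

eps, r, R = 0.1, 0.1, 2.5  # local radii and a valid Lipschitz constant
xs = [1.0 + eps * (i / 20.0 - 1.0) for i in range(41)]  # ball around xbar
ys = [1.0 + r * (i / 20.0 - 1.0) for i in range(41)]    # ball around ybar

worst_ratio = max(
    abs(dist(y, x1) - dist(y, x2)) / abs(x1 - x2)
    for y in ys for x1 in xs for x2 in xs if x1 != x2
)
print(worst_ratio <= R)  # True: the slope never exceeds 2 * 1.1 = 2.2
```

The local character of the property matters: the same estimate with a fixed R would fail on the whole line, since the slope of \(x\mapsto x^2\) is unbounded.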
We shall also use a noninfinitesimal version of the pseudo-Lipschitz property with a priori fixed constants r, R and \(\varepsilon \). The following fundamental result shows that a part of the mapping associated with the property admits a lifting to a bounded-valued mapping which is Lipschitz with respect to the Hausdorff metric.
Proposition 2.7
This is a slight generalization of a result of Clarke ([1], Lemma 1), in which C is a one-point set. We omit the proof, as it repeats almost word for word the original proof by Clarke.
Lipschitz continuity of \(d(y,F(\cdot ))\), on the other hand, offers a convenient way to compute normal cones to the graph of F which naturally appear in necessary optimality conditions in problems involving setvalued mappings.
Proposition 2.8
([9], Proposition 1) Let \(F: \mathbb {R}^n\rightrightarrows \mathbb {R}^n\) be a closed-valued mapping having the Aubin property at some \((\bar{x},\bar{y})\in \mathrm{Graph}~F\). Then the (limiting) normal cone to \(\mathrm{Graph}~F\) at \((\bar{x},\bar{y})\) is generated by the limiting subdifferential of the function \((x,y)\rightarrow d(y,F(x))\) at \((\bar{x},\bar{y})\).
The last result we have to mention offers a geometric characterization of the Aubin property.
Proposition 2.9
Let \(F: \mathbb {R}^n\rightrightarrows \mathbb {R}^m\) be a set-valued mapping with closed graph that has the Aubin property at \((\bar{x},\bar{y})\in \mathrm{Graph}~F\) with Lipschitz constant R. Let \((q,p)\in N(\mathrm{Graph}~F,(\bar{x},\bar{y}))\). Then \(\Vert q\Vert \le R\Vert p\Vert \).
This is a simple and well-known fact whose proof, however, does not seem to be explicitly available in the existing literature. The inequality is straightforward for Fréchet normal cones (or regular normals in the terminology of [15]). The Fréchet normal cone to the graph of F at (x, y) is defined as the collection of pairs (q, p) such that \(\langle q,h\rangle -\langle p,v\rangle \le o(\Vert h\Vert +\Vert v\Vert )\) if h and v are sufficiently small and \(y+v\in F(x+h)\). By the Aubin property, given an h, there is a v satisfying \(\Vert v\Vert \le R\Vert h\Vert \). If now we take \(h=\lambda q\) with \(\lambda >0\), then we get \(\lambda \Vert q\Vert ^2\le \langle p,v\rangle +o(\lambda \Vert q\Vert )\le \lambda R \Vert p\Vert \Vert q\Vert +o(\lambda \Vert q\Vert )\), and dividing by \(\lambda \Vert q\Vert \) and letting \(\lambda \rightarrow 0\) gives \(\Vert q\Vert \le R\Vert p\Vert \). It remains to take into account that \(N(\mathrm{Graph}~F,(\bar{x},\bar{y}))\) is the outer limit of Fréchet normal cones at \((x,y)\in \mathrm{Graph}~F\) as \((x,y)\rightarrow (\bar{x},\bar{y})\).
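To see that the bound is sharp, consider a minimal worked example (ours, not from the paper): let \(F(x)=\{2x\}\), so that \(\mathrm{Graph}~F\) is the line \(\{(x,2x)\}\subset \mathbb {R}^2\) and \(d(y,F(x))=|y-2x|\) is Lipschitz in x with constant \(R=2\). The normal cone to the graph at the origin consists of the vectors orthogonal to the direction (1, 2):
$$\begin{aligned} N(\mathrm{Graph}~F,(0,0))=\{t(2,-1):\; t\in \mathbb {R}\},\qquad \Vert q\Vert =2|t|=R\Vert p\Vert , \end{aligned}$$
so the inequality \(\Vert q\Vert \le R\Vert p\Vert \) holds here with equality.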
2.3 Measurability
A set-valued mapping \(F:[0,T]\rightrightarrows \mathbb {R}^m\) is measurable if its graph belongs to the sigma-algebra generated by all products \(\varDelta \times Q\), where \(\varDelta \subset [0,T]\) is Lebesgue measurable and \(Q\subset \mathbb {R}^m\) is open. For closed-valued and open-valued mappings, a convenient equivalent definition can be given in terms of the so-called Castaing representation: F is measurable if and only if there is a countable collection \((u_i(\cdot ))\) of measurable selections of F such that for almost every t the set \(\{u_i(t),\,i=1,2,\ldots \}\) is dense in F(t).
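A Castaing representation is easy to exhibit on a toy map. The following sketch uses an assumed example, not one from the paper: \(F(t)=[0,t]\) on [0, 1], with the countable family of selections \(u_q(t)=qt\), q rational in [0, 1].

```python
# Sketch (assumed toy example, not from the paper) of a Castaing
# representation for the measurable map F(t) = [0, t] on [0, 1]: the
# countable family u_q(t) = q * t, q rational in [0, 1], consists of
# measurable (indeed continuous) selections whose values are dense in
# F(t) for every t.  Below, a finite slice of the family at t = 7/10.
from fractions import Fraction

rationals = [Fraction(i, 10) for i in range(11)]  # finite slice of the family
t = Fraction(7, 10)
values = sorted(q * t for q in rationals)  # u_q(t) = q * t, all lie in [0, t]

# this slice already covers F(t) = [0, 7/10] with mesh 7/100; taking all
# rationals in [0, 1] makes the set of values dense in F(t)
gap = max(b - a for a, b in zip(values, values[1:]))
print(gap)  # 7/100
```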
A set-valued mapping \(F: [0,T]\times \mathbb {R}^n\rightrightarrows \mathbb {R}^m\) is called measurable (in the standard sense) if the set-valued mapping \(t\rightarrow \mathrm{Graph}~F(t,\cdot )\) is measurable. An extended-real-valued function f on \([0,T]\times \mathbb {R}^n\) is measurable (in the standard sense) if so is the mapping \(t\rightarrow \mathrm{epi}~f(t,\cdot )=\{(x,\alpha ):\; \alpha \ge f(t,x) \}\).
The most important fact is that all basic operations used in analysis preserve measurability. For details, we refer to [15, 17].
2.4 Relaxation
Consider the functional \(\int _0^TL(t,x(t),\dot{x}(t)){\mathrm{d}}t\), assuming that the integrand L is continuous in x and L(t, x(t), u(t)) is measurable whenever \(x(\cdot )\) is continuous and \(u(\cdot )\) is measurable.
Proposition 2.10
This is a simplified version of Bogolyubov’s convexification theorem [20] (see also [4] for more details).
Proof
Take \(\delta < 1\) to guarantee that \(\delta \Vert u_i(\cdot )-\dot{x}(\cdot )\Vert _{L^1}<\varepsilon /2\) for all \(i=1,\ldots ,k\). Let \(\alpha _i\) and \(x(\cdot )\) satisfy the conditions with the chosen \(\delta \).
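The need for relaxation is well illustrated by the classical example usually attributed to Bogolyubov; the numerical sketch below (ours) uses the standard functional \(J(x)=\int _0^1\big (x^2+(1-\dot{x}^2)^2\big )\,{\mathrm{d}}t\) with \(x(0)=x(1)=0\), whose infimum 0 is not attained but is attained in the relaxed problem at \(x\equiv 0\).

```python
# Numerical sketch of the classical example behind relaxation (cf.
# Bogolyubov): J(x) = \int_0^1 (x(t)**2 + (1 - x'(t)**2)**2) dt with
# x(0) = x(1) = 0 has infimum 0, not attained; sawtooth functions with
# slopes +-1 and n teeth drive J to 0, while only the relaxed problem
# (integrand convexified in x') attains the infimum, at x = 0.
def sawtooth_cost(n, m=1000):
    # n-tooth sawtooth: 2n ramps of slope +-1 and duration h = 1/(2n),
    # so the (1 - x'**2)**2 term vanishes and J reduces to \int x**2 dt
    h = 1.0 / (2 * n)
    ramp = sum(((j + 0.5) / m * h) ** 2 for j in range(m)) * (h / m)  # \int_0^h t**2 dt
    return 2 * n * ramp  # exact value is 1 / (12 * n**2)

print(sawtooth_cost(1))   # ~ 1/12 = 0.0833...
print(sawtooth_cost(10))  # ~ 1/1200: J(x_n) -> 0, the relaxed minimum
```

The minimizing sequence converges uniformly to \(x\equiv 0\), where the original integrand gives \(J=1\): the gap is precisely what convexification in \(\dot{x}\) removes.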
3 Statements of the Problems and Main Results
In what follows, we shall fix some \({\overline{x}}(\cdot )\in W^{1,1}\) which will be assumed a local minimizer of J in one or another sense. Here are the basic hypotheses on the functions \(\varphi \) and L:
Note that the conditions do not exclude the possibility that L is extended-real-valued, in particular, that \(L(t,x,y)=+\infty \) if \(y\not \in D(t)\), even if x is close to \({\overline{x}}(t)\). It should also be taken into account that \(\inf r(t)\) can be zero and that L need not be Lipschitz with respect to the last argument.
Theorem 3.1
An interesting (and perhaps the most interesting) case corresponds to \(D(t)\equiv \mathbb {R}^n\). It will be easy to see from the proof that in this case the Euler inclusion (5) is a necessary condition for a weak minimum in (P), that is, a minimum on the set of \(x(\cdot )\) that are \(W^{1,\infty }\)-close to \({\overline{x}}(\cdot )\). Looking ahead, we just say that the proof of this fact is noticeably simpler and needs no reference to the relaxation theorem.
We deduce from the theorem that in this case the Euler inclusion (5) and the Weierstrass condition (6) together give a necessary condition for a strong minimum in the classical sense, that is, when there is an \(\varepsilon >0\) such that \(J(x(\cdot ))\ge J({\overline{x}}(\cdot ))\) for all \(x(\cdot )\in W^{1,1}\) satisfying \(\Vert x(t)-{\overline{x}}(t)\Vert <\varepsilon ,\; \forall \ t\). Another point we wish to mention in this connection is that L(t, x, y) has to satisfy the Lipschitz property w.r.t. x only for y in a possibly small neighborhood of \(\dot{{\overline{x}}}(t)\).
Theorem 3.2
Proof

(H\(_3\)) \(\ell \) is Lipschitz near \(({\overline{x}}(0),{\overline{x}}(T))\), \(S\subset \mathbb {R}^n\times \mathbb {R}^n\) is closed;
 (H\(_4\)) g is upper semicontinuous on the set \(\{(t,x):\; t\in [0,T],\; \Vert x-{\overline{x}}(t)\Vert \le \overline{\varepsilon }\}\) and there is a \(K>0\) such that for \(x,x'\in B({\overline{x}}(t),\overline{\varepsilon })\)$$\begin{aligned} g(t,x)-g(t,x')\le K\Vert x-x'\Vert \quad \mathrm{a.e.\ on}\; [0,T]. \end{aligned}$$
 (H\(_5\)) F is closed-valued and measurable in the standard sense, and there are a measurable \({\overline{r}}(t)>0\) bounded away from zero, a summable \({\overline{R}}(t)\ge 0\) and an \(\eta \in ]0,1[\) such that the relations$$\begin{aligned}&F(t,x)\cap B(\dot{{\overline{x}}}(t),{\overline{r}}(t))\subset F(t,x')+{\overline{R}}(t)\Vert x-x'\Vert B,\nonumber \\&F(t,x)\cap B(\dot{{\overline{x}}}(t),(1-\eta ){\overline{r}}(t))\ne \emptyset \end{aligned}$$(9)hold for all \( x,x'\in B({\overline{x}}(t),\overline{\varepsilon })\).
Theorem 3.3
 (i)
\(\lambda + \Vert p(\cdot )\Vert + \mu ([0,T])=1\) (nontriviality);
 (ii)
\((p(0),-p(T)-\gamma (T)\mu (\{T\}))\in \lambda \partial \ell ({\overline{x}}(0),{\overline{x}}(T)) + N(S,({\overline{x}}(0),{\overline{x}}(T)))\) (transversality);
 (iii)
\(q(t)\in \mathrm{conv}~\{q:\; (q,p(t))\in N(\mathrm{Graph}~F(t,\cdot ),({\overline{x}}(t),\dot{{\overline{x}}}(t)))\}\) a.e. on [0, T] (Euler–Lagrange inclusion);
 (iv)
\(\langle p(t),y-\dot{{\overline{x}}}(t)\rangle \le 0\), \(\forall \; y\in U(t)\) a.e. on [0, T] (maximum principle).
By (H\(_5\)), \(F(t,\cdot )\) has the Aubin property at every \(u\in F(t,{\overline{x}}(t))\) with \(\Vert u-\dot{{\overline{x}}}(t)\Vert <{\overline{r}}(t)\). Thus, replacing U(t) in the statement by \(B(\dot{{\overline{x}}}(t),{\overline{r}}(t))\), we get an extension of Clarke’s “stratified maximum principle” to optimal control problems with state constraints. It seems appropriate at this point to add a few words concerning the evolution of the assumptions on the Lipschitz properties of F. In [7, 8, 11], the mapping was assumed globally Lipschitz in the state variable, with the Lipschitz constant being a summable function of the time variable. In [12], this property was weakened and replaced by a certain global version of the Aubin pseudo-Lipschitz property with linear growth of the Lipschitz constant as a function of the radius of the ball on which the mapping is considered. This assumption (also applied in [9, 14]), although still fairly restrictive, made possible a meaningful treatment of differential inclusions with unbounded right-hand sides. Finally, Clarke’s stratified maximum principle showed that an arbitrary rate of growth works as well.
Remark 3.1
There are certain differences in the formulations of the maximum principle for problems with state constraints in the literature. The above statement is structured along the lines of the maximum principle proved in [4] (Theorem 1 of § 5.2). On the other hand, the statement of the maximum principle for (OC) proved in [2] looks different at first glance. But it is not difficult to notice that the function p(t) here and the function q(t) of [2] coincide up to a delicate difference: the first is continuous from the left and may be discontinuous at the left end of the interval, while the second is continuous from the right and may be discontinuous at the right end of the interval.
4 Proof of Theorem 3.1
 1. Let \(u_i(\cdot )\), \(i=1,\ldots ,k\), be bounded measurable selections of D(t) such that for some \(\varepsilon >0\) the functions \(\max _{\Vert x-{\overline{x}}(t)\Vert \le \varepsilon } L(t,x,u_i(t))\) are summable. Then the \(u_i(\cdot )\) satisfy the conditions of Proposition 2.10 with any \(x(\cdot )\in W^{1,1}\) close to \({\overline{x}}(\cdot )\). It follows immediately from the proposition that the vector \(({\overline{x}}(\cdot ),0,\ldots ,0 )\) is a weak local minimizer of the functional$$\begin{aligned} \hat{J}(x(\cdot ),\alpha _1,\ldots ,\alpha _k)= & {} \varphi (w(\cdot ))+ \displaystyle \int _0^TL(t,w(t),\dot{x}(t) ){\mathrm{d}}t\\&+\displaystyle \sum _i\alpha _i^+ \displaystyle \int _0^T\big (L(t,w(t),u_i(t))-L(t,w(t),\dot{x}(t)) \big ){\mathrm{d}}t, \end{aligned}$$with \(w(\cdot )\) defined by (1), subject to the constraint \(\Vert \dot{x}(t)-\dot{{\overline{x}}}(t)\Vert \le r(t)\) a.e. (By saying that \(({\overline{x}}(\cdot ),0,\ldots ,0)\) is a weak minimizer of \(\hat{J}\), we mean that there is an \(\varepsilon >0\) such that \(\hat{J}(x(\cdot ),\alpha _1,\ldots ,\alpha _k)\ge \hat{J}({\overline{x}}(\cdot ),0,\ldots ,0)\) whenever \(\Vert x(\cdot )\Vert _{1,\infty }<\varepsilon \) and \(0\le \alpha _i<\varepsilon \).) To simplify notation, we can assume without loss of generality that \(\overline{x}(t)\equiv 0\). Consider the space \(Z=\mathbb {R}^n\times L^2\times L^2\times \mathbb {R}^k\) and let the operator \(\Lambda : Z\rightarrow W^{1,2}\) associate with every \((a,x(\cdot ),y(\cdot ),\alpha )\in Z\), where \(\alpha =(\alpha _1,\ldots ,\alpha _k)\), the function$$\begin{aligned} \Lambda (a,y(\cdot ),\alpha _1,\ldots ,\alpha _k)(t)= a+\int _0^t\Big (y(\tau )+\sum _i\alpha _i(u_i(\tau )-y(\tau ))\Big ){\mathrm{d}}\tau . \end{aligned}$$Set$$\begin{aligned} \tilde{L}(t,x,y)= \left\{ \begin{array}{ll} L(t,x,y),&{}\quad \mathrm{if}\;\Vert x\Vert \le \overline{\varepsilon },\; \Vert y\Vert \le \min \{\overline{\varepsilon },r(t)\};\\ +\infty ,&{} \quad \mathrm{otherwise};\end{array}\right. \end{aligned}$$$$\begin{aligned} g_{i\varepsilon }(t)= \max _{\Vert x\Vert ,\Vert y\Vert \le \varepsilon }(L(t,x,u_i(t))-L(t,x,y)), \end{aligned}$$and consider the following four functionals on Z (where of course \(z=(a,x(\cdot ),y(\cdot ),\alpha _1,\ldots ,\alpha _k)\)):$$\begin{aligned} I_1(z)= & {} \varphi (\Lambda (a,y(\cdot ),\alpha _1,\ldots ,\alpha _k) );\\ I_2(z)= & {} \displaystyle \int _0^T \tilde{L}(t,x(t),y(t)){\mathrm{d}}t;\\ I_3(z)= & {} \displaystyle \int _0^TR(t)\Big \Vert x(t) - a-\displaystyle \int _0^t\Big (y(\tau )+\displaystyle \sum _i\alpha _i(u_i(\tau )-y(\tau )) \Big ){\mathrm{d}}\tau \Big \Vert {\mathrm{d}}t;\\ I_4(z)= & {} \displaystyle \sum _i\alpha _i\displaystyle \int _0^Tg_{i\varepsilon }(t){\mathrm{d}}t + K\displaystyle \sum _i\alpha _i^-, \end{aligned}$$and set \(I=I_1+I_2+I_3+I_4\). It is an easy matter to see that \(\hat{J}(0,\ldots ,0)= I(0,\ldots ,0)\) and (setting \(w(t) = a+\int _0^ty(\tau ){\mathrm{d}}\tau \))$$\begin{aligned} \hat{J}(w(\cdot ), \alpha _1,\ldots ,\alpha _k)\le I(a,x(\cdot ),y(\cdot ),\alpha _1,\ldots ,\alpha _k), \end{aligned}$$if \(\Vert a\Vert +\int _0^T\Vert y(t)\Vert {\mathrm{d}}t\le \overline{\varepsilon }\), \(\Vert x(t)\Vert \le \overline{\varepsilon }\) for all t, \(\Vert y(t)\Vert \le \min \{\overline{\varepsilon },r(t)\}\) a.e., \(\alpha _i\ge 0\) and K is sufficiently large. It follows that zero is a local minimum of I(z) in Z. Therefore,$$\begin{aligned} 0\in \partial _pI(0). \end{aligned}$$
 2. Analysis of this inclusion is the next step in the proof. The functionals \(I_j\) are lower semicontinuous and uniformly lower semicontinuous near zero in Z, since all \(I_j\) except \(I_2\) satisfy the Lipschitz condition. By Proposition 2.1, for any given \(\delta >0\) there are \(z_j=(a_j,x_j(\cdot ),y_j(\cdot ),\alpha _{1j},\ldots ,\alpha _{kj})\in Z\) and \(z_j^*=(b_j,x_j^*,y_j^*,\beta _{1j},\ldots ,\beta _{kj})\in Z^*\), \(j=1,2,3,4\), such that \(I_j(z_j)-I_j(0)<\delta \) and$$\begin{aligned}&(b_j,x_j^*,y_j^*,\beta _{1j},\ldots ,\beta _{kj})\in \partial _pI_j(z_j);\nonumber \\&\Vert a_j\Vert +\Vert x_j(\cdot )\Vert +\Vert y_j(\cdot )\Vert +\displaystyle \sum _{i=1}^k\alpha _{ij}<\delta ;\nonumber \\&\Big \Vert \displaystyle \sum _{j=1}^4 z_j^*\Big \Vert =\max \Big \{ \Big \Vert \displaystyle \sum _{j=1}^4 b_j\Big \Vert ,\Big \Vert \displaystyle \sum _{j=1}^4 x_j^*\Big \Vert ,\Big \Vert \displaystyle \sum _{j=1}^4 y_j^*\Big \Vert ,\Big |\displaystyle \sum _{j=1}^4\beta _{1j}\Big |,\ldots ,\Big |\sum _{j=1}^4\beta _{kj}\Big |\Big \} <\delta .\nonumber \\ \end{aligned}$$(10)Finding estimates for \(\partial _pI_j\) does not require much effort.
 Let \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _GI_1(z)\). Then \(x^*=0\), as \(I_1\) does not depend on \(x(\cdot )\), and if \(w(\cdot )= \Lambda (z)\), then there is a measure \(\nu \in \partial _G\varphi (w(\cdot ))\) such that$$\begin{aligned} b= & {} \int _0^T\nu ({\mathrm{d}}t),\quad \langle y^*,v(\cdot )\rangle = \Big (1-\sum \alpha _i\Big )\int _0^T\langle \nu (t),v(t)\rangle {\mathrm{d}}t,\\ \beta _i= & {} \int _0^T\langle \nu (t),u_i(t)-y(t)\rangle {\mathrm{d}}t, \end{aligned}$$where we have set \(\nu (t)= \int _t^T\nu ({\mathrm{d}}s)\). Indeed, \(I_1\) can be viewed as a composition of \(\Lambda \) and the restriction of \(\varphi \) to \(W^{1,2}\). Denote this restriction for a moment by \(\tilde{\varphi }\). \(\Lambda \) is a smooth mapping, and its derivative at zero (and hence at all nearby points) is onto. By Proposition 2.4, \(\partial _GI_1(z)= \Lambda '(z)^*\partial _G\tilde{\varphi }(\Lambda (z))\). On the other hand, \(\tilde{\varphi }\) is the composition of the embedding of \(W^{1,2}\) into C([0, T]) and \(\varphi \). Therefore, by Proposition 2.5, \(\partial _G\tilde{\varphi }(x(\cdot ))\) is contained in the set of restrictions of elements of \(\partial _G\varphi (x(\cdot ))\) to \(W^{1,2}\times \mathbb {R}^k\). It remains to recall that the proximal subdifferential is contained in the G-subdifferential.
 If \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _pI_2(z)\), then \(b=0\), all \(\beta _i=0\), and there are \(\lambda (\cdot )\) and \(\mu (\cdot )\) belonging to \(L^2\) such that \((\lambda (t),\mu (t))\in \partial _p\tilde{L}(t,\cdot ,\cdot )(x(t),y(t))\) almost everywhere and$$\begin{aligned} \langle x^*, h(\cdot )\rangle + \langle y^*,v(\cdot )\rangle =\int _0^T(\langle \lambda (t),h(t)\rangle +\langle \mu (t),v(t)\rangle ){\mathrm{d}}t \end{aligned}$$for all \(h(\cdot )\) and \(v(\cdot )\) in \(L^2\). This is immediate from Proposition 2.2.
 If \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _pI_3(z)\), then there is a \(\xi (\cdot )\in L^2\) with \(\Vert \xi (t)\Vert \le R(t)\) a.e. such that$$\begin{aligned} b= & {} -\displaystyle \int _0^T\xi (t){\mathrm{d}}t,\quad \langle x^*,h(\cdot )\rangle =\displaystyle \int _0^T\langle \xi (t),h(t)\rangle {\mathrm{d}}t,\\ \langle y^*,v(\cdot )\rangle= & {} -\left( 1-\sum _i\alpha _i\right) \displaystyle \int _0^T\langle \eta (t),v(t)\rangle {\mathrm{d}}t,\quad \beta _i=-\displaystyle \int _0^T\langle \eta (t),u_i(t)-y(t)\rangle {\mathrm{d}}t, \end{aligned}$$where \(\eta (t) = \int _t^T\xi (s){\mathrm{d}}s\). Indeed, \(I_3\) is a composition of the convex lsc functional \(\int _0^TR(t)\Vert x(t)\Vert {\mathrm{d}}t\) and a smooth mapping from Z into \(L^2\). For such mappings, all subdifferentials coincide and are equal to the composition of the derivative of the inner mapping and the convex subdifferential of the outer function.
 If \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _pI_4(z)\), then there are \(\rho _i\in [-K,0]\) such that$$\begin{aligned} b=0,\quad x^*=0,\quad y^*= 0,\quad \beta _i= \int _0^Tg_{i\varepsilon }(t){\mathrm{d}}t + \rho _i. \end{aligned}$$Taking (10) into account, we conclude that for some \(z_j ,\ j=1,2,3,4,\) satisfying the second inequality in (10) there are a regular \(\mathbb {R}^n\)-valued Radon measure \(\nu \in \partial _G\varphi (z_1)\), measurable \((\lambda (t),\mu (t))\in \partial _p\tilde{L}(t,\cdot ,\cdot )(x_2(t),y_2(t))\) and \(\xi (t)\) satisfying \(\Vert \xi (t)\Vert \le R(t)\), and \(\rho _{i}\in [-K,0]\) such that$$\begin{aligned} \begin{aligned}&\Big \Vert \displaystyle \int _0^T(\nu ({\mathrm{d}}t) - \xi (t){\mathrm{d}}t)\Big \Vert<\delta ; \\&\displaystyle \int _0^T\left\Vert \lambda (t)+\xi (t)\right\Vert {\mathrm{d}}t<\delta ;\\&\displaystyle \int _0^T\Big \Vert \Big (1-\sum _i\alpha _{1i}\Big )\nu (t)+\mu (t)-\Big (1-\sum _i\alpha _{3i}\Big )\Big (\displaystyle \int _t^T \xi (\tau ){\mathrm{d}}\tau \Big ) \Big \Vert {\mathrm{d}}t<\delta ;\\&\Big |\displaystyle \int _0^T\left( \left( 1-\sum _i\alpha _{1i}\right) \langle \nu (t),u_i(t)-y_1(t)\rangle \right. \left. -\left( 1-\sum _i\alpha _{3i}\right) \langle \eta (t),u_i(t)-y_3(t)\rangle + \,g_{i\varepsilon }(t)\right) {\mathrm{d}}t+\rho _i\Big |<\delta ,\quad i=1,\ldots ,k. \end{aligned} \end{aligned}$$(11)

 3. Taking \(\delta _m\rightarrow 0\), we shall find sequences of \(z_{jm}\) converging to zero as \(m\rightarrow \infty \) and of \(\nu _m\), \((\lambda _m(\cdot ),\mu _m(\cdot ))\), \(\xi _m(\cdot )\) and \(\rho _{im}\in [-K,0]\), \(i=1,\ldots ,k\), such that
– \(\nu _m\in \partial _G\varphi (z_{1m})\);
– \((\lambda _m(t),\mu _m(t))\in \partial _p\tilde{L}(t,\cdot ,\cdot )(x_{2m}(t),y_{2m}(t))\);
– \(\Vert \xi _m(t)\Vert \le R(t)\) almost everywhere;
– for every m, (11) holds with \(\delta = \delta _m\), \(\nu =\nu _m\), etc.
Setting \(q_m(t) = -\xi _m(t)\) and \(p_m(t)= -\nu _m(t) -\int _{t}^{T}q_m(\tau ){\mathrm{d}}\tau \), we rewrite (11) with some \(\gamma _m\rightarrow 0\) and \(y_m(\cdot )\rightarrow 0\) as follows:$$\begin{aligned} \begin{aligned}&\Vert p_m(0)\Vert<\gamma _m;\\&\displaystyle \int _0^T\Vert \lambda _m (t) - q_m(t)\Vert {\mathrm{d}}t<\gamma _m;\\&\displaystyle \int _0^T\Vert \mu _m(t) - p_m(t)\Vert {\mathrm{d}}t<\gamma _m;\\&\Big |\displaystyle \int _0^T(-\langle p_m(t) ,u_i(t)-y_m(t)\rangle + g_{i\varepsilon }(t)){\mathrm{d}}t+\rho _{im}\Big |<\gamma _m,\quad i=1,\ldots ,k. \end{aligned} \end{aligned}$$(12)As \(\varphi \) is Lipschitz near zero, the total variations of \(\nu _m\) are uniformly bounded, and we may assume that \(\nu _m\) weak\(^*\) converges to some \(\nu \in \partial _G\varphi (0)\). The latter means, in particular, that \(\nu _m(t)\) converge to \(\nu (t)\) at every point of continuity of the latter, that is, everywhere except maybe countably many points. Finally, as all \(q_m(\cdot )\) are bounded by the same summable function R(t), this sequence is relatively compact in the weak topology of \(L^1\). By the Eberlein–Smulian theorem, we can assume that the sequence converges weakly to a certain \(q(\cdot )\). Hence, the integrals \(\int _t^Tq_m(s){\mathrm{d}}s\) uniformly converge to \(\int _t^Tq(s){\mathrm{d}}s\), the total variations of \(p_m(\cdot )\) are uniformly bounded, and \(p_m(t)\) converge to$$\begin{aligned} p(t) = -\int _t^Tq(s){\mathrm{d}}s - \nu (t) \end{aligned}$$at every t at which \(\nu (t)\) is continuous. The second and the third relations in (12) imply that the distance from \((q_m(t),p_m(t))\) to \(\partial _p\tilde{L}(t,\cdot )(x_m(t),y_m(t))\) goes to zero for almost every t. On the other hand, by definition of the limiting subdifferential, almost everywhere on [0, T]$$\begin{aligned} \limsup _{m\rightarrow \infty }\partial _p L(t,\cdot )(x_m(t),y_m(t)) \subset \partial L(t,\cdot )(0,0). \end{aligned}$$By the Mazur theorem, a certain sequence \(\tilde{q}_m(\cdot )\) of convex combinations of the \(q_m(\cdot )\) converges to \(q(\cdot )\) almost everywhere. Hence, the distance from \((\tilde{q}_m(t),p_m(t))\) to the set \(\{(\tilde{q},p): \tilde{q}\in \mathrm{conv}~\{q: \; (q,p)\in \partial L(t,\cdot )(0,0) \} \}\) goes to zero as \(m\rightarrow \infty \).
The set \(\partial L(t,\cdot )(0,0)\) is closed, and its projection to the q-space is bounded, as \(L(t,\cdot ,y)\) is Lipschitz near zero. Therefore, for any p the set \(\mathrm{conv}~\{q: \; (q,p)\in \partial L(t,\cdot )(0,0) \}\) is also closed, and we conclude that (q(t), p(t)) belongs to this set for almost every t. The equality \(p(0)=0\) follows from the first relation in (12). This concludes the proof of (4) and (5). Finally, as the sequences \((\rho _{im})\) are uniformly bounded, we can assume that each of them converges to some \(\rho _i\le 0\). It follows from the last relation in (12) that$$\begin{aligned} \int _0^T(g_{i\varepsilon }(t) -\langle p(t),u_i(t)\rangle ){\mathrm{d}}t=-\rho _i\ge 0. \end{aligned}$$This is true for any \(\varepsilon >0\), and when \(\varepsilon \rightarrow 0\) the functions \(g_{i\varepsilon }(t)\) converge decreasingly to \(L(t,0,u_i(t))-L(t,0,0)\), and we eventually get$$\begin{aligned} \int _0^T\big (L(t,0,u_i(t)) - L(t,0,0)-\langle p(t),u_i(t)\rangle \big ){\mathrm{d}}t\ge 0,\quad i=1,\ldots ,k. \end{aligned}$$(13)
 4. We can now conclude the proof. First we note that there is a countable collection \({{\mathcal {U}}}\) of bounded measurable selections of \(D(\cdot )\) such that for every \(u(\cdot )\in {{\mathcal {U}}}\) the function \(\max _{\Vert x\Vert \le \overline{\varepsilon }}L(t,x,u(t))\) is summable and for almost every t the set \(\{u(t):\; u(\cdot )\in {{\mathcal {U}}}\}\) is dense in D(t). Indeed, as \(D(\cdot )\) is a measurable set-valued mapping, there is a countable collection \(\{v_i(\cdot ),\; i=1,2,\ldots \}\) of measurable selections of \(D(\cdot )\) whose values for almost every t form a dense subset of D(t). Take \(\varepsilon =\overline{\varepsilon }/2\). Then \(\theta _i(t) = \max _{\Vert x\Vert \le \varepsilon }L(t,x,v_i(t))\) is finite-valued by (H\(_2\)). Hence, for every \(j=1,2,\ldots \) there is a subset \(\varDelta _{ij}\) with measure not smaller than \(T-j^{-1}\) on which \(v_i\) is bounded and \(\theta _i(\cdot )\) is summable. Set$$\begin{aligned} u_{ij}(t)=\left\{ \begin{array}{ll} v_i(t),&{}\quad \mathrm{if}\; t\in \varDelta _{ij};\\ 0,&{} \quad \mathrm{otherwise}.\end{array}\right. \end{aligned}$$Then \({{\mathcal {U}}}=\{u_{ij}:\; i,j=1,2,\ldots \}\) is a required set.
The standard measurable selection arguments now allow us to deduce that this may happen only if, for almost every t, the inequality \(L(t,0,u(t)) - L(t,0,0)-\langle p(t),u(t)\rangle \ge 0\) holds for every \(u(\cdot )\in {{\mathcal {U}}}\). This in turn implies (6), since for almost every t the values u(t), as \(u(\cdot )\) runs through \({{\mathcal {U}}}\), are dense in D(t) and \(L(t,{\overline{x}}(t),\cdot )\) is continuous on D(t). This completes the proof of the theorem.
5 Optimal Control: Proof of Theorem 3.3
As was mentioned in the introduction, the reduction of the optimal control problem (OC) to the generalized Bolza problem (P) is based on a certain optimality alternative. Here is its statement.
Proposition 5.1
(Optimality alternative; [17], Theorem 7.39) Let X be a complete metric space and M a closed subset of X. Let also f be a function on X that attains a local minimum on M at some \({\overline{x}}\in M\). If f is Lipschitz near \({\overline{x}}\), then for any lsc nonnegative function \(\varphi \) on X equal to zero at \({\overline{x}}\) the following alternative holds: either there is a \(K>0\) such that \(d(x,M)\le K\varphi (x)\) for all x in a neighborhood of \({\overline{x}}\), so that \(f+K_0\varphi \) has a local minimum at \({\overline{x}}\) for some \(K_0>0\) (not smaller than K times the Lipschitz constant of f), or there is a sequence \((x_n)\rightarrow \overline{x}\) of elements of \(X\backslash M\) such that for each n the inequality \(\varphi (x) + n^{-1}d(x,x_n)\ge \varphi (x_n)\) holds for all \(x\in X\).
This fact, a simple consequence of Ekeland's variational principle, was already used in [9]. A nice feature of the approach based on the optimality alternative is that there is actually no need to verify whether the distance estimate is satisfied or not.
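For the reader's convenience, the following sketch (a reconstruction of the standard argument, not a quotation from [17]) indicates how Ekeland's principle produces the second branch of the alternative:

```latex
% Suppose the distance estimate fails for every K. Then there are
% y_n -> \bar{x} with d(y_n,M) > n^2 \varphi(y_n), that is,
%   \varphi(y_n) < n^{-2} d(y_n, M).
% Since \varphi \ge 0 and \varphi(\bar{x}) = 0, Ekeland's principle applied to
% \varphi with \varepsilon_n = \varphi(y_n) and \lambda_n = n\,\varphi(y_n)
% gives points x_n with d(x_n, y_n) \le \lambda_n and
\[
\varphi (x) + n^{-1}d(x,x_n)\ \ge\ \varphi (x_n) \qquad \text{for all } x\in X.
\]
% As \lambda_n < n^{-1} d(y_n, M) < d(y_n, M), the point x_n stays outside M,
% and x_n \to \bar{x} because y_n \to \bar{x}: exactly the second branch.
```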
Proposition 5.2
([19], Theorem 2.8.2; [21], Example 2.5.2) Assume (H\(_4\)). Then for any \(\nu \in \partial _C\eta ({\overline{x}}(\cdot ))\) there are a probability measure \(\mu \) supported on the set \(\varDelta ({\overline{x}}(\cdot ))\) and a \(\mu \)-measurable selection \(\gamma (t)\) of the set-valued mapping \(t\mapsto \bar{\partial }_Cg(t,\cdot )({\overline{x}}(t))\) such that \(\nu ({\mathrm{d}}t) = \gamma (t)\mu ({\mathrm{d}}t)\).
 1. We shall first prove the theorem under the assumption that \(d(y,F(t,\cdot ))\) is \({\overline{R}}(t)\)-Lipschitz for all \(y\in \mathbb {R}^n\) (which is the case when \(F(t,\cdot )\) is \({\overline{R}}(t)\)-Lipschitz in the Hausdorff metric). Set$$\begin{aligned} \psi (x(\cdot ))= d((x(0),x(T)),S)+\int _0^Td(\dot{x}(t),F(t,x(t))){\mathrm{d}}t. \end{aligned}$$Clearly, \(\psi \) is nonnegative and \(\psi (0)=0\), so we can apply the optimality alternative. Denote by M the feasible set in (OC): \(M=\{x(\cdot )\in W^{1,1}:\; \psi (x(\cdot ))=0 \}\). Then either there is a \(K>0\) such that \(d(x(\cdot ),M)\le K\psi (x(\cdot ))\) for all \(x(\cdot )\) in a neighborhood of zero, so that there is a \(K_0\) such that \(\varphi +K_0\psi \) attains an unconditional local minimum at zero (regular case), or there are \(x_m(\cdot )\in W^{1,1}\), \(m=1,2,\ldots \), converging to zero and such that \(\psi (x_m(\cdot ))>0\) and$$\begin{aligned}&\psi (x(\cdot )) +m^{-1}\Big (\Vert x(0)-x_m(0)\Vert +\int _0^T\Vert \dot{x}(t)-\dot{x}_m(t)\Vert {\mathrm{d}}t\Big )>\psi (x_m(\cdot )),\\&\quad \forall \; x(\cdot )\ne x_m(\cdot ) \end{aligned}$$(singular case). In the regular case, we shall actually consider the problem with slightly different cost functions:$$\begin{aligned} \varphi _{m}(x(\cdot ))= \max \{\ell (x(0),x(T))+m^{-2},\; \displaystyle \max _t g(t,x(t)) \},\quad m=1,2,\ldots . \end{aligned}$$Then \(0=\varphi (0)\le \varphi (x(\cdot ))\le \varphi _m(x(\cdot ))\le \varphi (x(\cdot ))+ m^{-2}\) for all feasible \(x(\cdot )\in V\). If m is so big that V contains the \(m^{-1}\)-ball around zero, then by Ekeland's principle there is an \(x_{m}(\cdot )\in W^{1,1}\) feasible in (OC) and such that \(\Vert x_{m}(\cdot )\Vert _{1,1}\le m^{-1}\) and$$\begin{aligned} \varphi _{m}(x(\cdot )) + m^{-1}\Vert x(\cdot )- x_{m}(\cdot )\Vert _{1,1}\ge \varphi _{m}(x_{m}(\cdot ))=a_{m} \end{aligned}$$for all feasible \(x(\cdot )\in V\). Clearly, \(a_{m}>0\) for all sufficiently large m.
Otherwise, \(\ell (x_{m}(0),x_{m}(T))\) would be strictly smaller than zero and \(g(t,x_{m}(t))\le 0\) for all t, which contradicts our assumption that the minimal value of the cost function on V in the original formulation of (OC) is zero. Since \(x_{m}(\cdot )\rightarrow 0\) as \(m\rightarrow \infty \), the inequality \(d(x(\cdot ),M)\le K\psi (x(\cdot ))\) holds for \(x(\cdot )\) in a neighborhood of \(x_{m}(\cdot )\). The Lipschitz constants of \(\varphi \) and \(\varphi _m\) coincide; hence, \(x_{m}(\cdot )\) is an unconditional local minimum of \(\varphi _{m}+K_0\psi +m^{-1}\Vert \cdot -x_m(\cdot )\Vert _{1,1}\). Summarizing, we conclude that there is \(\lambda _0\in \{0,1\}\) (\(\lambda _0=1\) in the regular case and \(\lambda _0=0\) otherwise) such that for sufficiently large m the functional$$\begin{aligned} J_{m}(x(\cdot ))= & {} \lambda _0\varphi _{m}(x(\cdot ))+K_0\Big (\displaystyle \int _0^Td(\dot{x}(t), F(t,x(t))){\mathrm{d}}t+ d((x(0),x(T)),S) \Big )\\&+\, m^{-1}\Big (\Vert x(0)-x_{m}(0)\Vert +\displaystyle \int _0^T\Vert \dot{x}(t)-\dot{x}_{m}(t)\Vert {\mathrm{d}}t \Big ) \end{aligned}$$attains a local minimum at some \(x_{m}(\cdot )\) (feasible in the regular case and not belonging to M in the singular case) with \(\Vert x_{m}(\cdot )\Vert _{1,1}<m^{-1}\). In either case, it is an easy matter to see that \(J_m\) satisfies the hypotheses (H\(_1\)) and (H\(_2\)), and we can apply Theorem 3.1. Let us start with the regular case \(\lambda _0=1\).
Note first that, as follows from Propositions 2.3 and 5.2, for any \(\nu \) belonging to the G-subdifferential of \(\varphi _m\) at \(x(\cdot )\) there are \(\lambda \in [0,1]\), a positive measure \(\mu \) with \(\mu ([0,T])=1-\lambda \) supported on the set \(\varDelta _m\) where \(g(t,x_m(t))=\varphi _m(x_m(\cdot ))\), a \(\mu \)-measurable selection \(\gamma (t)\) of the set-valued mapping \(\bar{\partial }_Cg(t,\cdot )(x(t))\) and a pair of vectors g, h in the subdifferential of \(\lambda \ell (\cdot )+ d(\cdot ,S)\) at (x(0), x(T)) such that for any continuous \(\mathbb {R}^n\)-valued u(t)$$\begin{aligned} \int _0^Tu(t)\nu ({\mathrm{d}}t)=\lambda (\langle h,u(T)\rangle +\langle g,u(0)\rangle )+\int _0^T\langle u(t),\gamma (t)\rangle \mu ({\mathrm{d}}t). \end{aligned}$$Thus, applying Theorem 3.1 to \(J_{m}\) and \(x_{m}(\cdot )\), we shall find some \(\lambda _m\in [0,1]\), \(g_m,h_m,\mu _m,\gamma _m(\cdot )\), a function \(p_{m}(\cdot )\) of bounded variation and a summable \(q_{m}(\cdot )\) satisfying \(\Vert q_m(t)\Vert \le {\overline{R}}(t)\) a.e. such that \((g_m,h_m)\in \lambda _m\partial (\lambda _0\ell +K_0d(\cdot ,S))(x_m(0),x_m(T))\), \(\mu _m\) is a positive measure with \(\mu _m([0,T])=1-\lambda _m\) supported on the set \(\varDelta (x_m(\cdot ))\) on which \(g(t,x_m(t))\) attains its maximum, \(\gamma _m(t)\in {\bar{\partial }}_Cg(t,\cdot )(x_m(t))\), and the relations$$\begin{aligned} \begin{aligned}&q_m(t)\in \mathrm{conv}~\{q:\; (q,p_m(t))\in \partial L_m(t,\cdot ,\cdot )(x_m(t),\dot{x}_m(t))\};\\&p_m(t) = -h_m-\displaystyle \int _t^Tq_m(s){\mathrm{d}}s-\displaystyle \int _t^T\gamma _m(s){\mathrm{d}}\mu _m(s),\quad p_m(0)= g_m;\\&-\langle p_m(t),u-\dot{x}_m(t)\rangle \ge 0 \quad \mathrm{for\ all }\; u\in F(t,x_m(t)) \end{aligned} \end{aligned}$$(14)hold for almost every t. Here we have set \(L_m(t,x,y)= K_0d(y,F(t,x))+m^{-1}\Vert y-\dot{x}_m(t)\Vert \). We may assume that \(\lambda _m\) and \((g_m,h_m)\) converge to some \(\lambda \in [0,1]\) and \((g,h)\in \lambda \partial (\ell +d(\cdot ,S))(0,0)\).
Furthermore, since g(t, x(t)) is upper semicontinuous, the sets \(\varDelta (x_{m})\) are closed and the excess of \(\varDelta (x_{m})\) over \(\varDelta (0)\) goes to zero as \(x_m(\cdot )\) goes to zero uniformly. Therefore, the \(\mu _{m}\) weak\(^*\) converge (along a subsequence of m) to a nonnegative measure \(\mu \) supported on \(\varDelta (0)\) and such that \(\mu ([0,T])=1-\lambda \), whence (i). By (H\(_4\)), \(\Vert \gamma _{m}(t)\Vert \le K\), so the sequence \((\gamma _m(\cdot ))\) is weakly compact in \(L^1\), which implies (in view of upper semicontinuity of generalized gradients of a Lipschitz function) that for almost every t the value at t of any weak limit point of this sequence is contained in \(\bar{\partial }_Cg(t,\cdot )(0)\). If \(\mu _m\ne 0\), we necessarily have \(\max _tg(t,x_{m}(t))=a_{m}>0\), whence the appearance of \(\partial _C^{>}g(t,\cdot )(0)\) in the limit. The sequence \((q_m(\cdot ))\) is weakly compact in \(L^1\) (as the \(q_m(\cdot )\) are bounded by a summable function) and we can assume that it weakly converges to some \(q(\cdot )\), in which case the \(p_m(\cdot )\) converge almost everywhere to$$\begin{aligned} p(t)= -h- \int _t^Tq(s){\mathrm{d}}s- \int _t^T\gamma (s)\mu ({\mathrm{d}}s) \end{aligned}$$with \(p(0)=g\). We have \(p(T)= -h-\gamma (T)\mu (\{T\})\), and (ii) follows from the second relation in (14) (recall that the normal cone to a set is generated by the subdifferential of the distance function to the set at the same point). As \(\Vert \dot{x}_m(t)\Vert \rightarrow 0\) almost everywhere, the last relation in (14) implies (iv).
Finally, a certain sequence of convex combinations of the \(q_m(\cdot )\) converges to \(q(\cdot )\) almost everywhere, and, as follows from the first relation in (14), \(q(t)\in \mathrm{conv}~\{q:\; (q,p(t))\in \partial L(t,\cdot ,\cdot )(0,0) \}\), with \(L(t,x,y)=K_0d(y,F(t,x))\), whence (iii). This completes the proof in the regular case. In the singular case, when \(\lambda _0=0\), the same arguments work in a substantially simpler situation, as the off-integral term reduces to \(d((x(0),x(T)),S)+m^{-1}\Vert x(0)-x_m(0)\Vert \). The only difference is the proof of (i). In the singular case, \(\psi (x_m(\cdot ))>0\). This means that for any m either \(d((x_m(0),x_m(T)),S)>0\), in which case \(\max \{\Vert g_m\Vert ,\Vert h_m\Vert \}=1\), or \(d(\dot{x}_m(t),F(t,x_m(t)))>0\) on a set of positive measure, which means that \(\Vert p_m(t)\Vert \ge 1-m^{-1}\) at least at one point. Thus, in either case, \(\Vert p_m(t)\Vert \ge 1-m^{-1}\) at least at one point. On the other hand, as \(\mu _m=0\) in the singular case, the \(p_m(\cdot )\) are continuous functions converging uniformly, so that \(p(\cdot )\ne 0\). This completes the proof of the theorem for the case when \(d(y,F(t,\cdot ))\) is \({\overline{R}}(t)\)-Lipschitz for any y.
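Before passing to the general case, it may help to display the structure of the reduction just performed (a condensed restatement in the notation of the proof, not an additional result):

```latex
\[
J_m(x(\cdot)) \;=\;
\underbrace{\lambda_0\,\varphi_m(x(\cdot))}_{\text{perturbed cost}}
\;+\; K_0\Big(\underbrace{\int_0^T d\big(\dot x(t),F(t,x(t))\big)\,\mathrm{d}t}_{\text{penalty for the inclusion}}
\;+\;\underbrace{d\big((x(0),x(T)),S\big)}_{\text{penalty for the endpoints}}\Big)
\;+\;\underbrace{m^{-1}\big\|x(\cdot)-x_m(\cdot)\big\|_{1,1}}_{\text{Ekeland term}},
\]
```

with \(\lambda _0=1\) in the regular case and \(\lambda _0=0\) in the singular one; each \(J_m\) is an unconstrained generalized Bolza functional with Lipschitz integrand, which is what makes Theorem 3.1 applicable.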
 2. Let us turn to the proof under the stated conditions. First we apply Proposition 2.7 to \(F=F(t,\cdot )\), \(C=\{0\}\), \(r={\overline{r}}(t)\), \(R={\overline{R}}(t)\), \(\varepsilon =\overline{\varepsilon }\) and \(\eta \in [\bar{\eta },1)\). Let \(\Gamma _0(t,x)\) be the corresponding mapping into \(\mathbb {R}\times \mathbb {R}^n\). It is obviously measurable. By (H\(_5\)), \({\overline{r}}(t)\) is bounded away from zero, that is, there is an \({\overline{r}}>0\) such that \({\overline{r}}(t)\ge {\overline{r}}\) for almost all t. Fix some positive \(r<{\overline{r}}\), \(R>0\), \(\eta \in [\bar{\eta },1)\) and \(\varepsilon \in (0,\overline{\varepsilon })\), and let C(t) be the set of all \(u\in F(t,0)\) such that (9') holds with the chosen \(r,R,\eta \) and \(\varepsilon \). Clearly, C(t) is a closed set (that in principle can be empty). Again, it is not a difficult matter to verify that \(C(\cdot )\) is a measurable mapping. (Consider the functions$$\begin{aligned} g(t,u)= & {} \sup \{\mathrm{ex}(F(x)\cap B(u,r),F(x')+R\Vert x'-x\Vert B):\; x,x'\in \varepsilon B\};\\ h(t,u)= & {} \max \{ d(F(t,x),B(u,(1-\eta )r)):\; x\in \varepsilon B\} \end{aligned}$$(where \(\mathrm{ex}(A,B)\) is the excess of A over B and d(A, B) is the distance between A and B) and the sets of \(u\in F(t,0)\) where both are equal to zero.)
Let \(t\in Q_{\varepsilon }\) and \(((0,q),(\kappa ,p(t)))\in \partial G(t,\cdot ,\cdot )((t,0),(1,0))\). Recall that \(\Gamma (t,\cdot )=\Gamma _0(t,\cdot )\) for such t. Choose further small positive \(\delta \) and \(\theta \) to ensure that \((1-\eta ){\overline{r}}>3\delta \) and \(\theta {\overline{R}}(t) <\delta \). The latter implies, by (H\(_5\)), that \(d(0,F(t,x))<\delta \) if \(\Vert x\Vert <\theta \).
We have \(d((\sigma ,y),\Gamma _0(t,x))= \inf \{|\sigma -\nu | +\Vert y-\nu z\Vert :\; z\in F(t,x),\; \Vert z\Vert \le (1-|\sigma -\nu |){\overline{r}}(t) \}\). However, if \(|1-\sigma |<\delta \), \(\Vert y\Vert <\delta \) and \(\Vert x\Vert <\theta \), then the inequality \(\Vert z\Vert \le (1-|\sigma -\nu |){\overline{r}}(t)\) is automatically satisfied for \((\nu ,z)\) realizing the infimum. Hence, \(d((\sigma ,y),\Gamma _0(t,x))= \inf \{|\sigma -\nu |+\Vert y-\nu z\Vert :\; z\in F(t,x) \}\). On the other hand, we can be sure that \(d(y/\nu ,F(t,x))<1\) for \((\nu ,y)\) close to (1, 0). Therefore, the infimum is attained with \(\nu =\sigma \) and is equal to \(\sigma \, d(y/\sigma ,F(t,x))= \sigma L(t,x,y/\sigma )\).
Set \(g_t(\nu ,x,y)= \nu L(t,x,y/\nu )\). We can view \(g_t\) as a composition \(h_t\circ A\), where \(h_t(\nu ,x,y)= \nu L(t,x,y)\) and \(A(\nu ,x,y)= (\nu ,x,y/\nu )\) is a smooth map. Applying Proposition 2.5 (with a reference to Remark 2.2), we get the equality \(\partial g_t(1,0,0)= \{0\}\times \partial L(t,\cdot ,\cdot )(0,0)\), which proves the claim.
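The computation behind this equality can be spelled out as follows (a sketch; it uses \(L(t,0,0)=d(0,F(t,0))=0\), which holds because the reference trajectory is feasible):

```latex
% The map A is smooth near (1,0,0) with invertible derivative:
\[
A(\nu,x,y)=(\nu,x,y/\nu),\qquad
\nabla A(1,0,0)(\delta\nu,\delta x,\delta y)=(\delta\nu,\delta x,\delta y),
\]
% since the differential of y/\nu at (\nu,y)=(1,0) is \delta y - y\,\delta\nu = \delta y.
% For h_t(\nu,x,y) = \nu L(t,x,y) one has, at the point (1,0,0),
\[
\partial h_t(1,0,0)=\big\{(L(t,0,0),q,p):\;(q,p)\in\partial L(t,\cdot,\cdot)(0,0)\big\},
\]
% so, with L(t,0,0)=0 and \nabla A(1,0,0)=I, the chain rule for a composition
% with a smooth inner map gives
\[
\partial g_t(1,0,0)=\nabla A(1,0,0)^{*}\,\partial h_t(1,0,0)
=\{0\}\times\partial L(t,\cdot,\cdot)(0,0).
\]
```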
Denote by \(\Lambda \) the collection of Lagrange multipliers \(\lambda ,p(\cdot ),q(\cdot ),\nu \) (where \(\nu ({\mathrm{d}}t)=\gamma (t)\mu ({\mathrm{d}}t)\)) satisfying (15) (with \(\kappa (\cdot )=0\)) and (17). By Proposition 2.9, \(|{\mathrm{d}}p|(t)\le {\overline{R}}(t)|p|(t){\mathrm{d}}t+|{\mathrm{d}}\nu |(t)\), where \(|p|\) and \(|\nu |\) stand for the variations of p and \(\nu \). The total variation of \(\nu \) is bounded by the Lipschitz constant of \(\varphi \); hence, by Gronwall's lemma the total variation of \(p(\cdot )\) is also bounded and (again by Proposition 2.9) \(\Vert q(t)\Vert \le {\overline{R}}(t)|p|([0,T])\). This means that \(\Lambda \) is a compact set if \(p(\cdot )\) and \(\nu \) are considered with the weak\(^*\) topology of measures and \(q(\cdot )\) with the weak topology of \(L^1\).
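The Gronwall step can be unpacked as follows (a sketch; the function \(v\) and the final constant are notation introduced here, not taken from the paper):

```latex
% Let v(t) = \|p(0)\| + |p|([0,t]) be the cumulative variation of p, so that
% \|p(t)\| \le v(t). Integrating |dp|(t) \le \overline{R}(t)|p|(t)\,dt + |d\nu|(t)
% and denoting by \|\nu\| the total variation of \nu,
\[
v(t)\;\le\;\Vert p(0)\Vert +\Vert \nu \Vert +\int_0^t \overline{R}(s)\,v(s)\,\mathrm{d}s
\quad\Longrightarrow\quad
v(T)\;\le\;\big(\Vert p(0)\Vert +\Vert \nu \Vert \big)
\exp\Big(\int_0^T \overline{R}(s)\,\mathrm{d}s\Big)
\]
% by Gronwall's lemma. Summability of \overline{R} and the bound on \|\nu\|
% by the Lipschitz constant of \varphi make the right-hand side a fixed constant,
% and then \|q(t)\| \le \overline{R}(t)\,v(T) a.e., so \Lambda is bounded.
```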
Let now \(r_i\rightarrow 0\), \(\eta _i\rightarrow 0\) (both decreasingly), \(R_i\rightarrow \infty \) (increasingly) and \(\varepsilon _i\rightarrow 0\) (decreasingly) satisfy \(\varepsilon _iR_i<(1-\eta _i)r_i\), and let \(C_i(t)\) be the corresponding subsets of F(t, 0). If the \(Q_{\varepsilon _i}\) are chosen to guarantee that \(Q_{\varepsilon _{i+1}}\subset Q_{\varepsilon _i}\) (which is of course possible), then \(C_i(t)\subset C_{i+1}(t)\) and every \(u\in U(t)\) belongs to \(C_i(t)\) if i is sufficiently large. As we have seen, for any i the set \(\Lambda _i\) of multipliers guaranteeing that the conclusion of the theorem holds with U(t) replaced by \(({\overline{r}}(t)B)\cup C_i(t)\) in (iv) is nonempty. Since \(\Lambda _{i+1}\subset \Lambda _i\) and all \(\Lambda _i\) are compact, \(\cap _i\Lambda _i\ne \emptyset \). This completes the proof of the theorem.
Remark 5.1
The only purpose of introducing the functionals \(\varphi _m\) in the regular case was to get more precise information about the location of the values of \(\gamma (\cdot )\). In the case when we know a priori that \(\overline{\partial }_C g(t,\cdot )\) and \(\partial _C^>g(t,\cdot )\) coincide (e.g., when \(g(t,\cdot )\) is differentiable and the derivative is continuous in both variables), the proof becomes noticeably shorter.
6 An Open Problem
It remains an open question whether a similar extension is possible for optimal control problems in the classical Pontryagin form, in which the right-hand side of the equation is only continuous with respect to the state variable (see, e.g., [22] and the references therein).
7 Conclusions
As is stressed in the introduction, the purpose of the paper has been not just to prove new and stronger necessary optimality conditions, but also to show how a heavily constrained optimal control problem can be equivalently reduced to an unconstrained Bolza problem with simply structured integrand and off-integral term. It should be emphasized once again that such a reduction can be performed and effectively used for further analysis only in the framework of modern variational analysis, which makes it possible to work with nonsmoothness and set-valuedness. I believe that the approach has a strong potential and can be effectively used far beyond first-order optimality conditions, in particular, to study second-order conditions for a strong minimum in optimal control and the Hamilton–Jacobi theory.
Footnotes
1. Actually, it will be sufficient to assume that L(t, x(t), y(t)) is measurable if so are x(t) and y(t).
Acknowledgements
I wish to express my gratitude to the anonymous referee for helpful remarks.
References
1. Clarke, F.H.: Necessary conditions in dynamic optimization. Mem. AMS 816, 113 (2005)
2. Vinter, R.B.: Optimal Control. Birkhäuser, Basel (2000)
3. Ioffe, A.D.: Necessary and sufficient conditions for a local minimum 1. Reduction theorem and first order conditions. SIAM J. Control Optim. 17, 245–251 (1979)
4. Ioffe, A.D., Tikhomirov, V.M.: Theory of Extremal Problems. Nauka, Moscow (1974) (in Russian); English transl.: North-Holland (1979)
5. Rockafellar, R.T.: Existence theorems for general control problems of Bolza and Lagrange. Adv. Math. 15, 312–333 (1975)
6. Clarke, F.H.: The generalized problem of Bolza. SIAM J. Control Optim. 14, 682–699 (1976)
7. Clarke, F.H.: The maximum principle under minimal hypotheses. SIAM J. Control Optim. 14, 1078–1091 (1976)
8. Loewen, P.D.: Optimal Control via Nonsmooth Analysis. CRM Proceedings and Lecture Notes, vol. 2. AMS, Providence (1993)
9. Ioffe, A.D.: Euler–Lagrange and Hamiltonian formalisms in dynamic optimization. Trans. Am. Math. Soc. 349, 2871–2900 (1997)
10. Ioffe, A.D., Rockafellar, R.T.: The Euler and Weierstrass conditions for nonsmooth variational problems. Calc. Var. PDEs 4, 59–87 (1996)
11. Smirnov, G.V.: Discrete approximations and optimal solutions to differential inclusions. Kibernetika (Kiev) 1991(1), 76–79 (in Russian); English transl.: Cybernetics 27(1), 101–107 (1991)
12. Loewen, P.D., Rockafellar, R.T.: Optimal control of unbounded differential inclusions. SIAM J. Control Optim. 32, 442–470 (1994)
13. Mordukhovich, B.S.: Optimization and finite difference approximations of nonconvex differential inclusions with free time. In: Mordukhovich, B.S., Sussmann, H.J. (eds.) Nonsmooth Analysis and Geometric Methods in Deterministic Optimal Control, pp. 153–202. Springer, Berlin (1996)
14. Vinter, R.B., Zheng, H.: The extended Euler–Lagrange conditions in nonconvex variational problems. SIAM J. Control Optim. 35, 56–77 (1997)
15. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
16. Clarke, F.H., Ledyaev, Yu.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Springer, Berlin (1998)
17. Ioffe, A.D.: Variational Analysis of Regular Mappings. Springer, Berlin (2017)
18. Penot, J.-P.: Calculus Without Derivatives. Graduate Texts in Mathematics, vol. 266. Springer, Berlin (2012)
19. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, New York (1983)
20. Bogolyubov, N.N.: Sur quelques méthodes nouvelles dans le calcul des variations. Ann. Mat. Pura Appl. Ser. 4 7, 249–271 (1930)
21. Ioffe, A.D.: Necessary conditions in nonsmooth optimization. Math. Oper. Res. 9, 159–189 (1984)
22. Arutyunov, A.V., Vinter, R.B.: A simple finite approximations proof of the Pontryagin maximum principle, under reduced differentiability hypotheses. Set-Valued Anal. 12, 5–24 (2004)