1 Introduction and main results

This paper considers random ensembles of uniformly parabolic systems

$$\begin{aligned} u_t=\nabla \cdot a\nabla u, \end{aligned}$$
(1)

where the law of the coefficient field a is assumed to be stationary with respect to space-time translations and ergodic. Precisely, for a probability space of coefficient fields \((\varOmega ,{\mathcal {F}},\left\langle \cdot \right\rangle )\), where \(\left\langle \cdot \right\rangle \) is used simultaneously to denote the law and expectation of the ensemble, the stationarity asserts that the coefficients are statistically homogeneous in time and space in the sense that

$$\begin{aligned} \forall x\in {\mathbb {R}}^d, \forall t\in {\mathbb {R}}: \;\;a(\cdot , \cdot )\;\;\text {and}\;\;a(\cdot +x, \cdot +t)\;\;\text {have the same law under} \left\langle \cdot \right\rangle . \end{aligned}$$
(2)

The ergodicity asserts that every translationally invariant function of the coefficient field is constant. That is, for every bounded random variable F:

$$\begin{aligned} \begin{aligned}&\text { if } \forall x\in {\mathbb {R}}^d, \forall t\in {\mathbb {R}}, \text { and for } \left\langle \cdot \right\rangle \text {-a.e.}\;a : F(a)=F\left( a(\cdot +x,\cdot +t)\right) , \\&\quad \text { then } F = c \; \left\langle \cdot \right\rangle \text {-a.s.} \end{aligned} \end{aligned}$$
(3)

The ensemble is stochastically continuous in the sense that, for each \(\delta >0\), for \(h\in {\mathbb {R}}^d\) and \(s\in {\mathbb {R}}\),

$$\begin{aligned} \lim _{\left| h\right| ,\left| s\right| \rightarrow 0}\left\langle \left\{ a\in \varOmega |\;\left| a(\cdot +h,\cdot +s)-a\right| >\delta \right\} \right\rangle =0, \end{aligned}$$
(4)

which, in particular, guarantees the almost-sure measurability of the random fields appearing in this paper with respect to the Lebesgue measure on \({\mathbb {R}}^{d+1}\). Finally, the ensemble is bounded and uniformly elliptic in the sense that there exists a deterministic \(\lambda \in (0,1]\) such that

$$\begin{aligned} \left| a\xi \right| \le \left| \xi \right| \;\;\text {and}\;\;\lambda \left| \xi \right| ^2\le \xi \cdot a\xi \qquad \forall \xi \in {\mathbb {R}}^d, \text { and for }\left\langle \cdot \right\rangle \text {-a.e. } a. \end{aligned}$$
(5)
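As a quick consistency check on (5), recorded here as a remark and not as part of the original argument, the two conditions combine with the Cauchy-Schwarz inequality to give, for every \(\xi \in {\mathbb {R}}^d\),

$$\begin{aligned} \lambda \left| \xi \right| ^2\le \xi \cdot a\xi \le \left| \xi \right| \left| a\xi \right| \le \left| \xi \right| ^2, \end{aligned}$$

which is why the ellipticity constant is restricted to \(\lambda \in (0,1]\).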

Assumptions (2) and (3) are statistical requirements for the ensemble \(\left\langle \cdot \right\rangle \) that guarantee the qualitative homogenization of equations like (1), see (13). Their role in this paper, and in homogenization theory generally, appears most essentially through applications of the ergodic theorem. See, for instance, the foundational work of Papanicolaou and Varadhan [23], who worked in the elliptic setting.

However, conditions (2) and (3) are merely qualitative and contain no quantitative information about the mixing properties of the ensemble. Therefore, while the results of this paper apply to a very general class of environments, the corresponding homogenization may occur at an arbitrarily slow rate. In order to obtain more quantitative statements, such as those in the recent work of Armstrong et al. [3], it would be necessary to quantify the ergodicity, for example through a spectral gap inequality or a finite range of dependence.

The qualitative theory of homogenization for systems like (1) aims to characterize, for \(\left\langle \cdot \right\rangle \)-a.e. a, the limiting behavior, as \(\epsilon \rightarrow 0\), of solutions to the rescaled equation

$$\begin{aligned} \left\{ \begin{array}{ll} u^\epsilon _t=\nabla \cdot a^\epsilon \nabla u^\epsilon &{} \quad \text {in}\;\;{\mathbb {R}}^d\times (0,\infty ) \\ u^\epsilon =u_0 &{}\quad \text {on}\;\;{\mathbb {R}}^d\times \{0\}, \end{array}\right. \end{aligned}$$
(6)

where

$$\begin{aligned} a^\epsilon (\cdot ,\cdot ):=a\left( \frac{\cdot }{\epsilon }, \frac{\cdot }{\epsilon ^2}\right) \end{aligned}$$

is a parabolic rescaling of the coefficient field. This is understood classically through the introduction of a space-time corrector \(\phi =\{\phi _i\}_{i\in \{1,\ldots ,d\}}\) satisfying, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \phi _{i,t}=\nabla \cdot a(\nabla \phi _i+e_i)\;\;\text {in}\;\;{\mathbb {R}}^{d+1}. \end{aligned}$$
(7)

Then, in view of the linearity, for each \(\xi \in {\mathbb {R}}^d\) the corresponding corrector \(\phi _\xi \) is defined by the sum

$$\begin{aligned} \phi _\xi :=\xi _i\phi _i, \end{aligned}$$
(8)

where here, and throughout the paper, the notation employs Einstein’s summation convention over repeated indices.
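To illustrate the role of linearity, the following short computation, included only as a sketch, multiplies (7) by \(\xi _i\) and sums over i:

$$\begin{aligned} \partial _t\phi _\xi =\xi _i\phi _{i,t}=\xi _i\nabla \cdot a(\nabla \phi _i+e_i)=\nabla \cdot a(\nabla \phi _\xi +\xi )\;\;\text {in}\;\;{\mathbb {R}}^{d+1}. \end{aligned}$$

Since \(\xi =\nabla (x\cdot \xi )\) and \(x\cdot \xi \) does not depend on time, this is the statement that the corrected linear function \(x\cdot \xi +\phi _\xi \) solves (1):

$$\begin{aligned} \partial _t\left( x\cdot \xi +\phi _\xi \right) =\nabla \cdot a\nabla \left( x\cdot \xi +\phi _\xi \right) \;\;\text {in}\;\;{\mathbb {R}}^{d+1}. \end{aligned}$$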

The gradient of the corrector \(\nabla \phi \) is a random field which is stationary with finite energy. That is, for each \(x\in {\mathbb {R}}^d\), \(t\in {\mathbb {R}}\) and \(a\in \varOmega \),

$$\begin{aligned} \nabla \phi (x,t;a)=\nabla \phi \left( 0,0;a(\cdot +x,\cdot +t)\right) , \end{aligned}$$

and, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle \left| \nabla \phi _i\right| ^2 \right\rangle <\infty . \end{aligned}$$

These facts are used to prove the strict sublinearity of the large-scale \(L^2\)-averages of \(\phi \) on parabolic cylinders. Namely, for each \(R>0\), let \(B_R\) denote the ball of radius R centered at the origin and let \({\mathcal {C}}_R\) denote the parabolic cylinder

$$\begin{aligned} {\mathcal {C}}_R:=B_R\times (-R^2,0]. \end{aligned}$$

The corrector satisfies, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i\in \{1,\ldots ,d\}\),

(9)

where here, and throughout the paper, the integration variables will be omitted unless there is a possibility of confusion. This sublinearity is essentially equivalent to homogenization, see (13) below, and is crucial for the arguments of this paper.

The corrector is used to identify the homogenized coefficient field \(a_{\mathrm {hom}}\) as the expectation of the components of the flux \(\left\{ q_i\right\} _{i\in \{1,\ldots ,d\}}\). The flux is defined, for each \(i\in \{1,\ldots ,d\}\), by

$$\begin{aligned} q_i:=a(\nabla \phi _i+e_i), \end{aligned}$$
(10)

and the homogenized coefficient field is defined, for each \(i\in \{1,\ldots ,d\}\), by

$$\begin{aligned} a_{\mathrm {hom}}e_i:=\left\langle a(\nabla \phi _i+e_i) \right\rangle . \end{aligned}$$
(11)
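As an illustration only, and under assumptions that are not made in this paper, suppose that \(d=1\), that the coefficient \(a=a(x)\) is independent of time, and that the corrector may then also be chosen independent of time. Equation (7) reduces to \(\partial _x\left( a(\partial _x\phi +1)\right) =0\), so the flux \(q=a(\partial _x\phi +1)\) is constant in space and, being stationary, is a deterministic constant c by ergodicity. Dividing by a and using that the corrector gradient has vanishing expectation (a property stated in Lemma 1) gives

$$\begin{aligned} 1=\left\langle \partial _x\phi +1 \right\rangle =c\left\langle a^{-1} \right\rangle , \qquad \text {so that}\qquad a_{\mathrm {hom}}=\left\langle q \right\rangle =c=\left\langle a^{-1} \right\rangle ^{-1}. \end{aligned}$$

In this special case the homogenized coefficient is the harmonic mean of a, and the bounds \(\lambda \le a\le 1\) from (5) give \(\lambda \le a_{\mathrm {hom}}\le 1\).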

It is a classical fact that the homogenized coefficient field \(a_{\mathrm {hom}}\) is uniformly elliptic and bounded, as is shown in Lemma 1. The solution of the corresponding constant-coefficient parabolic equation

$$\begin{aligned} \left\{ \begin{array}{ll} v_t=\nabla \cdot a_{\mathrm {hom}}\nabla v &{}\quad \text {in}\;\;{\mathbb {R}}^d\times (0,\infty ) \\ v=u_0 &{}\quad \text {on}\;\;{\mathbb {R}}^d\times \{0\}, \end{array}\right. \end{aligned}$$
(12)

then characterizes the limiting behavior, for \(\left\langle \cdot \right\rangle \)-a.e. a and as \(\epsilon \rightarrow 0\), of the solutions to (6). Indeed, by obtaining an energy estimate for the error in the asymptotic expansion

$$\begin{aligned} u^\epsilon \simeq v+\epsilon \phi _i\left( \frac{\cdot }{\epsilon },\frac{\cdot }{\epsilon ^2}\right) \partial _iv, \end{aligned}$$

which relies upon the sublinearity (9), it follows that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for every \(u_0\in L^2({\mathbb {R}}^d)\) and \(T>0\), as \(\epsilon \rightarrow 0\),

$$\begin{aligned} u^\epsilon \rightarrow v\;\;\text {strongly in}\;\;L^2({\mathbb {R}}^d\times [0,T]). \end{aligned}$$
(13)

This almost sure convergence is the qualitative homogenization of the original ensemble.
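The following formal computation, a sketch that takes v to be smooth, indicates how the flux (10) enters the expansion. Writing \(\phi ^\epsilon _i:=\phi _i\left( \frac{\cdot }{\epsilon },\frac{\cdot }{\epsilon ^2}\right) \) and \(q^\epsilon _i:=q_i\left( \frac{\cdot }{\epsilon },\frac{\cdot }{\epsilon ^2}\right) \), shorthands introduced only for this remark, the chain rule and the product rule give

$$\begin{aligned} \nabla \left( v+\epsilon \phi ^\epsilon _i\partial _iv\right)&=\left( e_i+(\nabla \phi _i)\left( \frac{\cdot }{\epsilon },\frac{\cdot }{\epsilon ^2}\right) \right) \partial _iv+\epsilon \phi ^\epsilon _i\nabla \partial _iv, \\ a^\epsilon \nabla \left( v+\epsilon \phi ^\epsilon _i\partial _iv\right)&=q^\epsilon _i\partial _iv+\epsilon \phi ^\epsilon _ia^\epsilon \nabla \partial _iv. \end{aligned}$$

The first term on the right of the second line oscillates about \(a_{\mathrm {hom}}\nabla v\) in view of (11), while the second term carries the factor \(\epsilon \phi ^\epsilon _i\), which is the quantity that the sublinearity (9) is used to control.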

Looking ahead, observe that the behavior of the solution \(u^\epsilon \) to (6) on a unit scale, for \(\epsilon >0\) small, corresponds to a characterization of the large-scale behavior of the solution u satisfying (1). Namely, the behavior of the solution \(u^\epsilon \) on a unit scale corresponds to the behavior of u on scale \(\epsilon ^{-1}\) in space and \(\epsilon ^{-2}\) in time. The purpose of this paper will be to characterize the extent to which solutions of (1) inherit, on large scales and for \(\left\langle \cdot \right\rangle \)-a.e. a, the regularity of solutions to constant-coefficient parabolic equations.
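The scaling relation can be checked directly. As a sketch, given a solution u of (1), set \({\tilde{u}}^\epsilon (x,t):=u\left( \frac{x}{\epsilon },\frac{t}{\epsilon ^2}\right) \), a notation used only in this remark. Since the time derivative contributes a factor \(\epsilon ^{-2}\) and each of the two spatial derivatives contributes a factor \(\epsilon ^{-1}\),

$$\begin{aligned} \partial _t{\tilde{u}}^\epsilon (x,t)=\epsilon ^{-2}u_t\left( \frac{x}{\epsilon },\frac{t}{\epsilon ^2}\right) =\epsilon ^{-2}\left( \nabla \cdot a\nabla u\right) \left( \frac{x}{\epsilon },\frac{t}{\epsilon ^2}\right) =\nabla \cdot a^\epsilon \nabla {\tilde{u}}^\epsilon (x,t). \end{aligned}$$

In particular, the unit cylinder \(B_1\times (-1,0]\) for \({\tilde{u}}^\epsilon \) corresponds to the cylinder \(B_{\epsilon ^{-1}}\times (-\epsilon ^{-2},0]\) for u.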

A concise statement of this large-scale regularity is contained in the following first-order Liouville theorem, which is the main theorem of the paper.

Theorem 1

Suppose that \(\left\langle \cdot \right\rangle \) is stationary (2), ergodic (3), stochastically continuous (4), and bounded and uniformly elliptic (5). Then, \(\left\langle \cdot \right\rangle \)-a.e. a satisfies the following first-order Liouville property: if u is an ancient whole-space a-caloric function, that is if u is a distributional solution of

$$\begin{aligned} u_t=\nabla \cdot a\nabla u\;\;\text {in}\;\;{\mathbb {R}}^d\times (-\infty ,0), \end{aligned}$$

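which is strictly subquadratic on parabolic cylinders in the sense that, for some \(\alpha \in (0,1)\),

$$\begin{aligned} \lim _{R\rightarrow \infty }\frac{1}{R^{1+\alpha }}\left( \frac{1}{\left| {\mathcal {C}}_R\right| }\int _{{\mathcal {C}}_R}\left| u\right| ^2\right) ^{\frac{1}{2}}=0, \end{aligned}$$

then there exist \(c\in {\mathbb {R}}\) and \(\xi \in {\mathbb {R}}^d\) such that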

$$\begin{aligned} u(x,t)=c+x\cdot \xi +\phi _\xi (x,t)\;\;\text {in}\;\;{\mathbb {R}}^d\times (-\infty ,0), \end{aligned}$$

for the corrector \(\phi _\xi \) defined in (8).

The proof of Theorem 1 is strongly motivated by the work of Gloria, Neukamm and Otto [18], who considered precisely these questions for stationary and ergodic ensembles of elliptic equations. It is based on controlling the large-scale \(L^2\)-deviation of the gradient of an a-caloric function from the span of the a-caloric gradients \(\{\xi +\nabla \phi _\xi \}_{\xi \in {\mathbb {R}}^d}\). The excess of an a-caloric function measures this deviation, and is defined, for each \(R>0\) and a-caloric function u on \({\mathcal {C}}_R\), by

(14)

In Proposition 2 below, for \(\left\langle \cdot \right\rangle \)-a.e. a, the excess of an a-caloric function will be shown to decay like a power law in the radius. However, before the statement, it is useful to observe some essential differences between the parabolic and elliptic settings. In what follows, the superscript “\(\text {ell}\)” will be used to differentiate elliptic objects from their parabolic counterparts.

In the elliptic case, for a stationary and ergodic ensemble \(\left\langle \cdot \right\rangle ^\text {ell}\) of bounded, uniformly elliptic coefficient fields \(a^\text {ell}\), the corrector \(\phi ^\text {ell}=\{\phi ^{\text {ell}}_i\}_{i\in \{1,\ldots ,d\}}\) is defined by the equations, for \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} -\nabla \cdot a^\text {ell}(\nabla \phi ^\text {ell}_i+e_i)=0\;\;\text {in}\;\;{\mathbb {R}}^d, \end{aligned}$$
(15)

and, for each \(\xi \in {\mathbb {R}}^d\),

$$\begin{aligned} \phi ^\text {ell}_\xi :=\phi ^\text {ell}_i\xi _i. \end{aligned}$$

These correctors play a virtually identical role to the parabolic correctors (7) in elliptic homogenization theory.

We remark that a version of excess for uniformly elliptic ensembles was first defined by Armstrong and Smart [6], where it was also used in a Campanato iteration. The methods of this paper and definition (16) below follow most closely [18, Lemma 2], although they worked with the equivalent \(L^2\)-energy as opposed to the intrinsic energy defined by the coefficient field. This differs, for instance, from the definition used in the work of Bella et al. [12], which considered degenerate elliptic ensembles for which it was essential to incorporate the environment a. These notions motivated definition (14), and measured the deviation of the gradient of an \(a^\text {ell}\)-harmonic function u on \(B_R\), by which is meant a solution of

$$\begin{aligned} -\nabla \cdot a^\text {ell}\nabla u=0\;\;\text {in}\;\;B_R, \end{aligned}$$

from the span of \(a^{\text {ell}}\)-harmonic gradients \(\{\xi +\nabla \phi ^\text {ell}_\xi \}_{\xi \in {\mathbb {R}}^d}\). Precisely, for each \(R>0\) and \(a^\text {ell}\)-harmonic function u on \(B_R\),

(16)

The decay of the excess was controlled in [18, Lemma 2] through the introduction of a flux correction \(\sigma ^\text {ell}=\{\sigma ^\text {ell}_i\}_{i\in \{1,\ldots ,d\}}\). Namely, the flux \(q^\text {ell}=\{q^\text {ell}_i\}_{i\in \{1,\ldots ,d\}}\) is defined, for each \(i\in \{1,\ldots ,d\}\), by

$$\begin{aligned} q^\text {ell}_i:=a^\text {ell}(\nabla \phi ^\text {ell}_i+e_i), \end{aligned}$$

where, in analogy with the parabolic setting, the homogenized coefficient field \(a_{\mathrm {hom}}^\text {ell}\) is defined by the expectation of the components of the flux, for \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} a_{\mathrm {hom}}^\text {ell} e_i:=\left\langle a^\text {ell}(\nabla \phi ^\text {ell}_i+e_i) \right\rangle ^\text {ell}. \end{aligned}$$

Therefore, strictly in the elliptic case, the corrector Eq. (15) asserts that the components of the flux are divergence-free and may be viewed as closed \((d-1)\)-forms on the whole space. Hence, for each \(i\in \{1,\ldots ,d\}\), there exists a \((d-2)\)-form, which is represented by a skew-symmetric matrix \(\sigma ^\text {ell}_i=(\sigma ^\text {ell}_{ijk})_{j,k\in \{1,\ldots ,d\}}\), satisfying

$$\begin{aligned} \nabla \cdot \sigma ^\text {ell}_i=q^{\text {ell}}_i-a_{\mathrm {hom}}^\text {ell} e_i, \end{aligned}$$
(17)

where the divergence of the tensor-field \(\sigma _i^\text {ell}\) is defined, for each \(i,j\in \{1,\ldots ,d\}\), by

$$\begin{aligned} (\nabla \cdot \sigma ^\text {ell}_i)_j=\sum _{k=1}^d\partial _k\sigma ^\text {ell}_{ijk}. \end{aligned}$$

Furthermore, the flux correction \(\sigma ^\text {ell}\) is fixed according to the choice of gauge, for each \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \varDelta \sigma ^\text {ell}_{ijk}=\partial _kq^\text {ell}_{ij}-\partial _jq^\text {ell}_{ik}. \end{aligned}$$
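Two elementary consequences of skew-symmetry, recorded here as a sketch for smooth fields, indicate why such a potential is convenient. First, relabeling the summation indices shows that the divergence of \(\nabla \cdot \sigma ^\text {ell}_i\) vanishes identically, which is consistent with (17) and the fact that \(q^\text {ell}_i\) is divergence-free:

$$\begin{aligned} \partial _j\partial _k\sigma ^\text {ell}_{ijk}=-\partial _j\partial _k\sigma ^\text {ell}_{ikj}=-\partial _k\partial _j\sigma ^\text {ell}_{ijk}, \qquad \text {so that}\qquad \partial _j\partial _k\sigma ^\text {ell}_{ijk}=0. \end{aligned}$$

Second, for a smooth function f (a placeholder introduced only here), the contraction of the symmetric Hessian with the skew-symmetric \(\sigma ^\text {ell}_i\) vanishes, and therefore

$$\begin{aligned} (\nabla \cdot \sigma ^\text {ell}_i)\cdot \nabla f=\partial _k\sigma ^\text {ell}_{ijk}\partial _jf=\partial _k\left( \sigma ^\text {ell}_{ijk}\partial _jf\right) -\sigma ^\text {ell}_{ijk}\partial _k\partial _jf=-\nabla \cdot \left( \sigma ^\text {ell}_i\nabla f\right) , \end{aligned}$$

where \((\sigma ^\text {ell}_i\nabla f)_k:=\sigma ^\text {ell}_{ikj}\partial _jf\). Products of the flux with gradients can in this way be rewritten in divergence form.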

In [18, Lemma 2], the sublinearity of the large-scale \(L^2\)-averages of the extended corrector \((\phi ^\text {ell},\sigma ^\text {ell})\) on large balls is shown to imply that the elliptic excess (16) decays as a power law in the radius.

Precisely, for each \(\alpha \in (0,1)\), there exist \(C^\text {ell}_0=C^\text {ell}_0(\alpha ,d,\lambda )>0\) and \(C_1^\text {ell}=C_1^\text {ell}(\alpha , d,\lambda )>0\) for which, whenever a pair of radii \(0<r<R<\infty \) satisfies, for every \(\rho \in [r,R]\), for each \(i\in \{1,\ldots ,d\}\),

then, for every \(a^\text {ell}\)-harmonic function u in \(B_R\),

$$\begin{aligned} \text {Exc}^\text {ell}(u;r)\le C^\text {ell}_1\left( \frac{r}{R}\right) ^{2\alpha }\text {Exc}^\text {ell}(u;R). \end{aligned}$$
(18)

Observe, in particular, that this is a deterministic result. Indeed, the stochastic properties of the extended corrector \((\phi ^\text {ell},\sigma ^\text {ell})\) are necessary to prove that, for \(\left\langle \cdot \right\rangle \)-a.e. a, the large-scale \(L^2\)-averages are sublinear. But, by taking this fact as an input, it follows from a Campanato iteration that the excess of an arbitrary \(a^\text {ell}\)-harmonic function decays according to (18). In this paper, the analogous result will also be obtained for the parabolic excess, as shown in Proposition 2 below.

The first essential difference is that, unlike in the elliptic case, the fluxes \(\{q_i\}_{i\in \{1,\ldots ,d\}}\) defined in (10) are not divergence-free, and so an immediate analogue of the flux correction \(\sigma ^\text {ell}=\{\sigma ^\text {ell}_i\}_{i\in \{1,\ldots ,d\}}\) cannot be defined. Instead, the flux is essentially decomposed according to the Weyl decomposition, where the parabolic \(\sigma \) is constructed to correct the divergence-free component. Precisely, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} q_i=q_{i,\text {sol}}+\nabla \psi _i+c_i, \end{aligned}$$

where \(c_i\) is a constant, the solenoidal part \(q_{i,\text {sol}}\) is divergence-free, and the potential part \(\nabla \psi _i\) is constructed to be a stationary, finite-energy gradient satisfying

$$\begin{aligned} \varDelta \psi _i=\nabla \cdot q_i. \end{aligned}$$
(19)

Indeed, for each \(i\in \{1,\ldots ,d\}\), one first defines \(\nabla \psi _i\) according to (19) and then observes that \(q_i-\nabla \psi _i\) is divergence-free.
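The verification is a one-line computation, recorded here for convenience. By (10) and (7), \(\nabla \cdot q_i=\phi _{i,t}\), so that (19) can equivalently be read as \(\varDelta \psi _i=\phi _{i,t}\), and

$$\begin{aligned} \nabla \cdot \left( q_i-\nabla \psi _i\right) =\nabla \cdot q_i-\varDelta \psi _i=0. \end{aligned}$$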

The flux correction \(\sigma =\{\sigma _i\}_{i\in \{1,\ldots ,d\}}\) is then defined, for each \(i\in \{1,\ldots ,d\}\), by the equation

$$\begin{aligned} \nabla \cdot \sigma _i=\left( q_i-\nabla \psi _i\right) -\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle , \end{aligned}$$
(20)

where \(\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \) denotes the conditional expectation of \(q_i\) with respect to the sub-sigma-algebra \({\mathcal {F}}_{{\mathbb {R}}^d}\subset {\mathcal {F}}\) of subsets of \(\varOmega \) which are invariant with respect to spatial translations of the coefficient fields. The components of \(\sigma \) are fixed by the choice of gauge, for each \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \varDelta \sigma _{ijk}=\partial _k(q_i-\nabla \psi _i)_j-\partial _j(q_i-\nabla \psi _i)_k. \end{aligned}$$
(21)

We remark that the choice of gauge (21) fixes the normalization constant appearing in (20). Indeed, informally, for each \(i,j\in \{1,\ldots ,d\}\), it follows from the definition of the divergence of the tensor \(\sigma _i\), Eq. (21), and the equality of mixed derivatives that

$$\begin{aligned} \varDelta \left( \nabla \cdot \sigma _i\right) _j= & {} \varDelta \left( \partial _k\sigma _{ijk}\right) =\partial _k\left( \varDelta \sigma _{ijk}\right) \\= & {} \partial _k\partial _k(q_i-\nabla \psi _i)_j-\partial _k\partial _j(q_i-\nabla \psi _i)_k \\= & {} \varDelta (q_i-\nabla \psi _i)_j-\partial _j\nabla \cdot (q_i-\nabla \psi _i) \\= & {} \varDelta (q_i-\nabla \psi _i)_j, \end{aligned}$$

where the final equality uses the fact that \((q_i-\nabla \psi _i)\) is divergence-free. That is, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} \varDelta \left( \left( \nabla \cdot \sigma _i\right) _j-(q_i-\nabla \psi _i)_j\right) =0, \end{aligned}$$

from which it follows that the stationary random field

$$\begin{aligned} \left( \nabla \cdot \sigma _i\right) _j-(q_i-\nabla \psi _i)_j \end{aligned}$$

is invariant with respect to spatial shifts of the coefficients \(a\in \varOmega \). From this it follows that the difference is equal to the conditional expectation appearing in (20), after observing that the conditional expectation of \(\nabla \psi _i\) with respect to \({\mathcal {F}}_{{\mathbb {R}}^d}\) vanishes, which is proven in Lemma 3 of Sect. 3.

Finally, for each \(i\in \{1,\ldots ,d\}\), it is necessary to correct the oscillations of the conditional expectation \(\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \) about its mean. The corrector \(\zeta =\{\zeta _{ij}\}_{i,j\in \{1,\ldots ,d\}}\) is constructed explicitly for this purpose and satisfies, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \partial _t \zeta _i=\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle -\left\langle q_i \right\rangle =\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle -a_{\mathrm {hom}}e_i. \end{aligned}$$
(22)

In particular, this final correction \(\zeta \) is constant in space, as an \({\mathcal {F}}_{{\mathbb {R}}^d}\)-measurable field, and depends only on time.
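Since the right-hand side of (22) does not depend on the spatial variable, \(\zeta \) can be written as a time integral. As a sketch, assuming the normalization \(\zeta _i(0)=0\), which is a choice made here for illustration, and writing \(\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle (s)\) for the value of the conditional expectation at time s,

$$\begin{aligned} \zeta _i(t)=\int _0^t\left( \left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle (s)-a_{\mathrm {hom}}e_i\right) \,\mathrm {d}s. \end{aligned}$$

The integrand has vanishing expectation by the tower property of conditional expectation and definition (11):

$$\begin{aligned} \left\langle \left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \right\rangle =\left\langle q_i \right\rangle =a_{\mathrm {hom}}e_i. \end{aligned}$$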

In comparison with the elliptic setting, where the decay of the excess was determined by the sublinearity of the large-scale \(L^2\)-averages of \((\phi ^\text {ell},\sigma ^\text {ell})\), the decay of the parabolic excess will be determined by the sublinearity of the large-scale \(L^2\)-averages of the corrector \((\phi ,\psi ,\sigma )\), measured with respect to the scaling in space, and the sublinearity of the large-scale \(L^2\)-averages of \(\zeta \), measured with respect to the scaling in time. The first lemma of the paper establishes the existence of the extended corrector \((\phi ,\psi ,\sigma ,\zeta )\).

Lemma 1

Suppose that the ensemble \(\left\langle \cdot \right\rangle \) satisfies (2), (3), (4), and (5). There exist \(C=C(d,\lambda )>0\) and random fields \(\phi =\{ \phi _i \}_{i\in \{1,\ldots ,d\}}\), \(\psi = \{\psi _i\}_{i\in \{1,\ldots ,d\}}\), \(\sigma =\{\sigma _{ijk}\}_{i,j,k\in \{1,\ldots ,d\}}\) and \(\zeta =\{\zeta _{ij}\}_{i,j\in \{1,\ldots ,d\}}\) on \({\mathbb {R}}^{d+1}\) with the following properties:

The gradient fields are stationary, finite energy random processes with vanishing expectation: for each \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle \left| \nabla \phi _i\right| ^2 \right\rangle + \left\langle \left| \nabla \psi _i\right| ^2 \right\rangle + \left\langle \left| \nabla \sigma _{ijk}\right| ^2 \right\rangle +\left\langle \left| \partial _t\zeta _{ij}\right| ^2 \right\rangle \le C, \end{aligned}$$

and

$$\begin{aligned} \left\langle \nabla \phi _i \right\rangle = \left\langle \nabla \psi _i \right\rangle = \left\langle \nabla \sigma _{ijk} \right\rangle =\left\langle \partial _t \zeta _{ij} \right\rangle = 0. \end{aligned}$$

For each \(i\in \{1,\ldots ,d\}\), the field \(\sigma _i=(\sigma _{ijk})_{j,k\in \{1,\ldots ,d\}}\) is skew-symmetric in its last two indices: for each \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \sigma _{ijk} = -\sigma _{ikj}. \end{aligned}$$

The fields \(\psi \) and \(\sigma \) are stationary in time: for each \(x\in {\mathbb {R}}^d\), \(t\in {\mathbb {R}}\) and \(a\in \varOmega \),

$$\begin{aligned} \psi (x,t;a)=\psi \left( x,0;a(\cdot ,\cdot +t)\right) , \end{aligned}$$

and

$$\begin{aligned} \sigma (x,t;a)=\sigma \left( x,0;a(\cdot ,\cdot +t)\right) . \end{aligned}$$

Furthermore, for \(\left\langle \cdot \right\rangle \)-a.e. a, the following equations are satisfied in the sense of distributions on \({\mathbb {R}}^{d+1}\). The field \(\phi \) satisfies (7): for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \phi _{i,t}=\nabla \cdot a(\nabla \phi _i+e_i). \end{aligned}$$

The potential part of the flux is corrected by \(\psi \) according to (19): for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \varDelta \psi _i=\nabla \cdot q_i. \end{aligned}$$

The field \(\sigma \) corrects the divergence-free part of the flux according to (20): for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \nabla \cdot \sigma _i= q_i -\nabla \psi _i-\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle , \end{aligned}$$

where \(\left\langle \cdot \;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \) denotes the conditional expectation with respect to the sub-sigma-algebra \({\mathcal {F}}_{{\mathbb {R}}^d}\subset {\mathcal {F}}\) of subsets of \(\varOmega \) which are invariant with respect to spatial translations of the coefficient field. Furthermore, \(\sigma \) is constructed according to the choice of gauge, for each \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \varDelta \sigma _{ijk}=\partial _k(q_i-\nabla \psi _i)_j-\partial _j(q_i-\nabla \psi _i)_k. \end{aligned}$$

The field \(\zeta \) corrects the oscillation of the conditional expectation about its mean: for each \(i\in \{1,\ldots ,d\}\), the random vector field \(\zeta _i\) is constant in space and satisfies

$$\begin{aligned} \partial _t\zeta _i=\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle -\left\langle q_i \right\rangle =\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle -a_{\mathrm {hom}}e_i. \end{aligned}$$

Finally, the homogenized coefficient field \(a_{\mathrm {hom}}\) defined in (11) is bounded and uniformly elliptic: for each \(\xi \in {\mathbb {R}}^d\),

$$\begin{aligned} \lambda \left| \xi \right| ^2\le \xi \cdot a_{\mathrm {hom}}\xi \;\;\text {and}\;\;\left| a_{\mathrm {hom}}\xi \right| \le \frac{1}{\lambda } \left| \xi \right| . \end{aligned}$$

The following two propositions effectively split the probabilistic and deterministic aspects of the paper. Proposition 1 contains the probabilistic parts, and uses the stationarity and ergodicity of the ensemble to prove that the large-scale \(L^2\)-averages of \((\phi ,\psi ,\sigma )\) are sublinear with respect to the spatial scaling and that those of \(\zeta \) are sublinear with respect to the time scaling. This fact is essentially classical for the case of the correctors \(\phi \) and \(\zeta \), although a new argument for the sublinearity of \(\phi \) is presented which may be of independent interest. A new argument is required to prove the sublinearity of \(\sigma \) and \(\psi \).

The difference is the following. The corrector \(\phi \) is, in general, not stationary in either space or time, but Eq. (7) yields some control over both its spatial and temporal derivatives. Similarly, the corrector \(\zeta \) has an explicit, stationary time derivative but is itself not stationary. In the case of \(\psi \) and \(\sigma \), since Eqs. (19) and (20) yield only the spatial regularity, it is necessary to use the fact that both fields are stationary in time in order to obtain the convergence.

In fact, the following proposition will prove the sublinearity of the normalized corrector. In the case of \(\phi \), the components are normalized by their large-scale averages on a parabolic cylinder. In the case of \(\zeta \), using the fact that the Sobolev embedding implies that \(\zeta \) is continuous, the components are normalized by their values at time zero. In the case of \(\psi \) and \(\sigma \), the functions are normalized, for each fixed time, by their large-scale averages on a ball. This is in fact equivalent to the sublinearity of the corrector \((\phi ,\psi ,\sigma ,\zeta )\) without a normalization, see for instance [12, Lemma 2], but since this observation is not necessary for the arguments of the paper it is omitted.

For an arbitrary function \(\varphi :{\mathbb {R}}^{d+1}\rightarrow {\mathbb {R}}\), define, for each \(R>0\) and \(t\in {\mathbb {R}}\),

The precise normalization considered and the corresponding sublinearity are contained in the following proposition.

Proposition 1

Suppose that the ensemble \(\left\langle \cdot \right\rangle \) satisfies (2), (3), (4), and (5). Then, for \(\left\langle \cdot \right\rangle \)-a.e. a, the corrector \((\phi ,\psi ,\sigma )\) is strictly sublinear with respect to the spatial scaling and the corrector \(\zeta \) is strictly sublinear with respect to the time scaling in the sense that, for each \(i,j,k\in \{1,\ldots ,d\}\),

(23)

and

(24)

Furthermore, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i\in \{1,\ldots ,d\}\), the large-scale \(L^2\)-averages of the components of the flux satisfy

(25)

It is important to observe at this point that Eqs. (19) and (20) defining \(\psi \) and \(\sigma \) are invariant if either \(\psi \) or \(\sigma \) is altered by a time stationary constant. This explains why in (23), for each \(R>0\), it is possible, and for our arguments necessary, to allow for a time-dependent normalization. Equations (7) and (22) defining \(\phi \) and \(\zeta \) are not likewise invariant, and therefore the corresponding normalizations appearing in (23) and (24) are necessarily achieved by subtracting a true constant.

The deterministic aspect of the paper uses a Campanato iteration, which takes the conclusion of Proposition 1 as input. Namely, it will be shown that, for any \(\alpha \in (0,1)\), the parabolic excess decays like a power law in the radius as soon as the quantities appearing in (23) and (24) are sufficiently small and as soon as (25) is sufficiently close to its expectation. This is to say that there exists a random radius \(r_*(a)\), finite for \(\left\langle \cdot \right\rangle \)-a.e. a, such that, whenever \(r_*<r<R<\infty \), for every a-caloric function u in \({\mathcal {C}}_R\), the parabolic excess satisfies, for \(C_1=C_1(\alpha ,d,\lambda )>0\),

$$\begin{aligned} \text {Exc}(u;r)\le C_1\left( \frac{r}{R}\right) ^{2\alpha }\text {Exc}(u;R). \end{aligned}$$

This is the content of the following proposition.

Proposition 2

Suppose that the ensemble \(\left\langle \cdot \right\rangle \) satisfies (2), (3), (4), and (5). Fix a Hölder exponent \(\alpha \in (0,1)\). Then, there exist constants \(C_0=C_0(\alpha , d,\lambda )>0\) and \(C_1=C_1(\alpha ,d,\lambda )>0\) with the following property:

If \(R_1<R_2\) are two radii such that, for each \(R\in [R_1,R_2]\) and for each \(i,j,k\in \{1,\ldots ,d\}\),

and

and such that, for each \(i\in \{1,\ldots ,d\}\) and \(R\in [R_1,R_2]\),

then any distributional solution u to the parabolic equation

$$\begin{aligned} u_t=\nabla \cdot a\nabla u\;\;\text {in}\;\;{\mathcal {C}}_{R_2} \end{aligned}$$

satisfies

$$\begin{aligned} \mathrm {Exc}(u;R_1)\le C_1\left( \frac{R_1}{R_2}\right) ^{2\alpha } \mathrm {Exc}(u;R_2). \end{aligned}$$

The proof of Proposition 2 is motivated by the proof of [18, Lemma 2] from the elliptic setting. There, the flux correction \(\sigma ^\text {ell}\) was used to express the residuum of the homogenization error in a useful divergence form. That is, for \(R>0\), given an \(a^\text {ell}\)-harmonic function u in \(B_R\), define its \(a_{\mathrm {hom}}^\text {ell}\)-harmonic extension v into \(B_R\) to be the solution of

$$\begin{aligned} \left\{ \begin{array}{ll} -\nabla \cdot a_{\mathrm {hom}}^{\text {ell}}\nabla v =0 &{}\quad \text {in}\;\;B_R \\ v =u &{}\quad \text {on}\;\;\partial B_R.\end{array}\right. \end{aligned}$$

Then, for a smooth cutoff function \(\eta \) vanishing along the boundary \(\partial B_R\), define the augmented homogenization error \(w^\text {ell}\) to be the following modification of the classical two-scale expansion

$$\begin{aligned} w^\text {ell}:=u-(1+\eta \phi ^\text {ell}_i\partial _i)v, \end{aligned}$$

where the cutoff is used in order to guarantee the difference \(w^\text {ell}\) vanishes on the boundary.

It was proven that the augmented homogenization error \(w^\text {ell}\) satisfies, in \(B_R\),

$$\begin{aligned} -\nabla \cdot a^\text {ell}\nabla w^\text {ell}= \nabla \cdot \left( (1-\eta )(a^\text {ell}-a_{\mathrm {hom}}^\text {ell})\nabla v+(\phi ^\text {ell}_ia^\text {ell}-\sigma ^\text {ell}_i)\nabla (\eta \partial _iv)\right) \end{aligned}$$
(26)

which, by testing the equation with \(w^\text {ell}\), yields a useful energy estimate that provides the starting point for a Campanato iteration.
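In outline, and as a sketch only, abbreviate by g the vector field inside the divergence on the right-hand side of (26); the symbol g is introduced only for this remark. Testing (26) with \(w^\text {ell}\) and integrating by parts, using that \(w^\text {ell}\) vanishes on \(\partial B_R\), gives

$$\begin{aligned} \int _{B_R}\nabla w^\text {ell}\cdot a^\text {ell}\nabla w^\text {ell}=-\int _{B_R}\nabla w^\text {ell}\cdot g. \end{aligned}$$

Assuming that \(a^\text {ell}\) satisfies the analogue of (5) with constant \(\lambda \), the uniform ellipticity and the Cauchy-Schwarz inequality then yield

$$\begin{aligned} \lambda \int _{B_R}\left| \nabla w^\text {ell}\right| ^2\le \left( \int _{B_R}\left| \nabla w^\text {ell}\right| ^2\right) ^{\frac{1}{2}}\left( \int _{B_R}\left| g\right| ^2\right) ^{\frac{1}{2}}, \qquad \text {and hence}\qquad \int _{B_R}\left| \nabla w^\text {ell}\right| ^2\le \frac{1}{\lambda ^2}\int _{B_R}\left| g\right| ^2. \end{aligned}$$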

In particular, by analyzing the right hand side of (26), the energy of \(w^\text {ell}\) can be controlled by the growth of the extended corrector \((\phi ^\text {ell},\sigma ^\text {ell})\), the choice of the cutoff function \(\eta \) and the interior and boundary regularity of the \(a_{\mathrm {hom}}^\text {ell}\)-harmonic function v. The argument is completed by observing that, owing to the regularity of \(a_{\mathrm {hom}}^\text {ell}\)-harmonic functions, the energy of the homogenization error is a good approximation for the excess.

The methods of this paper apply the same philosophy to the parabolic setting. However, similarly to what was done in the proof of [12, Theorem 2], it is furthermore necessary to introduce a spatial regularization of the a-caloric function u. The purpose of this is to quantify the regularity in time, since such functions are already sufficiently regular in space. Precisely, if u is an a-caloric function then, in general, its time derivative satisfies \(u_t\in H^{-1}\) and no better, where \(H^{-1}\) denotes the dual space of the Sobolev space \(H^1\). However, for every \(\epsilon >0\), if \(u^\epsilon \) denotes the spatial convolution of u on scale \(\epsilon \), then it is possible to show that \(u^\epsilon _t\in L^2\) with a precise quantification of the \(L^2\)-norm of \(u^\epsilon _t\) in terms of the energy of \(\nabla u\), see Sect. 5.1 below. This additional approximation is needed in order to quantify the boundary estimate of Sect. 5.3, which in turn is required for the Campanato iteration and the quantitative homogenization result contained in Proposition 3 below.

For \(\epsilon >0\), the \(a_{\mathrm {hom}}\)-caloric function \(v^\epsilon \) will then be the \(a_{\mathrm {hom}}\)-caloric extension of the spatial regularization \(u^\epsilon \) into \({\mathcal {C}}_R\). Namely, for \(R>0\) and \(\epsilon >0\), given an a-caloric function u in \({\mathcal {C}}_{R+\epsilon }\), define the \(a_{\mathrm {hom}}\)-caloric extension \(v^\epsilon \) of \(u^\epsilon \) into \({\mathcal {C}}_R\) to be the solution of

$$\begin{aligned} \left\{ \begin{array}{ll} v^\epsilon _t =\nabla \cdot a_{\mathrm {hom}}\nabla v^\epsilon &{}\quad \text {in}\;\;{\mathcal {C}}_R \\ v^\epsilon = u^\epsilon &{}\quad \text {on}\;\;\partial _p{\mathcal {C}}_R,\end{array}\right. \end{aligned}$$
(27)

where \(\partial _p{\mathcal {C}}_R\) denotes the parabolic boundary

$$\begin{aligned} \partial _p{\mathcal {C}}_R:=\left( B_R\times \{-R^2\}\right) \cup \left( \partial B_R\times [-R^2,0]\right) . \end{aligned}$$

Then, again motivated by the classical two-scale expansion, for \(\epsilon >0\) and a smooth cutoff function \(\eta \) vanishing on the parabolic boundary \(\partial _p{\mathcal {C}}_R\), the augmented homogenization error w will be defined as

$$\begin{aligned} w:=u-(1+\eta \phi _i\partial _i)v^\epsilon . \end{aligned}$$

It will be shown in the proof of Proposition 2 that the augmented homogenization error satisfies, in \({\mathcal {C}}_R\),

$$\begin{aligned} w_t-\nabla \cdot a\nabla w= & {} \nabla \cdot \left( (1-\eta )(a-a_{\mathrm {hom}})\nabla v^\epsilon +(\phi _ia+\psi _i-\sigma _i)\nabla (\eta \partial _iv^\epsilon )\right) \nonumber \\&+\, \partial _t\zeta _i\cdot \nabla (\eta \partial _iv^\epsilon )-\phi _i \left( \eta \partial _iv^\epsilon \right) _t-\psi _i\varDelta (\eta \partial _iv^\epsilon ), \end{aligned}$$
(28)

with

$$\begin{aligned} w=u-u^\epsilon \;\;\text {on}\;\;\partial _p{\mathcal {C}}_R. \end{aligned}$$

As in the elliptic setting, the energy estimate obtained by testing this equation with w, for an appropriately chosen cutoff \(\eta \), will be the starting point of the Campanato iteration used to control the decay of the excess. In this case, there is a contribution from the boundary, which will be controlled first by fixing \(\epsilon >0\) small. From the right hand side of (28), the energy of the homogenization error will then be controlled by the growth of the extended parabolic corrector \((\phi ,\psi ,\sigma )\) and, after integrating by parts in time, the growth of \(\zeta \) and q. It is furthermore necessary to make a good choice for the cutoff function \(\eta \) and to use the interior and boundary regularity of the \(a_{\mathrm {hom}}\)-caloric function \(v^\epsilon \). The argument is completed by observing that, owing to the interior regularity of \(a_{\mathrm {hom}}\)-caloric functions, the homogenization error w provides a good approximation for the excess.

The following proposition makes the previous intuition rigorous, and its proof is contained in Sect. 5.4 of the proof of Proposition 2. In the following, the constant \(\epsilon >0\) quantifies the regularization on the boundary, and the constant \(\rho >0\) quantifies a boundary layer introduced by a cutoff function. This is done in the proof to exploit the interior regularity of \(a_{\mathrm {hom}}\)-caloric functions, and in order to impose useful boundary conditions for the homogenization error.

Proposition 3

Suppose that the ensemble \(\left\langle \cdot \right\rangle \) satisfies (2), (3), (4), and (5). Let \(a\in \varOmega \) and \(R>0\) be arbitrary. For every a-caloric function u on \({\mathcal {C}}_R\) and \(\epsilon \in (0,\frac{R}{4})\), there exists \(R_\epsilon =R_\epsilon (u,\epsilon )\in \left( \frac{R}{2},\frac{3R}{4}\right) \) such that, for the \(a_{\mathrm {hom}}\)-caloric extension \(v^\epsilon \) of u on \({\mathcal {C}}_{R_\epsilon }\) in the sense of (27), for the homogenization error

$$\begin{aligned} w:=u-(1+\phi _i\partial _i)v^\epsilon , \end{aligned}$$

there exists \(C=C(d,\lambda )>0\) such that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(\rho \in (0,\frac{1}{8})\),

for the extended corrector \((\phi ,\psi ,\sigma ,\zeta )\) from Lemma 1.

Finally, the following parabolic Caccioppoli inequality will be used in the proofs of Theorem 1 and Proposition 2. The proof is classical, and is included for the convenience of the reader.

Lemma 2

Suppose that \(\left\langle \cdot \right\rangle \) satisfies (5). There exists \(C=C(\lambda )>0\) such that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for every \(R>0\) and distributional solution u of the equation

$$\begin{aligned} u_t=\nabla \cdot a\nabla u\;\;\text {in}\;\;{\mathcal {C}}_R, \end{aligned}$$

and for every \(c\in {\mathbb {R}}\) and \(\rho \in (0,\frac{R}{2})\),

$$\begin{aligned} \int _{{\mathcal {C}}_{R-\rho }}\left| \nabla u\right| ^2\le \frac{C}{\rho ^2}\int _{{\mathcal {C}}_R{\setminus }{\mathcal {C}}_{R-\rho }}\left| u-c\right| ^2. \end{aligned}$$
(29)

In comparison with the elliptic setting, the qualitative homogenization theory of divergence-form operators with coefficients depending on time and space is relatively understudied. While the case of periodic coefficients has long been understood, and a full explanation can be found in the classic reference Bensoussan et al. [13, Chapter 3], the qualitative stochastic homogenization of stationary and ergodic ensembles like (1) was obtained only more recently by Rhodes [25, 26]. However, related problems were handled earlier, such as the case of a Brownian motion in the presence of a divergence-free drift, by Komorowski and Olla [19], Landim et al. [21] and Oelschläger [22]. In the discrete setting, related questions have been considered, for instance, by Andres [1], Bandyopadhyay and Zeitouni [10] and Rassoul-Agha and Seppäläinen [24] in the uniformly elliptic setting and, for degenerate environments, by Andres et al. [2].

The quantitative homogenization of such ensembles has only recently been considered, and the preprint [3] contains, to our knowledge, the first results in this direction. In particular, in [3, Theorem 1.2], a full hierarchy of Liouville theorems is obtained for ensembles satisfying a finite-range dependence in space and time. Their method is motivated by the work of Armstrong and Smart [6] from the elliptic setting, which adapted the approach of Avellaneda and Lin [7] from the context of periodic homogenization.

In [7], a full hierarchy of Liouville properties was established for uniformly elliptic and periodic coefficient fields based upon the previous works Avellaneda and Lin [8, 9], which developed a large-scale regularity theory in Hölder and \(L^p\)-spaces. In [6], the approach of [7] was adapted to stationary and ergodic ensembles satisfying a finite-range dependence. Their proof, which obtained a large-scale \(C^{0,1}\)-regularity theory, was based upon a variational approach and the quantification of the convergence of certain sub-additive and super-additive energies. Their work was later extended by Armstrong and Mourrat [5] to more general mixing conditions, and subsequently gave rise to a significant literature on the subject. The interested reader is pointed to the recent monograph Armstrong et al. [4], and the references therein.

The approach of this paper follows closely the work [18], which derived, for uniformly elliptic ensembles, a large-scale \(C^{1,\alpha }\)-regularity estimate and first-order Liouville property under the qualitative assumptions of stationarity and ergodicity. The method was based upon the introduction of an intrinsic notion of excess, as defined in (16), as well as the construction of the flux correction \(\sigma ^\text {ell}\) defined in (17). The introduction of \(\sigma ^\text {ell}\) was used to prove that the homogenization error solves the divergence-form Eq. (26), which provided the starting point for a Campanato iteration as explained above.

Subsequently, Fischer and Otto [16] obtained a full hierarchy of Liouville properties under a mild quantification of the ergodicity. In Fischer and Otto [17], the necessary quantification of ergodicity from [16] was shown to be satisfied by a general class of Gaussian environments. However, absent some mild quantification of ergodicity in the sense of either [5] or [16], the existence of higher order Liouville and large-scale regularity statements remains an open question.

Finally, motivated by the work of Chiarini and Deuschel [15], Bella et al. [12] derived a large-scale \(C^{1,\alpha }\)-regularity theory and first-order Liouville theorem for degenerate elliptic equations, where the boundedness and uniform ellipticity (5) was replaced by certain moment conditions. It is expected that the results of this paper can be similarly extended to degenerate environments, and the setting of [2] will serve as the starting point for future work.

In principle, one could also hope to combine the methods of this paper with those of [18], in the presence of a logarithmic Sobolev inequality like that used in [18, Theorem 1], to obtain more quantitative information. For example, the minimal radius \(r_*(a)>0\) quantifying the first scale for which the assumptions of Proposition 2 are satisfied, and which effectively defines the initial scale on which the \(C^{1,\alpha }\)-regularity of Proposition 2 begins to take effect, is expected to have stretched exponential moments in the sense of [18, Theorem 1]. Furthermore, again assuming a logarithmic Sobolev inequality, it should be possible to obtain a quantitative two-scale expansion for a-caloric functions like [18, Corollary 3]. Lastly, following the methods of [16], it may be possible to prove higher order Liouville statements under a mild quantification of the ergodicity.

The paper is organized as follows. The proofs are presented in the order of their appearance: Theorem 1, Lemma 1, Propositions 1, 2 and Lemma 2. In order to simplify the notation, the statements and proofs are written for the non-symmetric scalar setting. However, at the cost of increasing some constants, all of the arguments carry through unchanged for non-symmetric systems. Throughout, the notation \(\lesssim \) is used to denote an inequality that holds up to a multiplicative constant whose dependencies are specified in every case by the statement of the respective theorem, proposition or lemma.

2 The proof of Theorem 1

Fix a coefficient field a satisfying the conclusions of Lemma 1, Propositions 1 and 2, and suppose that u is a distributional solution of

$$\begin{aligned} u_t=\nabla \cdot a\nabla u\;\;\text {on}\;\;{\mathbb {R}}^d\times (-\infty ,0), \end{aligned}$$

which is strictly subquadratic in the sense that, for some \(\alpha \in (0,1)\),

Fix \(C_0=C_0(\alpha ,d,\lambda )>0\) satisfying the hypothesis of Proposition 2. Then, using Proposition 1, fix \(R_0>0\) such that, for every \(R>R_0\), for each \(i,j,k\in \{1,\ldots ,d\}\),

and

and such that, for each \(i\in \{1,\ldots ,d\}\) and \(R>R_0\),

The definition of the excess, Proposition 2 and the Caccioppoli inequality (29) imply that, for each \(R_0<\rho <R\),

Therefore, since u is strictly subquadratic,

This implies that, for every \(\rho >0\),

$$\begin{aligned} \mathrm {Exc}(u;\rho )=0. \end{aligned}$$

It is then immediate from the definition of parabolic excess (14), since \({\mathcal {C}}_{R_1}\subset {\mathcal {C}}_{R_2}\) whenever \(R_1<R_2\), that there exists \(\xi \in {\mathbb {R}}^d\) for which the difference

$$\begin{aligned} z(x,t):=u(x,t)-\xi \cdot x-\phi _\xi (x,t) \end{aligned}$$

satisfies

$$\begin{aligned} \nabla z=0\;\;\text {in}\;\;{\mathbb {R}}^d\times (-\infty ,0). \end{aligned}$$

However, because z is a distributional solution of

$$\begin{aligned} z_t=\nabla \cdot a\nabla z \;\;\text {in}\;\;{\mathbb {R}}^d\times (-\infty ,0), \end{aligned}$$

it follows that z is necessarily constant in time as well. Therefore, there exists \(c\in {\mathbb {R}}\) such that \(u=c+\xi \cdot x+\phi _\xi \), which completes the argument.

3 The proof of Lemma 1

The construction of the corrector \((\phi ,\psi ,\sigma ,\zeta )\) will be achieved by lifting the relevant Eqs. (7), (19), (20) and (22) to the probability space \(\varOmega \), and thereby identifying \(\phi \) by its stationary, finite energy gradient and time derivative, \(\psi \) and \(\sigma \) by their stationary, finite energy gradients, and \(\zeta \) by its stationary time derivative. For this, it is necessary to define the horizontal derivative of a random variable as induced by shifts of the coefficient field in space and time. Then, these will be used to define an analogue of the Sobolev space \(H^1\) on the probability space.

Given an \(L^2(\varOmega )\) random variable f, define, for each \(i\in \{1,\ldots ,d\}\), the horizontal derivative

$$\begin{aligned} D_if(a):=\lim _{h\rightarrow 0}\frac{f\left( a(\cdot +he_i,\cdot )\right) -f(a)}{h}, \end{aligned}$$
(30)

where the above limit is understood in the sense of strong \(L^2(\varOmega )\)-convergence. Of course, it is not true in general that the above limit exists for every \(f\in L^2(\varOmega )\), but the horizontal derivatives are closed, densely defined operators on \(L^2(\varOmega )\), see [23], with domains, for \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} {\mathcal {D}}(D_i):=\left\{ f\in L^2(\varOmega )\;|\;D_if\;\;\text {exists as an element of}\;\;L^2(\varOmega )\right\} . \end{aligned}$$

Similarly, for each \(f\in L^2(\varOmega )\), define the horizontal time derivative

$$\begin{aligned} D_0f(a):=\lim _{h\rightarrow 0}\frac{f\left( a(\cdot ,\cdot +h)\right) -f(a)}{h}, \end{aligned}$$
(31)

which is a closed, densely defined operator on \(L^2(\varOmega )\) with domain

$$\begin{aligned} {\mathcal {D}}(D_0):=\left\{ f\in L^2(\varOmega )\;|\;D_0f\;\;\text {exists as an element of}\;\;L^2(\varOmega )\right\} . \end{aligned}$$

The analogue of the Hilbert space \(H^1\) is then defined on the probability space as the intersection

$$\begin{aligned} {\mathcal {H}}^1:=\cap _{i=0}^d{\mathcal {D}}(D_i), \end{aligned}$$

equipped with the inner product, for each \(f,g\in {\mathcal {H}}^1\),

$$\begin{aligned} (f,g)_{{\mathcal {H}}^1}=\left\langle fg \right\rangle +\left\langle D_0f D_0g \right\rangle +\left\langle Df\cdot Dg \right\rangle , \end{aligned}$$

for the horizontal spatial gradient

$$\begin{aligned} Df:=(D_1f,\ldots , D_df). \end{aligned}$$

Finally, define the space of spatial potentials

$$\begin{aligned} L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d):=\overline{\left\{ Df\;|\;f\in {\mathcal {H}}^1\right\} }^{L^2(\varOmega )\text {-weak}}, \end{aligned}$$
(32)

as the \(L^2(\varOmega )\)-weak closure of spatial gradients arising from \({\mathcal {H}}^1\) functions. Indeed, it is immediate from the weak convergence that elements of \(L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) are potentials in the sense that every \(F=(F_1,\ldots ,F_d)\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) satisfies the distributional equality, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} D_iF_j=D_jF_i. \end{aligned}$$

In other words, every \(F\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) is curl-free.
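
The computation behind this claim can be sketched as follows. It is written for test functions \(g\in L^2(\varOmega )\) whose first and second horizontal derivatives exist in \(L^2(\varOmega )\), which is an assumption made for this sketch, and for a sequence \(\{\varphi _n\}_{n=1}^\infty \subset {\mathcal {H}}^1\) with \(D\varphi _n\rightharpoonup F\) weakly in \(L^2(\varOmega ;{\mathbb {R}}^d)\). Since shifts of the coefficient field preserve the measure, see (2), each \(D_i\) is antisymmetric in the sense that \(\left\langle (D_if)g \right\rangle =-\left\langle fD_ig \right\rangle \), and since shifts in different directions commute, \(D_iD_jg=D_jD_ig\). Therefore, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} \begin{aligned} \left\langle F_jD_ig \right\rangle =&\lim _{n\rightarrow \infty }\left\langle (D_j\varphi _n)D_ig \right\rangle =-\lim _{n\rightarrow \infty }\left\langle \varphi _nD_jD_ig \right\rangle \\ =&-\lim _{n\rightarrow \infty }\left\langle \varphi _nD_iD_jg \right\rangle =\lim _{n\rightarrow \infty }\left\langle (D_i\varphi _n)D_jg \right\rangle =\left\langle F_iD_jg \right\rangle , \end{aligned} \end{aligned}$$

which is the weak formulation of \(D_iF_j=D_jF_i\).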

The following general fact about potential vector fields will be used in the construction of \(\sigma \) and to prove the sublinearity for the corrector \((\phi , \psi , \sigma , \zeta )\). It will be shown that, with respect to the sub-sigma-algebra of subsets that are invariant with respect to spatial translations of the coefficient fields, the conditional expectation of a potential vector field vanishes as a random variable.

Lemma 3

For every \(F=(F_1,\ldots ,F_d)\in L^2_\text {pot}(\varOmega ;{\mathbb {R}}^d)\), for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle F_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle =0\;\;\text {in}\;\;L^2(\varOmega ), \end{aligned}$$
(33)

where \(\left\langle \cdot \;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \) denotes the conditional expectation with respect to the sub-sigma-algebra \({\mathcal {F}}_{{\mathbb {R}}^d}\subset {\mathcal {F}}\) of subsets which are invariant with respect to spatial translations of the coefficient field.

In particular, for every \(F=(F_1,\ldots ,F_d)\in L^2_\text {pot}(\varOmega ;{\mathbb {R}}^d)\), for every \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle F_i \right\rangle =0. \end{aligned}$$

The proof of Lemma 3 now follows. Let \(F=(F_1,\ldots ,F_d)\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) be arbitrary. Owing to the definition of the conditional expectation, it is sufficient to show that, for every \({\mathcal {F}}_{{\mathbb {R}}^d}\)-measurable function \(g\in L^2(\varOmega )\), for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle F_ig \right\rangle =0. \end{aligned}$$
(34)

To prove (34), owing to definition (32) there exists a sequence of functions \(\{\varphi _n\}_{n=1}^\infty \subset {\mathcal {H}}^1\) such that, as \(n\rightarrow \infty \),

$$\begin{aligned} D\varphi _n\rightharpoonup F\;\;\text {weakly in}\;\;L^2(\varOmega ;{\mathbb {R}}^d). \end{aligned}$$
(35)

The weak convergence (35) implies that, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\left\langle \left( D_i\varphi _n\right) g \right\rangle =\left\langle F_i g \right\rangle . \end{aligned}$$
(36)

Then, for each \(n\ge 1\), since \(\varphi _n\in {\mathcal {H}}^1\) and since spatial translations of the coefficient field preserve the measure, see (2), it follows that, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \begin{aligned} \left\langle \left( D_i\varphi _n\right) g \right\rangle =&\lim _{h\rightarrow 0}\frac{1}{h}\left\langle \left( \varphi _n\left( a(\cdot +he_i,\cdot )\right) -\varphi _n(a)\right) g \right\rangle \\ =&\lim _{h\rightarrow 0}\frac{1}{h}\left\langle \varphi _n\left( g\left( a(\cdot -he_i,\cdot )\right) -g(a)\right) \right\rangle =0, \end{aligned} \end{aligned}$$
(37)

where the final equality follows from the fact that \(g\in L^2(\varOmega )\) and the fact that g is invariant with respect to spatial shifts of the coefficient field as an \({\mathcal {F}}_{{\mathbb {R}}^d}\)-measurable function. In combination, (36) and (37) imply (34). Since the \({\mathcal {F}}_{{\mathbb {R}}^d}\)-measurable \(g\in L^2(\varOmega )\) and \(F\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) were arbitrary, this completes the proof of (33).
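
The second equality in (37) is a change of variables on \(\varOmega \), which can be sketched as follows. Write \(\tau _ha:=a(\cdot +he_i,\cdot )\) for the shifted coefficient field, a notation introduced only for this remark. Stationarity (2) asserts that the law \(\left\langle \cdot \right\rangle \) is invariant under \(\tau _{-h}\), so replacing a by \(\tau _{-h}a\) inside the expectation gives, for each \(h\in {\mathbb {R}}\),

$$\begin{aligned} \left\langle \varphi _n(\tau _ha)g(a) \right\rangle =\left\langle \varphi _n(\tau _h\tau _{-h}a)g(\tau _{-h}a) \right\rangle =\left\langle \varphi _n(a)g(\tau _{-h}a) \right\rangle . \end{aligned}$$

Subtracting \(\left\langle \varphi _ng \right\rangle \) and dividing by h yields the second line of (37). Because g is invariant under spatial shifts, \(g(\tau _{-h}a)=g(a)\) almost surely, so the difference quotient vanishes for every \(h\ne 0\) and not only in the limit.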

The final statement is then immediate since, for every \(F\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) and \(i\in \{1,\ldots ,d\}\), the conditional expectation satisfies

$$\begin{aligned} \left\langle F_i \right\rangle =\left\langle \left\langle F_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \right\rangle =0, \end{aligned}$$

where the final equality follows from (33) and completes the proof of Lemma 3.

3.1 The construction of \(\phi \)

For the construction of the corrector \(\phi \), it is sufficient to construct, for each \(k\in \{1,\ldots ,d\}\), a stationary gradient \(D\phi _k\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) and a stationary time-derivative \(D_0\phi _k\in {\mathcal {H}}^{-1}\), where \({\mathcal {H}}^{-1}\) denotes the dual-space of \({\mathcal {H}}^1\), satisfying

$$\begin{aligned} \left\langle D_0\phi _kf \right\rangle +\left\langle Df\cdot a(D\phi _k+e_k) \right\rangle =0\;\;\text {for every}\;\;f\in {\mathcal {H}}^1. \end{aligned}$$
(38)

The corrector \(\phi \) will then be defined on \({\mathbb {R}}^{d+1}\), for \(\left\langle \cdot \right\rangle \)-a.e. a, by integration. Observe that in order to define \(\phi \) via integration it is necessary to choose a base point, and it is this choice that ruins the stationarity.

We remark that the following proof is essentially the Lax–Milgram argument, where the only small subtlety is that the operator defining Eq. (38) is not coercive for \({\mathcal {H}}^1\), since it is not coercive with respect to the horizontal derivative in time.

The first step is to introduce an approximation of (38) which is coercive with respect to the \({\mathcal {H}}^1\)-norm. The Riesz representation theorem and the uniform ellipticity of the ensemble (5) guarantee that, for each \(k\in \{1,\ldots ,d\}\) and \(\beta \in (0,1)\), there exists a unique element \(\phi ^\beta _k\in {\mathcal {H}}^1\) satisfying, for every \(f\in {\mathcal {H}}^1\),

$$\begin{aligned} \beta \left\langle \phi ^\beta _kf \right\rangle +\beta \left\langle D_0\phi ^\beta _kD_0f \right\rangle +\left\langle D_0\phi ^\beta _kf \right\rangle +\left\langle Df\cdot a(D\phi ^\beta _k+e_k) \right\rangle =0. \end{aligned}$$
(39)

Therefore, for each \(k\in \{1,\ldots ,d\}\) and \(\beta \in (0,1)\), after testing (39) with \(\phi ^\beta _k\), the uniform ellipticity of the ensemble and Hölder’s inequality yield the estimate

$$\begin{aligned} \left\langle \left| D\phi ^\beta _k\right| ^2 \right\rangle +\beta \left\langle \left| \phi ^\beta _k\right| ^2+\left| D_0\phi ^\beta _k\right| ^2 \right\rangle \lesssim 1, \end{aligned}$$
(40)

where the fact that

$$\begin{aligned} \left\langle D_0\phi ^\beta _k\phi ^\beta _k \right\rangle =\frac{1}{2}\left\langle D_0(\phi ^\beta _k)^2 \right\rangle =\lim _{h\rightarrow 0}\frac{1}{2h}\left\langle (\phi ^\beta _k)^2\left( a(\cdot ,\cdot +h)\right) -(\phi ^\beta _k)^2(a) \right\rangle =0 \end{aligned}$$
(41)

is also used to obtain (40), and follows from the fact that shifts of the coefficient field in time and space preserve the underlying measure of the ensemble, see (2), and since \(\phi ^\beta _k\in L^2(\varOmega )\).
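
For orientation, the estimate (40) can be traced as follows. The sketch assumes that (5) is used in the form \(\left| a\xi \right| \le \left| \xi \right| \) and \(\lambda \left| \xi \right| ^2\le \xi \cdot a\xi \), which is how it enters Sect. 3.5. Testing (39) with \(f=\phi ^\beta _k\) and using (41) to remove the term \(\left\langle D_0\phi ^\beta _k\phi ^\beta _k \right\rangle \) gives

$$\begin{aligned} \beta \left\langle \left| \phi ^\beta _k\right| ^2+\left| D_0\phi ^\beta _k\right| ^2 \right\rangle +\left\langle D\phi ^\beta _k\cdot aD\phi ^\beta _k \right\rangle =-\left\langle D\phi ^\beta _k\cdot ae_k \right\rangle . \end{aligned}$$

The ellipticity bounds the left hand side from below, and the Cauchy-Schwarz inequality (Hölder's inequality with exponent two) together with Young's inequality bound the right hand side from above, so that

$$\begin{aligned} \lambda \left\langle \left| D\phi ^\beta _k\right| ^2 \right\rangle +\beta \left\langle \left| \phi ^\beta _k\right| ^2+\left| D_0\phi ^\beta _k\right| ^2 \right\rangle \le \left\langle \left| D\phi ^\beta _k\right| ^2 \right\rangle ^{\frac{1}{2}}\le \frac{\lambda }{2}\left\langle \left| D\phi ^\beta _k\right| ^2 \right\rangle +\frac{1}{2\lambda }. \end{aligned}$$

Absorbing the first term on the right into the left yields (40) with a constant depending only on \(\lambda \), and in particular independent of \(\beta \).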

Then, for each \(k\in \{1,\ldots ,d\}\) and \(\beta \in (0,1)\), Eq. (39), Hölder’s inequality and (40) imply that, for each \(f\in {\mathcal {H}}^1\),

$$\begin{aligned} \left| \left\langle D_0\phi ^\beta _kf \right\rangle \right| \lesssim ||f||_{{\mathcal {H}}^1}\;\;\text {and, therefore,}\;\;||D_0\phi ^\beta _k||_{{\mathcal {H}}^{-1}}\lesssim 1. \end{aligned}$$
(42)

Hence, for each \(k\in \{1,\ldots ,d\}\), the definition of the potential space (32) with estimates (40) and (42) imply that there exist \(\varPsi _k\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) and \(\xi _k\in {\mathcal {H}}^{-1}\) such that, after passing to a subsequence \(\{\beta _j\rightarrow 0\}_{j=1}^\infty \), as \(j\rightarrow \infty \),

$$\begin{aligned} D\phi ^{\beta _j}_k\rightharpoonup \varPsi _k\;\;\text {weakly in}\;\;L^2(\varOmega ;{\mathbb {R}}^d)\;\;\text {and}\;\;D_0\phi ^{\beta _j}_k\rightharpoonup \xi _k\;\;\text {weakly in}\;\;{\mathcal {H}}^{-1}. \end{aligned}$$
(43)

The convergence (43) combined with Eq. (39) and estimate (40) prove that, for each \(k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle \xi _k f \right\rangle +\left\langle Df\cdot a(\varPsi _k+e_k) \right\rangle =0\;\;\text {for every}\;\;f\in {\mathcal {H}}^1. \end{aligned}$$

Finally, for each \(k\in \{1,\ldots ,d\}\), as weak limits of the horizontal derivatives of functions \(\phi ^\beta _k\in {\mathcal {H}}^1\), the pair \((\varPsi _k,\xi _k)\) is curl-free in the sense that, for each \(i\in \{1,\ldots ,d\}\), distributionally

$$\begin{aligned} D_0\varPsi _{ki}=D_i\xi _k. \end{aligned}$$

Therefore, for each \(k\in \{1,\ldots ,d\}\), the argument is completed by defining \(D\phi _k:=\varPsi _k\) and \(D_0\phi _k:=\xi _k\).

3.2 The construction of \(\psi \)

For each \(k\in \{1,\ldots ,d\}\), let \(D\phi _k\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) denote the stationary gradient corresponding to \(\phi _k\), which was constructed in the previous step. Then, for each \(k\in \{1,\ldots ,d\}\), define the lift of the flux \(q_k\) to the probability space according to the rule

$$\begin{aligned} Q_k:=a(D\phi _k+e_k). \end{aligned}$$
(44)

The existence of \(\psi \) is a consequence of the following general fact, which is again proved by a small modification of the Lax–Milgram argument.

Lemma 4

For every \(F\in L^2(\varOmega ;{\mathbb {R}}^d)\), there exists \(\varPsi \in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) satisfying

$$\begin{aligned} \left\langle \varPsi \cdot Df \right\rangle =\left\langle F\cdot Df \right\rangle \;\;\text {for every}\;\;f\in {\mathcal {H}}^1. \end{aligned}$$
(45)

The existence of \(\psi \) follows from Lemma 4 in the following way. For each \(k\in \{1,\ldots ,d\}\), choose \(F=Q_k\) and define \(D\psi _k:=\varPsi \), which defines \(\psi _k\), for \(\left\langle \cdot \right\rangle \)-a.e. a, as a function on \({\mathbb {R}}^d\) via integration. In this case, it is the choice of spatial base point that destroys the spatial stationarity of \(\psi \). However, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(k\in \{1,\ldots ,d\}\), the function \(\psi _k\) can then be extended to \({\mathbb {R}}^{d+1}\) as a stationary function in time.

In order to prove Lemma 4, the Riesz representation theorem asserts that, for each \(\beta \in (0,1)\), there exists a unique \(\psi ^\beta \in {\mathcal {H}}^1\) satisfying

$$\begin{aligned} \beta \left\langle \psi ^\beta f \right\rangle +\beta \left\langle D_0\psi ^\beta D_0f \right\rangle +\left\langle D\psi ^\beta \cdot Df \right\rangle =\left\langle F\cdot Df \right\rangle \;\;\text {for every}\;\;f\in {\mathcal {H}}^1. \end{aligned}$$
(46)

For each \(\beta \in (0,1)\), after testing (46) with \(\psi ^\beta \), Hölder’s inequality and Young’s inequality yield the estimate

$$\begin{aligned} \left\langle \left| D\psi ^\beta \right| ^2 \right\rangle +\beta \left\langle \left| \psi ^\beta \right| ^2+\left| D_0\psi ^\beta \right| ^2 \right\rangle \lesssim 1. \end{aligned}$$
(47)

Therefore, the definition of the potential space (32) and estimate (47) imply that there exists \(\varPsi \in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) such that, after passing to a subsequence \(\{\beta _j\rightarrow 0\}_{j=1}^\infty \), as \(j\rightarrow \infty \),

$$\begin{aligned} D\psi ^{\beta _j}\rightharpoonup \varPsi \;\;\text {weakly in}\;\;L^2(\varOmega ;{\mathbb {R}}^d). \end{aligned}$$
(48)

In combination with Eq. (46), estimate (47) and the convergence (48) imply that \(\varPsi \) satisfies (45), which completes the proof of Lemma 4.
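
The passage to the limit in (46) can be made explicit. As a sketch, for a fixed \(f\in {\mathcal {H}}^1\) and each \(\beta \in (0,1)\), the Cauchy-Schwarz inequality and (47) give

$$\begin{aligned} \left| \beta \left\langle \psi ^\beta f \right\rangle +\beta \left\langle D_0\psi ^\beta D_0f \right\rangle \right| \le \beta ^{\frac{1}{2}}\left( \beta \left\langle \left| \psi ^\beta \right| ^2+\left| D_0\psi ^\beta \right| ^2 \right\rangle \right) ^{\frac{1}{2}}||f||_{{\mathcal {H}}^1}\lesssim \beta ^{\frac{1}{2}}||f||_{{\mathcal {H}}^1}, \end{aligned}$$

where the implied constant is the one from (47). Along the subsequence \(\beta _j\rightarrow 0\) the two regularizing terms therefore vanish, while \(\left\langle D\psi ^{\beta _j}\cdot Df \right\rangle \rightarrow \left\langle \varPsi \cdot Df \right\rangle \) by (48), which leaves exactly (45).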

3.3 The construction of \(\sigma \)

For each \(k\in \{1,\ldots ,d\}\), let \(D\psi _k\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) denote the stationary gradient corresponding to \(\psi _k\) constructed in the previous step and let \(Q_k\) denote the lift of the flux from (44). Lemma 4 applies directly to this situation, and proves that, for each \(i,j,k\in \{1,\ldots ,d\}\), there exists \(\varSigma _{ijk}\in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\) satisfying

$$\begin{aligned} \begin{aligned} \left\langle \varSigma _{ijk}\cdot Df \right\rangle =&\left\langle (Q_i-D\psi _i)_k D_jf \right\rangle -\left\langle (Q_i-D\psi _i)_jD_kf \right\rangle \\ =&\left\langle \left( (Q_i-D\psi _i)_ke_j-(Q_i-D\psi _i)_je_k\right) \cdot Df \right\rangle \;\;\text {for every}\;\;f\in {\mathcal {H}}^1. \end{aligned} \end{aligned}$$
(49)

Then, for each \(i,j,k\in \{1,\ldots ,d\}\), the definition \(D\sigma _{ijk}:=\varSigma _{ijk}\) defines \(\sigma _{ijk}\), for \(\left\langle \cdot \right\rangle \)-a.e. a, on \({\mathbb {R}}^d\) via integration in space. As in the case of \(\psi \), the choice of base point ruins the spatial stationarity of \(\sigma \). However, for each \(i,j,k\in \{1,\ldots ,d\}\), for \(\left\langle \cdot \right\rangle \)-a.e. a, the corrector \(\sigma _{ijk}\) can then be extended to \({\mathbb {R}}^{d+1}\) as a stationary function in time.

Since it is clear from the proof of existence that, for each \(i,j,k\in \{1,\ldots ,d\}\), the gradients can be constructed to satisfy

$$\begin{aligned} \varSigma _{ijk}=-\varSigma _{ikj}, \end{aligned}$$

after integrating it follows that, for each \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} \sigma _{ijk}=-\sigma _{ikj}. \end{aligned}$$

Or, perhaps more simply, for each \(i\in \{1,\ldots ,d\}\), one may first construct \(\sigma _{ijk}\) for every \(j>k\in \{1,\ldots ,d\}\) and then simply define \(\sigma _{ikj}:=-\sigma _{ijk}\).

It remains to prove that, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} D\cdot \sigma _i=(Q_i-D\psi _i)-\left\langle Q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle , \end{aligned}$$
(50)

where, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} (D\cdot \sigma _i)_j=\sum _{k=1}^dD_k\sigma _{ijk}. \end{aligned}$$
(51)

To simplify the notation in what follows, define the vector, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \varPsi _i:=(Q_i-D\psi _i), \end{aligned}$$
(52)

where the construction of \(\psi _i\) guarantees that the vector \(\varPsi _i\) is divergence-free in the sense of the distributional equality

$$\begin{aligned} D_l\varPsi _{il}=0. \end{aligned}$$
(53)

Equation (50) now follows from the following distributional equalities. For each \(i,j\in \{1,\ldots ,d\}\), thanks to Eq. (49), in the sense of distributions

$$\begin{aligned} D_l(D_l(D\cdot \sigma _i)_j)=D_l(D_l(D_k\sigma _{ijk})) =D_k\left( D_l(D_l\sigma _{ijk})\right) =D_k(D_k\varPsi _{ij}-D_j\varPsi _{ik}). \end{aligned}$$

Therefore, for each \(i,j\in \{1,\ldots ,d\}\), in view of (53) and after relabeling the summation index,

$$\begin{aligned} D_l(D_l(D\cdot \sigma _i)_j)=D_k(D_k\varPsi _{ij}-D_j\varPsi _{ik})=D_k(D_k\varPsi _{ij})-D_j(D_k\varPsi _{ik})=D_l(D_l\varPsi _{ij}). \end{aligned}$$

Hence, in the sense of distributions, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} D_l\left( D_l\left( (D\cdot \sigma _i)_j-\varPsi _{ij}\right) \right) =0. \end{aligned}$$
(54)

Equation (54) implies that, for each \(i,j\in \{1,\ldots ,d\}\), the difference

$$\begin{aligned} \left( (D\cdot \sigma _i)_j-\varPsi _{ij}\right) \end{aligned}$$

is invariant with respect to spatial translations of the coefficient field. That is, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left( (D\cdot \sigma _i)_j-\varPsi _{ij}\right) =\left\langle \left( (D\cdot \sigma _i)_j-\varPsi _{ij}\right) \;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle . \end{aligned}$$
(55)
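
One way to see why (54) forces invariance under spatial shifts is the following formal computation, which is a sketch and not necessarily the argument intended here. Write \(G:=(D\cdot \sigma _i)_j-\varPsi _{ij}\), a notation used for this remark only, and suppose for the moment that G is regular enough to be used as a test function in (54). The antisymmetry of the horizontal derivatives, which follows from the stationarity (2), then gives

$$\begin{aligned} 0=-\left\langle GD_l(D_lG) \right\rangle =\left\langle D_lGD_lG \right\rangle =\left\langle \left| DG\right| ^2 \right\rangle , \end{aligned}$$

so that \(DG=0\). A random variable with vanishing horizontal gradient is unchanged by spatial shifts of the coefficient field, hence \({\mathcal {F}}_{{\mathbb {R}}^d}\)-measurable, and this is the content of (55). For a general \(G\in L^2(\varOmega )\) the same conclusion presumably requires an additional approximation or spectral argument.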

The fact that (55) implies (50) follows from Lemma 3. Indeed, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} D\psi _i \in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d), \end{aligned}$$

and by a straightforward repetition of the arguments leading to Lemma 3, for each \(i\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left\langle (D\cdot \sigma _i)\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle =0\;\;\text {in}\;\; L^2(\varOmega ;{\mathbb {R}}^d). \end{aligned}$$

Therefore, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left( (D\cdot \sigma _i)_j-\varPsi _{ij}\right) =\left\langle \left( (D\cdot \sigma _i)_j-\varPsi _{ij}\right) \;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle =-\left\langle Q_{ij}\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle , \end{aligned}$$

which is (50). This completes the argument.

3.4 The construction of \(\zeta \)

The construction of \(\zeta \) is explicit. Namely, for each \(i\in \{1,\ldots ,d\}\), for the lift of the flux \(Q_i\) defined in (44), define the stationary derivative in time according to the rule

$$\begin{aligned} D_0\zeta _i=\left\langle Q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle -\left\langle Q_i \right\rangle . \end{aligned}$$

For each \(i\in \{1,\ldots ,d\}\), the function \(\zeta _i\) is defined on \({\mathbb {R}}\), for \(\left\langle \cdot \right\rangle \)-a.e. a, by fixing a base point and integrating in time, which destroys the stationarity, and then extended to \({\mathbb {R}}^{d+1}\) as a spatially constant function.

3.5 The boundedness and uniform ellipticity of \(a_{\mathrm {hom}}\)

For the reader’s convenience, the argument of [18, Lemma 2] is repeated here. For each \(\xi \in {\mathbb {R}}^d\), the linearity and (11) assert that the homogenized coefficients are defined according to the rule

$$\begin{aligned} a_{\mathrm {hom}}\xi :=\left\langle a(\nabla \phi _\xi +\xi ) \right\rangle . \end{aligned}$$

It is first shown that, for each \(\xi \in {\mathbb {R}}^d\),

$$\begin{aligned} \left| a_{\mathrm {hom}}\xi \right| \le \frac{1}{\lambda } \left| \xi \right| . \end{aligned}$$
(56)

For each \(\xi \in {\mathbb {R}}^d\), since \(\phi _\xi \) satisfies (7), the uniform ellipticity of the ensemble (5) and Jensen’s inequality imply that

$$\begin{aligned} \begin{aligned} \left| a_{\mathrm {hom}}\xi \right| ^2&=\left| \left\langle a(\nabla \phi _\xi +\xi ) \right\rangle \right| ^2\le \left\langle \left| a(\nabla \phi _\xi +\xi )\right| ^2 \right\rangle \\&\le \left\langle \left| (\nabla \phi _\xi +\xi )\right| ^2 \right\rangle \le \frac{1}{\lambda }\left\langle (\nabla \phi _\xi +\xi )\cdot a(\nabla \phi _\xi +\xi ) \right\rangle . \end{aligned} \end{aligned}$$

Then, using the corrector Eqs. (7), (41) and the Cauchy-Schwarz inequality,

$$\begin{aligned} \left| a_{\mathrm {hom}}\xi \right| ^2\le \frac{1}{\lambda }\left\langle (\nabla \phi _\xi +\xi )\cdot a(\nabla \phi _\xi +\xi ) \right\rangle =\frac{1}{\lambda }\xi \cdot \left\langle a(\nabla \phi _\xi +\xi ) \right\rangle \le \frac{1}{\lambda }\left| \xi \right| \left| a_{\mathrm {hom}}\xi \right| . \end{aligned}$$

Dividing by \(\left| a_{\mathrm {hom}}\xi \right| \), which may be assumed to be nonzero since otherwise (56) is trivial, yields (56).

It remains only to prove that, for each \(\xi \in {\mathbb {R}}^d\),

$$\begin{aligned} \lambda \left| \xi \right| ^2\le \xi \cdot a_{\mathrm {hom}}\xi . \end{aligned}$$
(57)

This follows from the convexity of the map

$$\begin{aligned} (X,v)\in {\mathcal {S}}(d)_{> 0}\times {\mathbb {R}}^d\rightarrow v\cdot X^{-1}v, \end{aligned}$$
(58)

where \({\mathcal {S}}(d)_{>0}\) denotes the space of positive definite, \(d\times d\) symmetric matrices. Indeed, if \(X\in {\mathcal {S}}(d)_{>0}\) and \(v\in {\mathbb {R}}^d\), the quantity

$$\begin{aligned} \frac{1}{2}\left( v\cdot X^{-1}v\right) =\sup _{w\in {\mathbb {R}}^d}\left\{ w\cdot v-\frac{1}{2}w\cdot Xw\right\} , \end{aligned}$$

is a supremum over linear functions in \((X,v)\), and therefore the map (58) is convex. Hence, for each \(\xi \in {\mathbb {R}}^d\), using the corrector Eq. (7) and Jensen’s inequality,

$$\begin{aligned} \begin{aligned} \xi \cdot a_{\mathrm {hom}}\xi =&\left\langle (\nabla \phi _\xi +\xi )\cdot a(\nabla \phi _\xi +\xi ) \right\rangle =\left\langle (\nabla \phi _\xi +\xi )\cdot a_{\text {sym}}(\nabla \phi _\xi +\xi ) \right\rangle \\ =&\left\langle \left( \nabla \phi _\xi +\xi \right) \cdot \left( a_{\text {sym}}^{-1}\right) ^{-1}(\nabla \phi _\xi +\xi ) \right\rangle \\ \ge&\left\langle \nabla \phi _\xi +\xi \right\rangle \cdot \left\langle (a_{\text {sym}}^{-1}) \right\rangle ^{-1}\left\langle \nabla \phi _\xi +\xi \right\rangle \ge \lambda \left| \xi \right| ^2, \end{aligned} \end{aligned}$$

where \(a_{\text {sym}}\) denotes the symmetric part of a, and where the final inequality is obtained using the boundedness of the ensemble (5) and the vanishing expectation of the gradient from Lemma 3. This completes the proof of (57).
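
The variational formula behind the convexity of (58), and the final inequality of the preceding display, can be checked directly. The following is a sketch, assuming only that X is symmetric and positive definite, and that (5) provides the lower bound \(a_{\text {sym}}\ge \lambda \) in the sense of quadratic forms (the precise form of (5) is an assumption here). Completing the square gives, for every \(w\in {\mathbb {R}}^d\),

$$\begin{aligned} w\cdot v-\frac{1}{2}w\cdot Xw=\frac{1}{2}v\cdot X^{-1}v-\frac{1}{2}\left( w-X^{-1}v\right) \cdot X\left( w-X^{-1}v\right) \le \frac{1}{2}v\cdot X^{-1}v, \end{aligned}$$

with equality for \(w=X^{-1}v\). For the final inequality, the bound \(a_{\text {sym}}\ge \lambda \) is inverted and averaged, and then combined with \(\left\langle \nabla \phi _\xi +\xi \right\rangle =\xi \):

$$\begin{aligned} a_{\text {sym}}^{-1}\le \frac{1}{\lambda }\;\;\Longrightarrow \;\;\left\langle a_{\text {sym}}^{-1} \right\rangle \le \frac{1}{\lambda }\;\;\Longrightarrow \;\;\xi \cdot \left\langle a_{\text {sym}}^{-1} \right\rangle ^{-1}\xi \ge \lambda \left| \xi \right| ^2. \end{aligned}$$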

4 The proof of Proposition 1

4.1 The sublinearity of \(\psi \) and \(\sigma \)

The sublinearity of \(\psi \) and \(\sigma \) will follow from the following general fact. Recall that, for a function \(\varphi :{\mathbb {R}}^{d+1}\rightarrow {\mathbb {R}}\), for each \(R>0\) and \(t\in {\mathbb {R}}\),

Furthermore, for each \(x\in {\mathbb {R}}^d\), define

Lemma 5

Suppose that \(\varphi \) is a scalar random field on \({\mathbb {R}}^{d+1}\) which is stationary in time and has a stationary, finite energy gradient \(\nabla \varphi \) in the potential space \(L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\). That is, assume that

$$\begin{aligned} \left\langle \left| \nabla \varphi \right| ^2 \right\rangle <\infty \;\;\text {with}\;\;\nabla \varphi \in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d). \end{aligned}$$
(59)

Then, for \(\left\langle \cdot \right\rangle \)-a.e. a, the normalized large-scale \(L^2\)-averages of \(\varphi \) are strictly sublinear in the sense that

(60)

To prove Lemma 5, let \(\epsilon \in (0,1)\) and \(R>0\) be arbitrary, and define

$$\begin{aligned} \delta _{\epsilon R} \varphi (x,t)=\varphi (x,t)-(\varphi )_{t,\epsilon R}(x)\;\;\text {for}\;\;(x,t)\in {\mathbb {R}}^{d+1}. \end{aligned}$$

The triangle inequality implies that

(61)

For the first term of (61), observe that from the standard convolution estimate

(62)

where the penultimate step follows from \(\epsilon \in (0,1)\) and Jensen’s inequality. Therefore, the triangle inequality and (62) imply that

(63)

For the second term of (61), the Poincaré inequality in space implies that

(64)

The maximal ergodic theorem, see Becker [11, Corollary 2], implies that, for \(C=C(d)>0\),

(65)

Furthermore, the ergodic theorem [11, Theorem 2, Theorem 3] implies that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(\epsilon \in (0,1)\),

(66)

The proof now follows from a straightforward application of Egorov’s theorem. Let \(\epsilon \in (0,1)\) be arbitrary but fixed. For each \(\eta \in (0,1)\), use Egorov’s theorem and (66) to find a measurable subset \(A_\eta \subset \varOmega \) and \(R_\eta >0\) such that, for every \(a\in A_\eta \) and \(R>R_\eta \),

(67)

where \(\chi _{A_\eta }\in L^\infty (\varOmega )\) denotes the indicator function of \(A_\eta \). Returning to (64), form the decomposition

(68)

For the second term of (68), it follows from the stationarity of \(\nabla \varphi \), the stationarity of \(\chi _{A_\eta }\), and the ergodic theorem [11, Theorems 2, 3] that

(69)

In combination, (64), (67), (68), and (69) imply that, since \(\eta \in (0,1)\) was arbitrary,

(70)

where the final equality follows from (65), (67), and the dominated convergence theorem. Hence, returning to (61), it is immediate from the stationarity of \(\nabla \varphi \), the ergodic theorem [11, Theorems 2, 3], (63), and (70) that

This, since \(\epsilon \in (0,1)\) is arbitrary, completes the proof.

4.2 The sublinearity of \(\phi \)

The sublinearity of \(\phi \) will follow from the following general fact.

Lemma 6

Suppose that \(\varphi \) is a scalar random field on \({\mathbb {R}}^{d+1}\) which has a stationary, finite energy gradient \(\nabla \varphi \) in the potential space \(L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d)\). That is, assume that

$$\begin{aligned} \left\langle \left| \nabla \varphi \right| ^2 \right\rangle <\infty \;\;\text {with}\;\;\nabla \varphi \in L^2_{\text {pot}}(\varOmega ;{\mathbb {R}}^d). \end{aligned}$$
(71)

Furthermore, assume that, for \(\left\langle \cdot \right\rangle \)-a.e. a, the field \(\varphi \) satisfies

$$\begin{aligned} \varphi _t=\nabla \cdot F\;\;\text {distributionally in}\;\;{\mathbb {R}}^{d+1}, \end{aligned}$$
(72)

for the stationary extension of a finite energy field \(F\in L^2(\varOmega ;{\mathbb {R}}^d)\). Then, for \(\left\langle \cdot \right\rangle \)-a.e. a, the normalized large-scale \(L^2\)-averages of \(\varphi \) are strictly sublinear in the sense that

(73)

To prove Lemma 6, for each \(R>0\), use the triangle inequality to obtain

(74)

Since \(\varphi \) has a stationary spatial gradient, for each \(R>0\) and \(t\in {\mathbb {R}}\), the scalar random field

is stationary in time, with a stationary, finite-energy gradient in the potential space. Therefore, Lemma 5 applies to this random field, and asserts that, for \(\left\langle \cdot \right\rangle \)-a.e. a,

(75)

It remains to prove that, for \(\left\langle \cdot \right\rangle \)-a.e. a,

Let \(\rho \in \mathcal {C}^\infty _c({\mathbb {R}}^d)\) be a smooth, symmetric convolution kernel supported in \(B_1\) and, for each \(\epsilon >0\), define the rescaling \(\rho ^\epsilon (\cdot )=\epsilon ^{-d}\rho (\frac{\cdot }{\epsilon })\). Then, for each \(\epsilon >0\), define the spatial convolution, for each \(x\in {\mathbb {R}}^d\) and \(t\in {\mathbb {R}}\),

$$\begin{aligned} \varphi ^\epsilon (x,t):=\int _{{\mathbb {R}}^d} \rho ^\epsilon (y-x)\varphi (y,t)\, \mathrm {d} y. \end{aligned}$$

The introduction of the convolution kernel provides a test function which will be used to exploit Eq. (72) satisfied by \(\varphi \).

First, for each \(R>0\) and \(\epsilon \in (0,R)\), it follows from the support of the convolution kernel, Fubini’s theorem, and Jensen’s inequality that

(76)

Similarly, the identical argument yields, for each \(R>0\), \(t\in {\mathbb {R}}\), and \(\epsilon \in (0,R)\),

(77)

Therefore, for each \(R>0\) and \(\epsilon \in (0,R)\), the triangle inequality, (76) and (77) imply that, after adding and subtracting \(((\varphi ^\epsilon )_{t,R}-(\varphi ^\epsilon )_R)\),

(78)

For the first term on the right hand side of (78), Eq. (72) satisfied by \(\varphi \) and the Poincaré inequality in time yield, for each \(R\ge 1\) and \(\epsilon \in (0,R)\),

(79)

Then, continuing with (79), the definition of the convolution kernel and Jensen’s inequality yield

(80)

Therefore, combining (78) with (80), for each \(R>0\) and \(\epsilon \in (0,R)\),

Let \(\delta \in (0,1)\) be arbitrary and, for each \(R>0\), fix \(\epsilon (R):=\delta R\). For this choice, for each \(R>0\) and \(\delta \in (0,1)\),

(81)

Since the ergodic theorem [11, Theorems 2, 3] implies that, for \(\left\langle \cdot \right\rangle \)-a.e. a,

it follows from (81) that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for every \(\delta \in (0,1)\),

Therefore, since \(\delta \in (0,1)\) is arbitrary,

(82)

In combination, (74), (75) and (82) prove (73), and thereby complete the proof of Lemma 6. The sublinearity of the corrector \(\phi \), for \(\left\langle \cdot \right\rangle \)-a.e. a, is then immediate from Lemma 1.

4.3 The sublinearity of \(\zeta \)

The fact that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i,j\in \{1,\ldots ,d\}\),

is essentially classical. For each \(i\in \{1,\ldots ,d\}\), the Poincaré inequality and the ergodic theorem [11, Theorems 2, 3] together with the Rellich-Kondrachov embedding theorem imply that the family

is compact in \(L^2([0,1])\) and converges weakly to zero, as \(\epsilon \rightarrow 0\), in \(H^1([0,1])\). Therefore, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i,j\in \{1,\ldots ,d\}\), as \(\epsilon \rightarrow 0\),

(83)

Furthermore, now exploiting the fact that \(\zeta \) is a one-dimensional function, it follows from the Sobolev embedding theorem and the Arzelà-Ascoli theorem that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i,j\in \{1,\ldots ,d\}\), the family

is pre-compact in \(C^{0,\frac{1}{2}}([0,1])\) and, by repeating the argument of (61), converges weakly to zero, as \(\epsilon \rightarrow 0\), in \(H^1([0,1])\). Therefore, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i,j\in \{1,\ldots ,d\}\), as \(\epsilon \rightarrow 0\),

In particular, for each \(i,j\in \{1,\ldots ,d\}\), as \(\epsilon \rightarrow 0\),

(84)

Hence, in combination, (83) and (84) prove after rescaling that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i,j\in \{1,\ldots ,d\}\),

which completes the argument since \(\zeta \) is constant in space.

4.4 The large-scale averages of q

It is an immediate consequence of the ergodic theorem [11, Theorems 2, 3] and the fact that the flux q is stationary with finite energy that, for \(\left\langle \cdot \right\rangle \)-a.e. a, for each \(i\in \{1,\ldots ,d\}\),

which completes the argument, and the proof of Proposition 1.

5 The proof of Proposition 2

The proof of Proposition 2 is split into five steps. The first defines the augmented homogenization error. The second proves that the augmented homogenization error satisfies a parabolic equation. The third recalls some classical estimates governing the interior and boundary regularity of \(a_{\mathrm {hom}}\)-caloric functions. The fourth uses the equation satisfied by the augmented homogenization error to derive an energy estimate. And, finally, the fifth uses the energy estimate to complete the proof of excess decay.

In what follows, to simplify the notation, observe that it may be assumed without loss of generality that, for the \(R>0\) of interest, \(t\in {\mathbb {R}}\) and \(i,j,k\in \{1,\ldots ,d\}\),

$$\begin{aligned} (\phi _i)_R=(\psi _i)_{t,R}=(\sigma _{ijk})_{t,R}=\zeta _{ij}(0)=0. \end{aligned}$$

Indeed, otherwise in the arguments to follow, at each step replace the components of the corrector, for the \(R>0\) of interest, \(t\in {\mathbb {R}}\) and \(i,j,k\in \{1,\ldots ,d\}\), by the normalizations defined by

$$\begin{aligned} \tilde{\phi }_i&:=\phi _i-(\phi _i)_R,\;\;\tilde{\psi }_i:=\psi _i-(\psi _i)_{t,R},\;\;\tilde{\sigma }_{ijk}:=\sigma _{ijk}-(\sigma _{ijk})_{t,R}, \\&\quad \text {and}\;\;\tilde{\zeta }_{ij}:=\zeta _{ij}-\zeta _{ij}(0). \end{aligned}$$
(85)

The argument now begins with the definition of the augmented homogenization error.

5.1 The augmented homogenization error

The analysis of the augmented homogenization error and its corresponding energy estimate will first be carried out on scale \(R=1\). The general results will then follow by scaling. Suppose that u is an a-caloric function in \({\mathcal {C}}_1\). That is, in the sense of distributions, suppose that u satisfies

$$\begin{aligned} u_t=\nabla \cdot a\nabla u\;\;\text {in}\;\;{\mathcal {C}}_1. \end{aligned}$$
(86)

Then, let \(\rho \in \mathcal {C}^\infty _c({\mathbb {R}}^d)\) be a standard convolution kernel satisfying \(\mathop {\mathrm {supp}}(\rho )\subset B_1\). For each \(\epsilon \in (0,\frac{1}{4})\), let \(\rho ^\epsilon (\cdot ):=\epsilon ^{-d}\rho (\frac{\cdot }{\epsilon })\) and define the spatial convolution

$$\begin{aligned} u^\epsilon (x,t)=\int _{{\mathbb {R}}^d}\rho ^\epsilon (y-x)u(y,t)\, \mathrm {d} y\;\;\text {on}\;\;{\mathcal {C}}_{1-\epsilon }. \end{aligned}$$

It is necessary to observe some useful energy estimates for u and its convolution.

First, it is immediate from (86) and the uniform ellipticity of a from (5) that

$$\begin{aligned} ||u_t||^2_{L^2\left( [-1,0];H^{-1}(B_1)\right) }\lesssim \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2. \end{aligned}$$

Therefore, since the convolution preserves this estimate, for each \(\epsilon \in (0,\frac{1}{4})\),

$$\begin{aligned} ||u^\epsilon _t||^2_{L^2\left( [-1,0];H^{-1}(B_{1-\epsilon })\right) }\lesssim \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2. \end{aligned}$$
(87)

It is important to keep these estimates in mind when considering the application of the constant-coefficient regularity estimates (106) and (110) below.

Next, although there is no convolution in time, the spatial convolution nevertheless provides some temporal regularity in the sense that, for each \(\epsilon \in (0,\frac{1}{4})\),

$$\begin{aligned} u^\epsilon _t(x,t)=-\int _{{\mathbb {R}}^d}\nabla \rho ^\epsilon (y-x)\cdot a\nabla u(y,t)\, \mathrm {d} y\;\;\text {in}\;\;{\mathcal {C}}_{1-\epsilon }. \end{aligned}$$
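
This identity results from moving the divergence onto the kernel. The following sketch is written formally for the distributional Eq. (86), and uses that, for \((x,t)\in {\mathcal {C}}_{1-\epsilon }\), the function \(y\mapsto \rho ^\epsilon (y-x)\) is supported in \(B_\epsilon (x)\subset B_1\), so that it is an admissible test function and no boundary terms appear:

$$\begin{aligned} \begin{aligned} u^\epsilon _t(x,t)&=\int _{{\mathbb {R}}^d}\rho ^\epsilon (y-x)u_t(y,t)\, \mathrm {d} y=\int _{{\mathbb {R}}^d}\rho ^\epsilon (y-x)\,\nabla _y\cdot \left( a\nabla u\right) (y,t)\, \mathrm {d} y \\&=-\int _{{\mathbb {R}}^d}\nabla \rho ^\epsilon (y-x)\cdot a\nabla u(y,t)\, \mathrm {d} y. \end{aligned} \end{aligned}$$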

Therefore, the time-derivative has a uniformly bounded energy. That is, for each \(\epsilon \in (0,\frac{1}{4})\), the Minkowski integral inequality, Hölder’s inequality, the definition of the convolution kernel, and the uniform ellipticity of a imply that

$$\begin{aligned} \begin{aligned} \left( \int _{{\mathcal {C}}_{\frac{3}{4}}}\left| u^\epsilon _t\right| ^2\right) ^\frac{1}{2}&\le \left( \int _{{\mathcal {C}}_{1-\epsilon }}\left| \int _{{\mathbb {R}}^d}\nabla \rho ^\epsilon (y-x)\cdot a\nabla u(y,t)\, \mathrm {d} y\right| ^2\, \mathrm {d} x\, \mathrm {d} t\right) ^\frac{1}{2} \\&\lesssim \int _{{\mathbb {R}}^d}\left( \int _{{\mathcal {C}}_{1-\epsilon }}\left| \nabla \rho ^\epsilon (y)\right| ^2\left| \nabla u(y+x,t)\right| ^2\, \mathrm {d} x\, \mathrm {d} t\right) ^\frac{1}{2}\, \mathrm {d} y\\&= \int _{{\mathbb {R}}^d}\left| \nabla \rho ^\epsilon (y)\right| \left( \int _{{\mathcal {C}}_{1-\epsilon }}\left| \nabla u(y+x,t)\right| ^2\, \mathrm {d} x\, \mathrm {d} t\right) ^\frac{1}{2}\, \mathrm {d} y\\&\lesssim \frac{1}{\epsilon }\left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2}. \end{aligned} \end{aligned}$$
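
The factor \(\frac{1}{\epsilon }\) in the final step comes from the scaling of the kernel. As a brief check, using only the definition \(\rho ^\epsilon (\cdot )=\epsilon ^{-d}\rho (\frac{\cdot }{\epsilon })\) and the change of variables \(z=\frac{y}{\epsilon }\),

$$\begin{aligned} \nabla \rho ^\epsilon (y)=\epsilon ^{-(d+1)}(\nabla \rho )\left( \frac{y}{\epsilon }\right) \;\;\text {and therefore}\;\;\int _{{\mathbb {R}}^d}\left| \nabla \rho ^\epsilon (y)\right| \, \mathrm {d} y=\frac{1}{\epsilon }\int _{{\mathbb {R}}^d}\left| \nabla \rho (z)\right| \, \mathrm {d} z\lesssim \frac{1}{\epsilon }. \end{aligned}$$

Moreover, since \(\rho ^\epsilon \) is supported in \(B_\epsilon \), the points \(x+y\) with \(x\in B_{1-\epsilon }\) and \(y\in B_\epsilon \) remain in \(B_1\), so that the inner integral is bounded by the energy of u on \({\mathcal {C}}_1\), provided the cylinders are nested in time, as is the case for the usual parabolic cylinders.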

The convolution error is also well controlled by the energy of u. Precisely, for each \(\epsilon \in (0,\frac{1}{4})\), it follows from Jensen’s inequality and the definition of the convolution kernel that

$$\begin{aligned} \begin{aligned} \left( \int _{{\mathcal {C}}_{\frac{3}{4}}}\left| u^\epsilon -u\right| ^2\right) ^\frac{1}{2}&= \left( \int _{{\mathcal {C}}_{\frac{3}{4}}}\left| \int _0^1\int _{{\mathbb {R}}^d}\rho ^\epsilon (y)\nabla u(x+sy,t)\cdot y \, \mathrm {d} y\, \mathrm {d} s\right| ^2\, \mathrm {d} x\, \mathrm {d} t\right) ^\frac{1}{2}\\&\le \epsilon \left( \int _{{\mathbb {R}}^d}\rho ^\epsilon (y)\int _0^1\int _{{\mathcal {C}}_{\frac{3}{4}}}\left| \nabla u(x+sy,t)\right| ^2\, \mathrm {d} x\, \mathrm {d} t\, \mathrm {d} s\, \mathrm {d} y\right) ^\frac{1}{2} \\&\le \epsilon \left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2}. \end{aligned} \end{aligned}$$

Lastly, it is immediate from Jensen’s inequality that the convolution preserves the energy in the sense that, for each \(\epsilon \in (0,\frac{1}{4})\),

$$\begin{aligned} \left( \int _{{\mathcal {C}}_{\frac{3}{4}}}\left| \nabla u^\epsilon \right| ^2\right) ^\frac{1}{2}\le \left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2}. \end{aligned}$$

Fubini’s theorem therefore implies that, for each \(\epsilon \in (0,\frac{1}{4})\), there exists \(r_\epsilon \in (\frac{1}{2},\frac{3}{4})\) such that

$$\begin{aligned} \left( \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| u^\epsilon -u\right| ^2\right) ^\frac{1}{2}\lesssim \epsilon \left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2}, \end{aligned}$$
(88)

and

$$\begin{aligned} \left( \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| u^\epsilon _t\right| ^2\right) ^\frac{1}{2}\lesssim \frac{1}{\epsilon }\left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2}, \end{aligned}$$
(89)

and, finally, such that

$$\begin{aligned} \left( \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| \nabla u\right| ^2\right) ^\frac{1}{2}+\left( \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| \nabla u^\epsilon \right| ^2\right) ^\frac{1}{2}\lesssim \left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2}. \end{aligned}$$
(90)
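
The selection of \(r_\epsilon \) can be sketched as follows, under the assumption (not restated in this section) that \({\mathcal {C}}_r=B_r\times (-r^2,0)\) with parabolic boundary \(\partial _p{\mathcal {C}}_r=\left( B_r\times \{-r^2\}\right) \cup \left( \partial B_r\times [-r^2,0]\right) \). For a nonnegative integrable function f on \({\mathcal {C}}_{\frac{3}{4}}\), polar coordinates in space for the lateral part and the substitution \(t=-r^2\) for the bottom part give

$$\begin{aligned} \int _{\frac{1}{2}}^{\frac{3}{4}}\int _{\partial _p{\mathcal {C}}_r}f\, \mathrm {d} r=\int _{\frac{1}{2}}^{\frac{3}{4}}\int _{-r^2}^0\int _{\partial B_r}f\, \mathrm {d} S\, \mathrm {d} t\, \mathrm {d} r+\int _{\frac{1}{2}}^{\frac{3}{4}}\int _{B_r}f(x,-r^2)\, \mathrm {d} x\, \mathrm {d} r\le 2\int _{{\mathcal {C}}_{\frac{3}{4}}}f, \end{aligned}$$

where the bottom part uses \(\mathrm {d} r=\frac{\mathrm {d} t}{2r}\) with \(2r\ge 1\). Chebyshev’s inequality then implies, whenever the integral of f over \({\mathcal {C}}_{\frac{3}{4}}\) is positive, that

$$\begin{aligned} \left| \left\{ r\in \left( \frac{1}{2},\frac{3}{4}\right) \;:\;\int _{\partial _p{\mathcal {C}}_r}f>40\int _{{\mathcal {C}}_{\frac{3}{4}}}f\right\} \right| \le \frac{1}{20}, \end{aligned}$$

while if the integral vanishes then the boundary integral vanishes for almost every r. Applying this to the four functions \(\left| u^\epsilon -u\right| ^2\), \(\left| u^\epsilon _t\right| ^2\), \(\left| \nabla u\right| ^2\) and \(\left| \nabla u^\epsilon \right| ^2\), the exceptional sets have total measure at most \(\frac{1}{5}<\frac{1}{4}\), so a common radius exists, and the bulk estimates above then bound the right hand sides. The constant 40 is illustrative and plays no role.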

It will be for this radius that the \(a_{\mathrm {hom}}\)-caloric extension of \(u^\epsilon \) is constructed.

Namely, for each \(\epsilon \in (0,\frac{1}{4})\), let \(v^\epsilon \) denote the solution of

$$\begin{aligned} \left\{ \begin{array}{ll} v^\epsilon _t =\nabla \cdot a_{\mathrm {hom}}\nabla v^\epsilon &{}\quad \text {in}\;\;{\mathcal {C}}_{r_\epsilon } \\ v^\epsilon = u^\epsilon &{}\quad \text {on}\;\;\partial _p {\mathcal {C}}_{r_\epsilon }.\end{array}\right. \end{aligned}$$
(91)

These functions will now come to define the augmented homogenization error after the introduction of a cutoff function.

For each \(\epsilon \in (0,\frac{1}{4})\) and \(\rho \in (0,\frac{1}{8})\), let \(\eta ^\epsilon _\rho \in \mathcal {C}^\infty _c({\mathbb {R}}^{d+1})\) be a smooth cutoff function satisfying \(0\le \eta _\rho ^\epsilon \le 1\) with, for each \(x\in {\mathbb {R}}^d\) and \(t\in {\mathbb {R}}\),

$$\begin{aligned} \eta _\rho ^\epsilon (x,t)=\left\{ \begin{array}{ll} 1 &{}\quad \text {if}\;\;(x,t)\in \overline{{\mathcal {C}}}_{r_\epsilon -2\rho } \\ 0 &{}\quad \text {if}\;\;(x,t)\in \left( {\mathbb {R}}^d\times (-\infty ,0)\right) {\setminus }{\mathcal {C}}_{r_\epsilon -\rho }.\end{array}\right. \end{aligned}$$
(92)

Furthermore, for each \(\epsilon \in (0,\frac{1}{4})\) and \(\rho \in (0,\frac{1}{8})\), for each \(x\in {\mathbb {R}}^d\) and \(t\in {\mathbb {R}}\),

$$\begin{aligned} \left| \nabla \eta _\rho ^\epsilon (x,t)\right| +\left| \partial _t\eta _\rho ^\epsilon (x,t)\right| \lesssim \frac{1}{\rho }\;\;\text {and}\;\;\left| \nabla ^2\eta _\rho ^\epsilon (x,t)\right| \lesssim \frac{1}{\rho ^2}. \end{aligned}$$
(93)

Then, for \(\rho \in (0,\frac{1}{8})\) and \(\epsilon \in (0,\frac{1}{4})\) to be specified later, define the augmented homogenization error w according to the rule

$$\begin{aligned} w=u-\left( 1+\eta _\rho ^\epsilon \phi _i\partial _i\right) v^\epsilon . \end{aligned}$$
(94)

The augmented homogenization error (94) will now be shown to satisfy a useful parabolic equation. The computation is motivated by the analogous computation in [18, Lemma 2], but there are significant differences owing to the parabolic setting and the use of the parabolic extended corrector \((\phi ,\psi ,\sigma ,\zeta )\).

5.2 The equation satisfied by the augmented homogenization error

It is now shown that the augmented homogenization error (94) satisfies, in \({\mathcal {C}}_{r_\epsilon }\),

$$\begin{aligned} \begin{aligned} w_t-\nabla \cdot a\nabla w&= \nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) (a-a_{\mathrm {hom}})\nabla v^\epsilon +(\phi _ia+\psi _i-\sigma _i)\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) \\&\quad + \partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) -\phi _i \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t-\psi _i\varDelta \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \end{aligned} \end{aligned}$$
(95)

with

$$\begin{aligned} w= u-u^\epsilon \;\;\text {on}\;\;\partial _p{\mathcal {C}}_{r_\epsilon }. \end{aligned}$$

Fix \(\rho \in (0,\frac{1}{8})\) and \(\epsilon \in (0,\frac{1}{4})\) and let w be defined by (94). Since the boundary condition is immediate from the definition, it remains only to compute the equation. First, using definition (94), the gradient is defined by

$$\begin{aligned} \nabla w=\nabla u-\nabla v^\epsilon -\nabla \left( \phi _i\eta _\rho ^\epsilon \partial _iv^\epsilon \right) . \end{aligned}$$

Then, because u satisfies (86),

$$\begin{aligned} -\nabla \cdot a\nabla w=-u_t+\nabla \cdot a\nabla v^\epsilon +\nabla \cdot a\left( \nabla \phi _i\eta _\rho ^\epsilon \partial _iv^\epsilon +\phi _i\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) . \end{aligned}$$
(96)

It is necessary to further analyze the term

$$\begin{aligned} \nabla \cdot a\nabla v^\epsilon +\nabla \cdot a\left( \nabla \phi _i\eta _\rho ^\epsilon \partial _iv^\epsilon \right) , \end{aligned}$$

which, after adding and subtracting the unit vectors \(\{e_i\}_{i\in \{1,\ldots ,d\}}\), satisfies, for the fluxes \(\{q_i\}_{i\in \{1,\ldots ,d\}}\) defined in (10),

$$\begin{aligned} \nabla \cdot a\nabla v^\epsilon +\nabla \cdot a\left( \nabla \phi _i\eta _\rho ^\epsilon \partial _iv^\epsilon \right) =\nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) a\nabla v^\epsilon \right) +\nabla \cdot \left( q_i \eta _\rho ^\epsilon \partial _iv^\epsilon \right) . \end{aligned}$$

Then, after adding and subtracting the vectors \(\{a_{\mathrm {hom}}e_i\}_{i\in \{1,\ldots ,d\}}\),

$$\begin{aligned} \begin{aligned}&\nabla \cdot a\nabla v^\epsilon +\nabla \cdot a\left( \nabla \phi _i\eta _\rho ^\epsilon \partial _iv^\epsilon \right) \\&\quad =\nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) a\nabla v^\epsilon \right) +\nabla \cdot \left( \left( q_i-a_{\mathrm {hom}}e_i\right) \eta _\rho ^\epsilon \partial _iv^\epsilon \right) +\nabla \cdot \left( \eta _\rho ^\epsilon a_{\mathrm {hom}}\nabla v^\epsilon \right) . \end{aligned} \end{aligned}$$

Therefore, since \(v^\epsilon \) satisfies (91),

$$\begin{aligned}&\nabla \cdot a\nabla v^\epsilon +\nabla \cdot a\left( \nabla \phi _i\eta _\rho ^\epsilon \partial _iv^\epsilon \right) \nonumber \\&\quad = v^\epsilon _t+\nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) \left( a-a_{\mathrm {hom}}\right) \nabla v^\epsilon \right) +\nabla \cdot \left( \left( q_i-a_{\mathrm {hom}}e_i\right) \eta _\rho ^\epsilon \partial _iv^\epsilon \right) .\qquad \end{aligned}$$
(97)

Returning to (96), in view of (97),

$$\begin{aligned} -\nabla \cdot a\nabla w= & {} -u_t+v^\epsilon _t+\nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) (a-a_{\mathrm {hom}})\nabla v^\epsilon \right) \nonumber \\&+ \nabla \cdot \left( (q_i-a_{\mathrm {hom}}e_i)\eta _\rho ^\epsilon \partial _iv^\epsilon +\phi _ia\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) . \end{aligned}$$
(98)

For the derivative in time, owing to definition (94),

$$\begin{aligned} w_t=u_t-v^\epsilon _t-\phi _{i,t}\eta _\rho ^\epsilon \partial _iv^\epsilon -\phi _i \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t, \end{aligned}$$

which, in combination with (98), yields the distributional equality

$$\begin{aligned} \begin{aligned} w_t-\nabla \cdot a\nabla w&= \nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) (a-a_{\mathrm {hom}})\nabla v^\epsilon +\phi _ia\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) \\&\quad +(q_i-a_{\mathrm {hom}}e_i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \\&\quad + \left( (\nabla \cdot q_i)-\phi _{i,t}\right) \eta _\rho ^\epsilon \partial _iv^\epsilon -\phi _i \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t. \end{aligned} \end{aligned}$$

Therefore, since the correctors \(\{\phi _i\}_{i\in \{1,\ldots ,d\}}\) satisfy (7), distributionally

$$\begin{aligned} w_t-\nabla \cdot a\nabla w= & {} \nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) (a-a_{\mathrm {hom}})\nabla v^\epsilon +\phi _ia\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) \nonumber \\&+ (q_i-a_{\mathrm {hom}}e_i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) -\phi _i \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t. \end{aligned}$$
(99)

It remains to analyze the term

$$\begin{aligned} (q_i-a_{\mathrm {hom}}e_i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) . \end{aligned}$$

For the correctors \(\{\psi _i\}_{i\in \{1,\ldots ,d\}}\) satisfying (19), for each \(i\in \{1,\ldots ,d\}\), add and subtract the gradient \(\nabla \psi _i\) and the conditional expectation \(\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \) to obtain

$$\begin{aligned} \begin{aligned}&(q_i-a_{\mathrm {hom}}e_i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \\&\quad =\left( q_i-\nabla \psi _i-\left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle \right) \cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \\&\qquad + \left( \left\langle q_i\;|\;{\mathcal {F}}_{{\mathbb {R}}^d} \right\rangle -a_{\mathrm {hom}}e_i\right) \cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) + \nabla \psi _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) . \end{aligned} \end{aligned}$$

Since the correctors \(\{\sigma _i\}_{i\in \{1,\ldots ,d\}}\) satisfy (20) and the correctors \(\{\zeta _i\}_{i\in \{1,\ldots ,d\}}\) satisfy (22),

$$\begin{aligned} (q_i-a_{\mathrm {hom}}e_i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) = (\nabla \cdot \sigma _i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) +\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) + \nabla \psi _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) . \end{aligned}$$
(100)

Then, for each \(i\in \{1,\dots ,d\}\), the skew-symmetry of \(\sigma _i\) proven in Lemma 1 implies the distributional equality

$$\begin{aligned} \nabla \cdot \left( \sigma _i\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) =-(\nabla \cdot \sigma _i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) . \end{aligned}$$
(101)

Indeed, for each \(i\in \{1,\ldots ,d\}\), distributionally

$$\begin{aligned} \begin{aligned} \nabla \cdot \left( \sigma _i\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right)&= \partial _j\left( \sigma _{ijk}\partial _{k}\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) \\&= \partial _j\sigma _{ijk}\partial _k\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) +\sigma _{ijk}\partial _j\partial _k\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \\&= -\partial _j\sigma _{ikj}\partial _k\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) +\sigma _{ijk}\partial _j\partial _k\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \\&= -(\nabla \cdot \sigma _i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) , \end{aligned} \end{aligned}$$

where the penultimate equality follows from the skew-symmetry of \(\sigma \) and the final equality from the skew-symmetry of \(\sigma \) and the equality of mixed partial derivatives.
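
The vanishing of the second-order term in the last step can be made explicit. Writing \(f:=\eta _\rho ^\epsilon \partial _iv^\epsilon \) for each fixed \(i\in \{1,\ldots ,d\}\), a shorthand introduced only for this remark, a relabeling of the summation indices j and k followed by the skew-symmetry of \(\sigma _i\) and the equality of mixed partial derivatives gives

$$\begin{aligned} \sigma _{ijk}\partial _j\partial _kf=\sigma _{ikj}\partial _k\partial _jf=-\sigma _{ijk}\partial _j\partial _kf,\;\;\text {and hence}\;\;\sigma _{ijk}\partial _j\partial _kf=0. \end{aligned}$$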

Therefore, returning to (100), the equality (101) and the distributional equality

$$\begin{aligned} \nabla \psi _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) =\nabla \cdot \left( \psi _i\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) -\psi _i\varDelta \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \end{aligned}$$

imply that

$$\begin{aligned}&(q_i-a_{\mathrm {hom}}e_i)\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \nonumber \\&\quad =\nabla \cdot \left( (\psi _i-\sigma _i)\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) +\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) -\psi _i\varDelta \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) .\quad \end{aligned}$$
(102)

Therefore, returning to (99), in view of (102), it follows that

$$\begin{aligned} \begin{aligned}w_t-\nabla \cdot a\nabla w&= \nabla \cdot \left( \left( 1-\eta _\rho ^\epsilon \right) (a-a_{\mathrm {hom}})\nabla v^\epsilon +(\phi _ia+\psi _i-\sigma _i)\nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) \\&\quad +\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) -\phi _i \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t-\psi _i\varDelta \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) , \end{aligned} \end{aligned}$$
(103)

which completes the proof of (95). This equation will later be used to obtain an energy estimate for the augmented homogenization error. However, before this it is useful to recall three classical estimates concerning the boundary and interior regularity of \(a_{\mathrm {hom}}\)-caloric functions.

5.3 Interior and boundary estimates for \(a_{\mathrm {hom}}\)-caloric functions

In this subsection, three classical estimates are presented to control the interior and boundary regularity of \(a_{\mathrm {hom}}\)-caloric functions.

In what follows, the boundary conditions will be assumed to be the trace of a function \(\tilde{u}:\overline{{\mathcal {C}}}_1\rightarrow {\mathbb {R}}\) satisfying

$$\begin{aligned} \tilde{u}\in L^2\left( [-1,0];H^1(B_1)\right) \;\;\text {and}\;\;\tilde{u}_t\in L^2\left( [-1,0];H^{-1}(B_1)\right) . \end{aligned}$$
(104)

The first estimate is the a priori energy estimate for the \(a_{\mathrm {hom}}\)-caloric extension of \(\tilde{u}\) into \({\mathcal {C}}_1\). That is, if \(\tilde{v}\) satisfies

$$\begin{aligned} \left\{ \begin{array}{ll} \tilde{v}_t =\nabla \cdot a_{\mathrm {hom}}\nabla \tilde{v} &{}\quad \text {in}\;\;{\mathcal {C}}_1 \\ \tilde{v} = \tilde{u} &{}\quad \text {on}\;\;\partial _p{\mathcal {C}}_1,\end{array}\right. \end{aligned}$$
(105)

then,

$$\begin{aligned} \int _{{\mathcal {C}}_1}\left| \nabla \tilde{v}\right| ^2\lesssim \int _{{\mathcal {C}}_1}\left| \nabla \tilde{u}\right| ^2+||\tilde{u}_t||^2_{L^2\left( [-1,0];H^{-1}(B_1)\right) }. \end{aligned}$$
(106)

To prove (106), let \(\tilde{z}\) denote the distributional solution of

$$\begin{aligned} \left\{ \begin{array}{ll} \tilde{z}_t =\nabla \cdot a_{\mathrm {hom}}\nabla \tilde{z}+\nabla \cdot a_{\mathrm {hom}}\nabla \tilde{u}-\tilde{u}_t &{}\quad \text {in}\;\;{\mathcal {C}}_1 \\ \tilde{z} = 0 &{}\quad \text {on}\;\;\partial _p{\mathcal {C}}_1.\end{array}\right. \end{aligned}$$
(107)

Then, after testing (107) with \(\tilde{z}\), applying the Poincaré inequality, Hölder’s inequality and Young’s inequality, and using the uniform ellipticity of \(a_{\mathrm {hom}}\) from Lemma 1, it follows that

$$\begin{aligned} \int _{{\mathcal {C}}_1}\left| \nabla \tilde{z}\right| ^2\lesssim \int _{{\mathcal {C}}_1}\left| \nabla \tilde{u}\right| ^2+||\tilde{u}_t||^2_{L^2\left( [-1,0];H^{-1}(B_1)\right) }. \end{aligned}$$
(108)

However, thanks to (104) and (105), it is then immediate that

$$\begin{aligned} \tilde{v}=\tilde{z}+\tilde{u}. \end{aligned}$$

Hence, with (108) and the triangle inequality,

$$\begin{aligned} \int _{{\mathcal {C}}_1}\left| \nabla \tilde{v}\right| ^2\lesssim \int _{{\mathcal {C}}_1}\left| \nabla \tilde{u}\right| ^2+||\tilde{u}_t||^2_{L^2\left( [-1,0];H^{-1}(B_1)\right) }, \end{aligned}$$

which proves (106).
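
The testing step behind (108) can be sketched as follows. This is a formal computation, and it assumes that the uniform ellipticity from Lemma 1 takes the form \(\lambda \left| \xi \right| ^2\le \xi \cdot a_{\mathrm {hom}}\xi \) and \(\left| a_{\mathrm {hom}}\xi \right| \le C\left| \xi \right| \) for a constant \(C=C(d,\lambda )\), which is not restated in this section. Testing (107) with \(\tilde{z}\), integrating over \({\mathcal {C}}_1\), and using that \(\tilde{z}\) vanishes at \(t=-1\) and on the lateral boundary gives

$$\begin{aligned} \frac{1}{2}\int _{B_1\times \{0\}}\left| \tilde{z}\right| ^2+\int _{{\mathcal {C}}_1}\nabla \tilde{z}\cdot a_{\mathrm {hom}}\nabla \tilde{z}=-\int _{{\mathcal {C}}_1}\nabla \tilde{z}\cdot a_{\mathrm {hom}}\nabla \tilde{u}-\int _{-1}^0\left( \tilde{u}_t,\tilde{z}\right) _{H^{-1},H^1_0}, \end{aligned}$$

where \(\left( \cdot ,\cdot \right) _{H^{-1},H^1_0}\) is shorthand, introduced only here, for the duality pairing between \(H^{-1}(B_1)\) and \(H^1_0(B_1)\). By the Poincaré inequality, the pairing is bounded by a multiple of \(||\tilde{u}_t||_{H^{-1}(B_1)}||\nabla \tilde{z}||_{L^2(B_1)}\), so Hölder’s inequality and Young’s inequality yield, for every \(\kappa >0\) and some \(C_\kappa =C_\kappa (d,\lambda ,\kappa )>0\),

$$\begin{aligned} \lambda \int _{{\mathcal {C}}_1}\left| \nabla \tilde{z}\right| ^2\le \kappa \int _{{\mathcal {C}}_1}\left| \nabla \tilde{z}\right| ^2+C_\kappa \left( \int _{{\mathcal {C}}_1}\left| \nabla \tilde{u}\right| ^2+||\tilde{u}_t||^2_{L^2\left( [-1,0];H^{-1}(B_1)\right) }\right) . \end{aligned}$$

Choosing \(\kappa =\frac{\lambda }{2}\) and absorbing the first term on the right hand side into the left hand side recovers (108).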

An interior regularity estimate will now be obtained for \(a_{\mathrm {hom}}\)-caloric functions. Suppose that \(\tilde{v}\) satisfies (105) for \(\tilde{u}\) satisfying (104). It then follows from a repeated application of the Caccioppoli inequality (29) that, for each \(k\ge 0\), there exists \(C(k)>0\) such that

$$\begin{aligned} \int _{B_{1-\rho }}\left| \nabla ^k \tilde{v}\right| ^2\le \frac{C(k)}{(R\rho )^{2k}}\int _{B_1}\left| \nabla \tilde{v}\right| ^2. \end{aligned}$$

Therefore, by choosing \(k=\frac{d}{2}\), \(k=\frac{d}{2}+1\), and \(k=\frac{d}{2}+2\), the Sobolev embedding theorem implies that

$$\begin{aligned}&\sup _{{\mathcal {C}}_{1-\rho }}\left( \left| \nabla \tilde{v}\right| +\rho \left| \nabla ^2 \tilde{v}\right| +\rho ^2\left| \nabla ^3 \tilde{v}\right| \right) ^2\lesssim \rho ^{-(d+2)}\int _{{\mathcal {C}}_1}\left| \nabla \tilde{v}\right| ^2 \nonumber \\&\quad \lesssim \rho ^{-(d+2)}\left( \int _{{\mathcal {C}}_1}\left| \nabla \tilde{u}\right| ^2+||\tilde{u}_t||^2_{L^2\left( [-1,0];H^{-1}(B_1)\right) }\right) , \end{aligned}$$
(109)

where the final inequality follows from (106).

The boundary regularity statement follows from a simplified version of Ladyzenskaja, Solonnikov and Uraltceva [20, Theorem 9.1] or, for the optimal statement, Weidemaier [27, Theorem 3.1]. This estimate yields \(H^2\)-regularity and therefore requires more regularity of the boundary data. In particular, this estimate explains the necessity of introducing the boundary regularization in the definition of the augmented homogenization error. Suppose that \(\tilde{u}\) satisfies the trace estimates

$$\begin{aligned} \tilde{u}\in L^2\left( [-1,0];H^1(\partial B_1)\right) \cap H^1(B_1\times \{-1\})\;\;\text {and}\;\;\tilde{u}_t\in L^2\left( [-1,0];L^2(\partial B_1)\right) , \end{aligned}$$

and that \(\tilde{v}\) is the \(a_{\mathrm {hom}}\)-caloric extension of \(\tilde{u}\) into \({\mathcal {C}}_1\) in the sense of (105). Then, it follows from [20, Theorem 9.1] or [27, Theorem 3.1] that

$$\begin{aligned} \int _{{\mathcal {C}}_1}\left| \nabla \tilde{v}\right| ^2+\left| \nabla ^2\tilde{v}\right| ^2\lesssim \int _{\partial _p{\mathcal {C}}_1}\left| \nabla ^\text {tan}\tilde{u}\right| ^2+\int _{-1}^0\int _{\partial B_1}\left| \tilde{u}_t\right| ^2, \end{aligned}$$
(110)

where \(\nabla ^{\text {tan}}\tilde{u}\) denotes the tangential derivative of \(\tilde{u}\) on the parabolic boundary. In particular, \(\nabla ^{\text {tan}}\tilde{u}\) coincides with the full gradient on \(B_1\times \{-1\}\). Estimates (106), (109) and (110) will play an important role in the energy estimate to follow.

5.4 The energy estimate for the augmented homogenization error

Equation (95) will now be used to obtain an energy estimate for the augmented homogenization error w defined in (94). Precisely, it will be shown that

$$\begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\nabla w\cdot a\nabla w\lesssim & {} \epsilon \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 + \frac{\rho ^\frac{2}{d}}{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\int _{{\mathcal {C}}_1}\left( \left| \phi \right| ^2+\left| \psi \right| ^2+\left| \sigma \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \left( \frac{1}{\rho ^{\frac{d}{2}+3}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^{\frac{1}{2}}+\frac{1}{\rho ^{d+6}}\int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^\frac{1}{2}\left( \int _{{\mathcal {C}}_1}\left| q\right| ^2\right) ^\frac{1}{2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2. \end{aligned}$$
(111)

The idea is to test Eq. (103) with w. However, for this it is necessary to introduce a cutoff to ensure that w vanishes along the upper boundary of the cylinder. For each \(\delta \in (0,1)\), define a smooth cutoff function \(\gamma _\delta :{\mathbb {R}}\rightarrow {\mathbb {R}}\) which is non-increasing and satisfies \(0\le \gamma _\delta \le 1\) with

$$\begin{aligned} \gamma _\delta (t)=\left\{ \begin{array}{ll} 1 &{}\quad \text {if}\;\;t\le -\delta , \\ 0 &{}\quad \text {if}\;\;t\ge 0.\end{array}\right. \end{aligned}$$
(112)

Furthermore, for the Dirac mass \(\delta _0\) at zero, as \(\delta \rightarrow 0\),

$$\begin{aligned} \left| (\gamma _\delta )_t\right| \rightharpoonup \delta _0\;\;\text {as distributions on }{\mathbb {R}}. \end{aligned}$$
(113)
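
One admissible family of cutoffs can be written down explicitly. The following is only an illustrative choice, since the text does not fix a specific \(\gamma _\delta \); the profile \(\beta \) is hypothetical notation for a smooth, non-decreasing function with \(\beta (s)=0\) for \(s\le 0\) and \(\beta (s)=1\) for \(s\ge 1\). Setting

$$\begin{aligned} \gamma _\delta (t):=\beta \left( -\frac{t}{\delta }\right) ,\;\;\text {so that}\;\;\left| (\gamma _\delta )_t(t)\right| =\frac{1}{\delta }\beta '\left( -\frac{t}{\delta }\right) \;\;\text {and}\;\;\int _{{\mathbb {R}}}\left| (\gamma _\delta )_t\right| =\int _0^1\beta '=1, \end{aligned}$$

the function \(\gamma _\delta \) is smooth, non-increasing and satisfies (112). Since the nonnegative functions \(\left| (\gamma _\delta )_t\right| \) have unit mass and are supported in \([-\delta ,0]\), they converge to \(\delta _0\) as distributions, which is (113).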

To begin, Eq. (103) is tested against \(\gamma _\delta w\). Properties of the cutoff \(\eta _\rho ^\epsilon \) from (92) and (93), the uniform ellipticity of a from (5) and \(a_{\mathrm {hom}}\) from Lemma 1 and Hölder’s inequality imply that, after bounding the time derivative of \(v^\epsilon \) by its Hessian matrix,

$$\begin{aligned} \begin{aligned}&\int _{{\mathcal {C}}_{r_\epsilon }}\left| (\gamma _\delta )_t\right| \left| w\right| ^2+\int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \nabla w\cdot a\nabla w\lesssim \int _{-r_\epsilon ^2}^0\int _{\partial B_{r_\epsilon }}\gamma _\delta (u-u^\epsilon )\nu \cdot a(\nabla u-\nabla v^\epsilon ) \\&\quad + \int _{B_{r_\epsilon }\times \{-{r_\epsilon }^2\}}\gamma _\delta \left| u-u^\epsilon \right| ^2 + \int _{{\mathcal {C}}_{r_\epsilon }{\setminus }{\mathcal {C}}_{r_\epsilon -2\rho }}\gamma _\delta \left| \nabla v^\epsilon \right| \left| \nabla w\right| \\&\quad + \sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \left| \nabla ^2 v^\epsilon \right| +\frac{1}{R\rho }\left| \nabla v^\epsilon \right| \right) \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left( \left| \phi \right| +\left| \psi \right| +\left| \sigma \right| \right) \left| \nabla w\right| \\&\quad + \sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \frac{1}{\rho ^2}\left| \nabla v^\epsilon \right| +\frac{1}{\rho }\left| \nabla ^2 v^\epsilon \right| +\left| \nabla ^3 v^\epsilon \right| \right) \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left( \left| \phi \right| +\left| \psi \right| \right) \left| w\right| \\&\quad + \left| \int _{{\mathcal {C}}_{r_\epsilon }}\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta w\right| , \end{aligned} \end{aligned}$$
(114)

where \(\nu \) denotes the interior normal vector and

$$\begin{aligned} \left| \phi \right| :=\left( \sum _{i=1}^d\left| \phi _i\right| ^2\right) ^\frac{1}{2},\;\; \left| \psi \right| :=\left( \sum _{i=1}^d\left| \psi _i\right| ^2\right) ^\frac{1}{2}\;\;\text {and}\;\;\left| \sigma \right| :=\left( \sum _{i,j,k=1}^d\left| \sigma _{ijk}\right| ^2\right) ^\frac{1}{2}. \end{aligned}$$

For the first two boundary terms, it is immediate from the choice of \(r_\epsilon \in (\frac{1}{2}, \frac{3}{4})\) in (88) and (90), the uniform ellipticity of a, Hölder’s inequality and the estimate for the Dirichlet to Neumann map, see [14], that

$$\begin{aligned} \begin{aligned}&\int _{-r_\epsilon ^2}^0\int _{\partial B_{r_\epsilon }}(u-u^\epsilon )\nu \cdot a(\nabla u-\nabla v^\epsilon )+\int _{B_{r_\epsilon }\times \{-r_\epsilon ^2\}}\left| u-u^\epsilon \right| ^2 \\&\quad \lesssim \left( \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| u-u^\epsilon \right| ^2\right) ^\frac{1}{2}\left( \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| \nabla u\right| ^2+\left| \nabla ^\text {tan}u\right| ^2\right) ^\frac{1}{2}+\int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| u-u^\epsilon \right| ^2 \\&\quad \lesssim \epsilon \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2. \end{aligned} \end{aligned}$$
(115)

It is then necessary to analyze the final term on the right hand side of (114). Using the definition of w from (94) and the fact that \(\zeta \) vanishes at \(t=0\) owing to (85), it follows after integrating by parts variously in time and space that

$$\begin{aligned} \begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta w&= \int _{{\mathcal {C}}_{r_\epsilon }}\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t\zeta _i\cdot \gamma _\delta \nabla w-\int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) (\gamma _\delta )_t w \\&\quad - \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta (u_t-v^\epsilon _t)\\&\quad +\int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \left( \eta _\rho ^\epsilon \phi _j\partial _j v^\epsilon \right) _t, \end{aligned} \end{aligned}$$
(116)

where this equality uses the fact that the corrector \(\zeta \) and the cutoff \(\gamma _\delta \) are constant in space.

The first two terms of (116) are bounded immediately using the definition of the cutoff \(\eta _\rho ^\epsilon \) from (92) and (93), which yields

$$\begin{aligned} \left| \int _{{\mathcal {C}}_{r_\epsilon }}\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) _t\zeta _i\cdot \gamma _\delta \nabla w\right| \lesssim \sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \frac{1}{\rho ^2}\left| \nabla v^\epsilon \right| +\left| \nabla ^3v^\epsilon \right| \right) \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \zeta \right| \left| \nabla w\right| , \end{aligned}$$
(117)

where

$$\begin{aligned} \left| \zeta \right| :=\left( \sum _{i,j=1}^d\left| \zeta _{ij}\right| ^2\right) ^\frac{1}{2}. \end{aligned}$$

Similarly,

$$\begin{aligned} \left| \int _{{\mathcal {C}}_{r_\epsilon }}\left( \zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \right) (\gamma _\delta )_tw\right| \lesssim \sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \frac{1}{\rho }\left| \nabla v^\epsilon \right| +\left| \nabla ^2 v^\epsilon \right| \right) \int _{{\mathcal {C}}_{r_\epsilon }}\left| (\gamma _\delta )_t\right| \left| \zeta \right| \left| w\right| . \end{aligned}$$
(118)

It remains to analyze the final two terms of (116). For the first of these, using Eqs. (86) and (91) satisfied by u and \(v^\epsilon \), respectively, and integrating by parts in space,

$$\begin{aligned} \begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta (u_t-v^\epsilon _t)&= -\int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \zeta _i\cdot \left( \nabla ^2\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \cdot a\nabla u\right) \\&\quad + \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \zeta _i\cdot \left( \nabla ^2\left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \cdot a_{\mathrm {hom}}\nabla v^\epsilon \right) . \end{aligned} \end{aligned}$$

Therefore, using the uniform ellipticity (5) of a and the uniform ellipticity of \(a_{\mathrm {hom}}\) from Lemma 1, after bounding the time derivative of \(v^\epsilon \) by the norm of its Hessian matrix,

$$\begin{aligned}&\left| \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta (u_t-v^\epsilon _t)\right| \nonumber \\&\quad \lesssim \sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \frac{1}{\rho ^2}\left| \nabla v^\epsilon \right| +\frac{1}{\rho }\left| \nabla ^2v^\epsilon \right| +\left| \nabla ^3v^\epsilon \right| \right) \int _{{\mathcal {C}}_{r_\epsilon -\rho }}\gamma _\delta \left| \zeta \right| \left( \left| \nabla u\right| +\left| \nabla v^\epsilon \right| \right) .\nonumber \\ \end{aligned}$$
(119)

For the final term of (116),

$$\begin{aligned} \begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \left( \eta _\rho ^\epsilon \phi _j\partial _j v^\epsilon \right) _t =&\int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \phi _j\left( \eta _\rho ^\epsilon \partial _j v^\epsilon \right) _t \\&+ \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \phi _{j,t}\left( \eta _\rho ^\epsilon \partial _j v^\epsilon \right) , \end{aligned} \end{aligned}$$

and, therefore, using Eq. (7) satisfied by the correctors \(\{\phi _i\}_{i\in \{1,\ldots ,d\}}\),

$$\begin{aligned} \begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \left( \eta _\rho ^\epsilon \phi _j\partial _j v^\epsilon \right) _t =&\int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \phi _j\left( \eta _\rho ^\epsilon \partial _j v^\epsilon \right) _t \\&- \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \nabla \left( \zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \eta _\rho ^\epsilon \partial _jv^\epsilon \right) \cdot q_j, \end{aligned} \end{aligned}$$

for the fluxes \(\{q_i\}_{i\in \{1,\ldots ,d\}}\) defined in (10). Hence, after bounding the time derivative of \(v^\epsilon \) by the norm of its Hessian matrix,

$$\begin{aligned}&\left| \int _{{\mathcal {C}}_{r_\epsilon }}\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta \left( \eta _\rho ^\epsilon \phi _j\partial _j v^\epsilon \right) _t\right| \nonumber \\&\quad \lesssim \sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \left| \nabla v^\epsilon \right| \left( \frac{1}{\rho ^2}\left| \nabla v^\epsilon \right| +\frac{1}{\rho }\left| \nabla ^2v^\epsilon \right| +\left| \nabla ^3v^\epsilon \right| \right) \right) \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \zeta \right| \left| q\right| \nonumber \\&\qquad +\sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \frac{1}{\rho }\left| \nabla v^\epsilon \right| +\left| \nabla ^2v^\epsilon \right| \right) ^2\int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \zeta \right| \left| q\right| \nonumber \\&\qquad +\sup _{{\mathcal {C}}_{r_\epsilon -\rho }}\left( \left( \frac{1}{\rho }\left| \nabla v^\epsilon \right| +\left| \nabla ^2v^\epsilon \right| \right) \left( \frac{1}{\rho ^2}\left| \nabla v^\epsilon \right| +\left| \nabla ^3v^\epsilon \right| \right) \right) \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \zeta \right| \left| \phi \right| ,\nonumber \\ \end{aligned}$$
(120)

where

$$\begin{aligned} \left| q\right| :=\left( \sum _{i=1}^d\left| q_i\right| ^2\right) ^\frac{1}{2}. \end{aligned}$$

Therefore, in view of (87), (114) and (115), it follows from the uniform ellipticity of a from (5), the definition of \(\gamma _\delta \), the Poincaré inequality in space, Hölder’s inequality, and Young’s inequality that

$$\begin{aligned} \begin{aligned}&\int _{{\mathcal {C}}_{r_\epsilon }}\left| (\gamma _\delta )_t\right| w^2+\int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \nabla w\cdot a\nabla w\lesssim \epsilon \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 + \int _{{\mathcal {C}}_{r_\epsilon }{\setminus } {\mathcal {C}}_{r_\epsilon -2\rho }}\left| \nabla v^\epsilon \right| ^2 \\&\quad + \frac{1}{\rho ^{d+4}}\int _{{\mathcal {C}}_1}\left( \left| \phi \right| ^2+\left| \psi \right| ^2+\left| \sigma \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 + \left| \int _{{\mathcal {C}}_{r_\epsilon }}\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta w\right| . \end{aligned} \end{aligned}$$
(121)

For the second term of (121), the choice of the radius \(r_\epsilon \in (\frac{1}{2}, \frac{3}{4})\) satisfying (89) and the boundary estimate (110) for \(a_{\mathrm {hom}}\)-caloric functions imply that

$$\begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\left| \nabla v^\epsilon \right| ^2+\left| \nabla ^2v^\epsilon \right| ^2\lesssim & {} \int _{\partial _p{\mathcal {C}}_{r_\epsilon }}\left| \nabla ^{\text {tan}}u^\epsilon \right| ^2+\int _{-r_\epsilon ^2}^0\int _{\partial B_{r_\epsilon }}\left| u^\epsilon _t\right| ^2 \nonumber \\\lesssim & {} \frac{1}{\epsilon ^2}\int _{{\mathcal {C}}_{r_\epsilon }}\left| \nabla u^\epsilon \right| ^2\le \frac{1}{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2. \end{aligned}$$
(122)

Furthermore, for each \(i\in \{1,\ldots ,d\}\), differentiating (91) in space and testing with \(\partial _i v^\epsilon \) yields with an application of Hölder’s inequality and Young’s inequality the a priori estimate

$$\begin{aligned}&||\partial _i v^\epsilon ||^2_{L^\infty \left( [-r^2_\epsilon ,0];L^2(B_{r_\epsilon })\right) }+||\nabla \partial _iv^\epsilon ||^2_{L^2({\mathcal {C}}_{r_\epsilon })} \nonumber \\&\quad \lesssim ||\partial _iu^\epsilon ||^2_{L^2\left( B_{r_\epsilon }\times \{-r_\epsilon ^2\}\right) }+\int _{-r^2_\epsilon }^0\int _{\partial B_{r_\epsilon }}\partial _iu^\epsilon \left( \nu \cdot \nabla \partial _iv^\epsilon \right) \nonumber \\&\quad \lesssim ||\partial _i u^\epsilon ||^2_{L^2({\mathcal {C}}_{r_\epsilon })}+\int _{-r^2_\epsilon }^0\int _{\partial B_{r_\epsilon }}\left( \nu \cdot \nabla \partial _iv^\epsilon \right) ^2 \nonumber \\&\quad \lesssim ||\partial _i u^\epsilon ||^2_{L^2({\mathcal {C}}_{r_\epsilon })}+\int _{-r^2_\epsilon }^0\int _{\partial B_{r_\epsilon }}\left| \nabla ^{\text {tan}}\partial _i u^\epsilon \right| ^2 \nonumber \\&\quad \lesssim \frac{1}{\epsilon ^2}||\partial _i u^\epsilon ||^2_{L^2({\mathcal {C}}_{r_\epsilon })} \nonumber \\&\quad \lesssim \frac{1}{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2, \end{aligned}$$
(123)

where \(\nu \) denotes the interior normal, the third inequality relies on the boundedness of the Dirichlet to Neumann map, see [14], and the final inequality follows from the choice of \(r_\epsilon \) in (90).

Therefore, it follows from (122), (123), Hölder’s inequality, and the Sobolev embedding theorem that

$$\begin{aligned} \begin{aligned}&\int _{{\mathcal {C}}_{r_\epsilon }{\setminus } {\mathcal {C}}_{r_\epsilon -2\rho }}\left| \nabla v^\epsilon \right| ^2 \\&\quad =\int _{-(r_\epsilon -2\rho )^2}^0\int _{B_{r_\epsilon }}\chi _{{\mathcal {B}}_{r_\epsilon }{\setminus } {\mathcal {B}}_{r_\epsilon -2\rho }}\left| \nabla v^\epsilon \right| ^2+\int _{-r_\epsilon ^2}^{-(r_\epsilon -2\rho )^2}\int _{B_{r_\epsilon }}\left| \nabla v^\epsilon \right| ^2 \\&\quad \lesssim \int _{-(r_\epsilon -2\rho )^2}^0\left( \int _{B_{r_\epsilon }}\chi _{{\mathcal {B}}_{r_\epsilon }{\setminus } {\mathcal {B}}_{r_\epsilon -2\rho }}\right) ^\frac{2}{d}\left( \int _{B_{r_\epsilon }}\left| \nabla v^\epsilon \right| ^{\frac{2d}{d-2}}\right) ^\frac{d-2}{d}+\frac{\rho }{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \\&\quad \lesssim \rho ^\frac{2}{d}\int _{{\mathcal {C}}_{r_\epsilon }}\left| \nabla ^2 v^\epsilon \right| ^2+\frac{\rho }{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \\&\quad \lesssim \frac{\rho ^\frac{2}{d}}{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2, \end{aligned} \end{aligned}$$
(124)

where \(\chi _{{\mathcal {B}}_{r_\epsilon }{\setminus } {\mathcal {B}}_{r_\epsilon -2\rho }}\) is the indicator function of the set \(({\mathcal {B}}_{r_\epsilon }{\setminus } {\mathcal {B}}_{r_\epsilon -2\rho })\) in \({\mathbb {R}}^d\), and where the argument is only written for the case \(d\ge 3\), since the modifications necessary for the cases \(d=1\) and \(d=2\) are straightforward and rely only upon the Sobolev embedding theorem.
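
For \(d\ge 3\), the Hölder and Sobolev step used in (124) can be sketched for a generic function. Here \(f\in H^1(B_r)\), the radius \(r\in (\frac{1}{2},\frac{3}{4})\) and the annulus \(A:=B_r{\setminus } B_{r-2\rho }\) are placeholder notation, and the implicit constants are assumed to depend only on d. Hölder’s inequality with exponents \(\frac{d}{2}\) and \(\frac{d}{d-2}\), followed by the Sobolev embedding of \(H^1(B_r)\) into \(L^{\frac{2d}{d-2}}(B_r)\), gives

$$\begin{aligned} \int _A\left| f\right| ^2\le \left| A\right| ^\frac{2}{d}\left( \int _{B_r}\left| f\right| ^{\frac{2d}{d-2}}\right) ^\frac{d-2}{d}\lesssim \rho ^\frac{2}{d}\int _{B_r}\left( \left| f\right| ^2+\left| \nabla f\right| ^2\right) , \end{aligned}$$

since \(\left| A\right| \lesssim \rho \). Applying this at each fixed time with f a component of \(\nabla v^\epsilon \), and then integrating in time, produces the factor \(\rho ^\frac{2}{d}\) in front of the \(H^2\)-type norm of \(v^\epsilon \) controlled by (122).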

For the final term of (121), estimates (117), (118), (119) and (120) together with estimates (87) and (109), where Hölder’s inequality is used for the final term, prove that, since removing \(\gamma _\delta \) from the final three terms of the right hand side increases their magnitude,

$$\begin{aligned} \left| \int _{{\mathcal {C}}_{r_\epsilon }}\partial _t\zeta _i\cdot \nabla \left( \eta _\rho ^\epsilon \partial _iv^\epsilon \right) \gamma _\delta w\right|\lesssim & {} \frac{1}{\rho ^{\frac{d}{2}+3}}\int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \zeta \right| \left| \nabla w\right| \left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2} \nonumber \\&+ \frac{1}{\rho ^{\frac{d}{2}+1}}\int _{{\mathcal {C}}_{r_\epsilon }}\left| (\gamma _\delta )_t\right| \left| \zeta \right| \left| w\right| \left( \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\right) ^\frac{1}{2} \nonumber \\&+ \frac{1}{\rho ^{\frac{d}{2}+3}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^\frac{1}{2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^\frac{1}{2}\left( \int _{{\mathcal {C}}_1}\left| q\right| ^2\right) ^\frac{1}{2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+5}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^\frac{1}{2}\left( \int _{{\mathcal {C}}_1}\left| \phi \right| ^2\right) ^\frac{1}{2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2.\nonumber \\ \end{aligned}$$
(125)

Therefore, following an application of Hölder’s inequality and then Young’s inequality, it follows from (121) and (125) that

$$\begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \nabla w\cdot a\nabla w\lesssim & {} \epsilon \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 + \frac{\rho ^\frac{2}{d}}{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\int _{{\mathcal {C}}_1}\left( \left| \phi \right| ^2+\left| \psi \right| ^2+\left| \sigma \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \left( \frac{1}{\rho ^{\frac{d}{2}+3}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^{\frac{1}{2}}+\frac{1}{\rho ^{d+6}}\int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^\frac{1}{2}\left( \int _{{\mathcal {C}}_1}\left| q\right| ^2\right) ^\frac{1}{2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+2}}\int _{{\mathcal {C}}_1}\left| (\gamma _\delta )_t\right| \left| \zeta \right| ^2\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2. \end{aligned}$$
(126)
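
The passage from (125) to (126) only requires work for the two terms of (125) that still contain w. As a sketch, write \(A:=\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2\), a shorthand introduced only here. Since \(0\le \gamma _\delta \le 1\) and \({\mathcal {C}}_{r_\epsilon }\subset {\mathcal {C}}_1\), Hölder’s inequality and Young’s inequality give, for every \(\kappa >0\),

$$\begin{aligned} \begin{aligned} \frac{A^\frac{1}{2}}{\rho ^{\frac{d}{2}+3}}\int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \zeta \right| \left| \nabla w\right|&\le \kappa \int _{{\mathcal {C}}_{r_\epsilon }}\gamma _\delta \left| \nabla w\right| ^2+\frac{1}{4\kappa \rho ^{d+6}}\int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\,A, \\ \frac{A^\frac{1}{2}}{\rho ^{\frac{d}{2}+1}}\int _{{\mathcal {C}}_{r_\epsilon }}\left| (\gamma _\delta )_t\right| \left| \zeta \right| \left| w\right|&\le \kappa \int _{{\mathcal {C}}_{r_\epsilon }}\left| (\gamma _\delta )_t\right| \left| w\right| ^2+\frac{1}{4\kappa \rho ^{d+2}}\int _{{\mathcal {C}}_1}\left| (\gamma _\delta )_t\right| \left| \zeta \right| ^2\,A. \end{aligned} \end{aligned}$$

For \(\kappa \) sufficiently small, depending on \(\lambda \) and the implicit constants, the first terms on the right hand sides are absorbed into the left hand side of (121) using the uniform ellipticity of a. The remaining terms account for the \(\rho ^{-(d+6)}\) contribution and for the final term of (126).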

In view of the construction of \(\gamma _\delta \) from (112), and owing to the distributional convergence (113), the fact that \(\zeta \) vanishes at \(t=0\) thanks to (85) implies, for \(\left\langle \cdot \right\rangle \)-a.e. a,

$$\begin{aligned} \lim _{\delta \rightarrow 0}\int _{{\mathcal {C}}_1}\left| (\gamma _\delta )_t\right| \left| \zeta \right| ^2=0. \end{aligned}$$
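
This limit can be checked directly under two assumptions that are not restated in this section: that \({\mathcal {C}}_1=B_1\times (-1,0)\), and that \(\zeta \), which depends only on time, is continuous near \(t=0\). Since \(\gamma _\delta \) is non-increasing with \(\gamma _\delta (-\delta )=1\) and \(\gamma _\delta (0)=0\), the function \(\left| (\gamma _\delta )_t\right| \) is supported in \([-\delta ,0]\) and has unit mass, so that

$$\begin{aligned} \int _{{\mathcal {C}}_1}\left| (\gamma _\delta )_t\right| \left| \zeta \right| ^2=\left| B_1\right| \int _{-\delta }^0\left| (\gamma _\delta )_t\right| \left| \zeta \right| ^2\le \left| B_1\right| \sup _{t\in [-\delta ,0]}\left| \zeta (t)\right| ^2, \end{aligned}$$

and the right hand side vanishes as \(\delta \rightarrow 0\) because \(\zeta (0)=0\).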

Therefore, after passing to the limit \(\delta \rightarrow 0\) in (126), the construction of \(\gamma _\delta \) in (112) implies that

$$\begin{aligned} \int _{{\mathcal {C}}_{r_\epsilon }}\nabla w\cdot a\nabla w\lesssim & {} \epsilon \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 + \frac{\rho ^\frac{2}{d}}{\epsilon ^2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\int _{{\mathcal {C}}_1}\left( \left| \phi \right| ^2+\left| \psi \right| ^2+\left| \sigma \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \left( \frac{1}{\rho ^{\frac{d}{2}+3}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^{\frac{1}{2}}+\frac{1}{\rho ^{d+6}}\int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) \int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2 \nonumber \\&+ \frac{1}{\rho ^{d+4}}\left( \int _{{\mathcal {C}}_1}\left| \zeta \right| ^2\right) ^\frac{1}{2}\left( \int _{{\mathcal {C}}_1}\left| q\right| ^2\right) ^\frac{1}{2}\int _{{\mathcal {C}}_1}\left| \nabla u\right| ^2, \end{aligned}$$
(127)

which completes the proof of (111).

To obtain (127) for an arbitrary radius \(R>0\), suppose that u is an a-caloric function on \({\mathcal {C}}_R\). Then, for each \(\epsilon \in (0,\frac{R}{4})\), there exists a radius \(R_\epsilon \in (\frac{R}{2},\frac{3R}{4})\) and a cutoff function \(\eta ^{R_\epsilon }_\rho \) with \(0\le \eta ^{R_\epsilon }_\rho \le 1\) and such that

$$\begin{aligned} \eta ^{R_\epsilon }_\rho (x,t)=\left\{ \begin{array}{ll} 1 &{}\quad \text {if}\;\;(x,t)\in \overline{{\mathcal {C}}}_{R_\epsilon -2\rho R_\epsilon } \\ 0 &{}\quad \text {if}\;\;(x,t)\in \left( {\mathbb {R}}^d\times (-\infty ,0)\right) {\setminus }{\mathcal {C}}_{R_\epsilon -\rho R_\epsilon },\end{array}\right. \end{aligned}$$
(128)

which, for the \(a_{\mathrm {hom}}\)-caloric extension \(v^\epsilon \) of \(u^\epsilon \) into \({\mathcal {C}}_{R_\epsilon }\), define the corresponding augmented homogenization error

$$\begin{aligned} w=u-(1+\eta ^{R_\epsilon }_\rho \phi _i\partial _i)v^\epsilon . \end{aligned}$$

Then, for each \(\epsilon \in (0,\frac{R}{4})\), after performing the rescalings

$$\begin{aligned} (\tilde{w},\tilde{u},\tilde{v}^\epsilon ,\tilde{q})(\cdot ,\cdot )=\frac{1}{R_\epsilon }(w,u,v^\epsilon ,q)\left( R_\epsilon \cdot ,R_\epsilon ^2\cdot \right) , \end{aligned}$$

it follows using Eqs. (7), (19), (20) and (22) that the correctors rescale according to the rules

$$\begin{aligned} (\tilde{\phi }, \tilde{\psi }, \tilde{\sigma })(\cdot ,\cdot )=\frac{1}{R_\epsilon }(\phi , \psi , \sigma )\left( R_\epsilon \cdot , R_\epsilon ^2\cdot \right) \;\;\text {and}\;\;\tilde{\zeta }(\cdot )=\frac{1}{R_\epsilon ^2}\zeta \left( R^2_\epsilon \cdot \right) . \end{aligned}$$
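
The effect of this rescaling on the integrals appearing in (127) can be sketched as follows, assuming the convention \({\mathcal {C}}_r=B_r\times (-r^2,0)\). Since \(\nabla \tilde{u}(x,t)=(\nabla u)(R_\epsilon x,R_\epsilon ^2t)\), the change of variables \((y,s)=(R_\epsilon x,R_\epsilon ^2t)\), whose Jacobian is \(R_\epsilon ^{d+2}\), gives

$$\begin{aligned} \begin{aligned} \int _{{\mathcal {C}}_1}\left| \nabla \tilde{u}\right| ^2&=\frac{1}{R_\epsilon ^{d+2}}\int _{{\mathcal {C}}_{R_\epsilon }}\left| \nabla u\right| ^2, \\ \int _{{\mathcal {C}}_1}\left| \tilde{\phi }\right| ^2&=\frac{1}{R_\epsilon ^{d+4}}\int _{{\mathcal {C}}_{R_\epsilon }}\left| \phi \right| ^2, \\ \int _{{\mathcal {C}}_1}\left| \tilde{\zeta }\right| ^2&=\frac{1}{R_\epsilon ^{d+6}}\int _{{\mathcal {C}}_{R_\epsilon }}\left| \zeta \right| ^2, \end{aligned} \end{aligned}$$

with the same rule as for \(\phi \) applying to \(\psi \) and \(\sigma \). In terms of averaged integrals, each power of a corrector therefore carries a factor \(R_\epsilon ^{-1}\), and each power of \(\zeta \) carries a factor \(R_\epsilon ^{-2}\).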

Hence, after applying (127) and returning to the original scaling, it follows that, for each \(\epsilon \in (0,\frac{R}{4})\) and \(\rho \in (0,\frac{1}{8})\),

(129)

which is the general form of the energy estimate that will be used in the proof of excess decay to follow.

5.5 The proof of excess decay

The energy estimate will now be used to prove the excess decay of Proposition 2. Fix \(R>0\) and suppose that u is an a-caloric function in \({\mathcal {C}}_R\). Then, for each \(\epsilon \in (0,\frac{R}{4})\) and \(\rho \in (0,\frac{1}{8})\), choose a radius \(R_\epsilon \in (\frac{R}{2},\frac{3R}{4})\) and a cutoff \(\eta ^{R_\epsilon }_\rho \) such that, for the \(a_{\mathrm {hom}}\)-caloric extension \(v^\epsilon \) of \(u^\epsilon \) into \({\mathcal {C}}_{R_\epsilon }\), the conclusion of (129) is satisfied for the augmented homogenization error w defined by

$$\begin{aligned} w=u-\left( 1+\eta ^{R_\epsilon }_\rho \phi _i\partial _i\right) v^\epsilon \;\;\text {in}\;\;{\mathcal {C}}_{R_\epsilon }. \end{aligned}$$
(130)

The proof of excess decay will now proceed in four steps.

Step 1: In the first step of the proof, it will be shown that, for any \(\delta >0\), there exists \(C_2=C_2(d,\lambda ,\delta )>0\) such that, whenever, for each \(i\in \{1,\ldots ,d\}\),

(131)

and

(132)

then

(133)

The proof is a simple consequence of estimate (129) and the definition (130).

Fix \(\delta >0\). Then, assuming that (131) and (132) are satisfied for some \(C_2>0\) to be fixed later, estimate (129), the choice (130) and the uniform ellipticity of a imply that

Therefore, first choose \(\epsilon _0\in (0,\frac{1}{4})\) satisfying

$$\begin{aligned} \epsilon _0<\frac{1}{3}\delta . \end{aligned}$$

Then, choose \(\rho _0\in (0,\frac{1}{8})\) sufficiently small so as to guarantee that

$$\begin{aligned} \frac{\rho _0^\frac{2}{d}}{\epsilon _0^2}<\frac{1}{3}\delta . \end{aligned}$$

Finally, fix \(C_2>0\) large enough to ensure that

$$\begin{aligned} \left( \frac{1}{C_2^2\rho _0^{d+4}}+\frac{1}{C_2\rho _0^{\frac{d}{2}+3}}+\frac{1}{C_2^2\rho _0^{d+6}}+\frac{1}{C_2\rho _0^{d+4}}\left\langle \left| q\right| ^2\right\rangle ^\frac{1}{2} \right) <\frac{\delta }{3}. \end{aligned}$$

Then, it follows that, for this choice of \(\epsilon _0\), \(\rho _0\) and \(C_2\),

which proves (133).

In particular, since \(\rho _0\in (0,\frac{1}{8})\) and \(R_{\epsilon _0}\in (\frac{R}{2},\frac{3R}{4})\), using the definition (128) of the cutoff \(\eta ^{R_{\epsilon _0}}_{\rho _0}\), it follows that

$$\begin{aligned} w=u-(1+\phi _i\partial _i)v^{\epsilon _0}\;\;\text {on}\;\; {\mathcal {C}}_\frac{R}{4}. \end{aligned}$$
(134)
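
The inclusion behind (134) is elementary arithmetic. Since \(\rho _0<\frac{1}{8}\) and \(R_{\epsilon _0}>\frac{R}{2}\),

$$\begin{aligned} R_{\epsilon _0}-2\rho _0R_{\epsilon _0}=R_{\epsilon _0}(1-2\rho _0)>\frac{R}{2}\cdot \frac{3}{4}=\frac{3R}{8}>\frac{R}{4}, \end{aligned}$$

so \({\mathcal {C}}_\frac{R}{4}\) is contained in the region where, by (128), the cutoff \(\eta ^{R_{\epsilon _0}}_{\rho _0}\) is identically one.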

Therefore, since \(R_{\epsilon _0}\in (\frac{R}{2},\frac{3R}{4})\), it follows from (133) and (134) that, for any \(\delta >0\), there exists \(C_2=C_2(d,\lambda ,\delta )>0\) such that, whenever (131) and (132) are satisfied, for each \(r\in (0,\frac{R}{4}]\),

(135)

This completes the first step of the proof.

Step 2: The second step will show that the left hand side of (135) is a good approximation for the excess by using the interior regularity of \(a_{\mathrm {hom}}\)-caloric functions and the Caccioppoli inequality. To simplify the notation in what follows, define

$$\begin{aligned} v:=v^{\epsilon _0}\;\;\text {in}\;\;{\mathcal {C}}_{R_{\epsilon _0}}. \end{aligned}$$

Then, form the decomposition

$$\begin{aligned} \begin{aligned} \nabla u-\nabla v-\nabla (\phi _i\partial _iv)&= \nabla u-\nabla v(0,0)(I_d+\nabla \phi ) \\&\quad +\left( \nabla v(0,0)-\nabla v\right) (I_d+\nabla \phi )-\phi _i\nabla (\partial _iv), \end{aligned} \end{aligned}$$
(136)

where \(I_d\) denotes the \((d\times d)\)-identity matrix and, for each \(i,j\in \{1,\ldots ,d\}\),

$$\begin{aligned} \left( \nabla \phi \right) _{ij}:=\partial _j\phi _i. \end{aligned}$$
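
The decomposition (136) can be verified with the product rule. Treating \(\nabla v\) as a row vector, so that \((\nabla v\nabla \phi )_j=\partial _iv\,\partial _j\phi _i\) with this convention,

$$\begin{aligned} \begin{aligned} \nabla v+\nabla (\phi _i\partial _iv)&=\nabla v+\partial _iv\nabla \phi _i+\phi _i\nabla (\partial _iv) \\&=\nabla v(I_d+\nabla \phi )+\phi _i\nabla (\partial _iv), \end{aligned} \end{aligned}$$

and adding and subtracting \(\nabla v(0,0)(I_d+\nabla \phi )\) yields (136). Assuming in addition that the corrector depends linearly on the direction, in the sense that \(\phi _\xi =\xi _i\phi _i\) (an assumption not restated in this section), the vector \(\nabla v(0,0)(I_d+\nabla \phi )\) coincides with \(\xi _0+\nabla \phi _{\xi _0}\) for \(\xi _0=\nabla v(0,0)\).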

After fixing \(\xi _0=\nabla v(0,0)\), use (136), the triangle inequality, and Young’s inequality to prove that, in \({\mathcal {C}}_r\) for any \(r\in (0,\frac{R}{4}]\),

$$\begin{aligned} \begin{aligned}&\left| \nabla u-\xi _0-\nabla \phi _{\xi _0}\right| ^2 \\&\quad \lesssim \left| \nabla w\right| ^2+\sup _{{\mathcal {C}}_r}\left( \left| \nabla v-\nabla v(0,0)\right| \right) ^2\left| I_d+\nabla \phi \right| ^2+\sup _{{\mathcal {C}}_r}\left( \left| \nabla (\partial _iv)\right| \right) ^2\left| \phi _i\right| ^2. \end{aligned} \end{aligned}$$
(137)

Estimate (109) implies that, after bounding the time derivative of v by the norm of its Hessian matrix, and using the uniform ellipticity of a and the choice \(R_{\epsilon _0}\in (\frac{R}{2}, \frac{3R}{4})\), for each \(r\in (0,\frac{R}{4}]\),

(138)

Similarly, for each \(i\in \{1,\ldots ,d\}\), using estimate (109), the uniform ellipticity of a and \(R_{\epsilon _0}\in (\frac{R}{2},\frac{3R}{4})\), it follows that, for each \(r\in (0,\frac{R}{4}]\),

(139)

Finally, since for each \(i\in \{1,\ldots ,d\}\) the a-caloric coordinate \((x_i+\phi _i)\) satisfies

$$\begin{aligned} \partial _t(x_i+\phi _i)=\nabla \cdot a\nabla (x_i+\phi _i)\;\;\text {in}\;\;{\mathbb {R}}^{d+1}, \end{aligned}$$

the Caccioppoli inequality (29) implies that, for each \(r\in (0,\frac{R}{4}]\),

(140)

Therefore, returning to (137), estimates (138), (139) and (140), with the uniform ellipticity of a and the choice \(R_{\epsilon _0}\in (\frac{R}{2}, \frac{3R}{4})\), imply that, for each \(r\in (0,\frac{R}{4}]\),

(141)

This completes the second step.

Step 3: In the third step, inequality (141) will be combined with (133) to prove the excess decay along a subsequence. Namely, for every \(\alpha \in (0,1)\), it will be shown that there exists \(C_0=C_0(d,\lambda ,\alpha )>0\) and \(\theta _0=\theta _0(\alpha ,d,\lambda )\in (0,\frac{1}{4})\) such that, if \(r_1=\theta _0R\) and if, for each \(r\in [r_1,R]\),

(142)

and

(143)

then

$$\begin{aligned} \text {Exc}\left( u;r_1\right) \le \left( \frac{r_1}{R}\right) ^{2\alpha }\text {Exc}(u;R). \end{aligned}$$
(144)

Notice that (144) is an exact inequality: it holds with constant one, rather than up to a multiplicative constant.

Let \(\delta >0\) be arbitrary. In view of (133), there exists \(C_2=C_2(\delta ,d,\lambda )\ge 1\) such that, whenever (142) and (143) are satisfied for the constant \(C_2\), then, since \(R_{\epsilon _0}\in (\frac{R}{2}, \frac{3R}{4})\),

Therefore, it follows from inequality (141) and (143) that, for \(C_3=C_3(d,\lambda )>0\), since \(R_{\epsilon _0}\in (\frac{R}{2}, \frac{3R}{4})\),

(145)

Choose \(\theta _0\in (0,\frac{1}{4})\) sufficiently small so as to guarantee

$$\begin{aligned} C_3\theta _0^2\le \frac{1}{2}\theta _0^{2\alpha }, \end{aligned}$$
(146)

which is possible because \(\alpha \in (0,1)\), and choose \(\delta _0>0\) sufficiently small so as to guarantee

$$\begin{aligned} C_3\delta _0\theta _0^{-(d+2)}\le \frac{1}{2}\theta _0^{2\alpha }. \end{aligned}$$
(147)
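For concreteness, one admissible choice (a sketch, not necessarily the choice intended in the original argument) is obtained by solving (146) and (147) for \(\theta _0\) and \(\delta _0\). The cap \(\frac{1}{8}\) is an arbitrary value below \(\frac{1}{4}\) introduced here only for illustration:

$$\begin{aligned} \theta _0:=\min \left\{ \frac{1}{8},\left( 2C_3\right) ^{-\frac{1}{2(1-\alpha )}}\right\} \;\;\text {and}\;\;\delta _0:=\frac{\theta _0^{d+2+2\alpha }}{2C_3}. \end{aligned}$$

Indeed, (146) is equivalent to \(\theta _0^{2(1-\alpha )}\le (2C_3)^{-1}\), which holds for all sufficiently small \(\theta _0\) precisely because \(2(1-\alpha )>0\), and (147) is equivalent to \(\delta _0\le (2C_3)^{-1}\theta _0^{d+2+2\alpha }\).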

It is then immediate from (145) that, by choosing \(\theta _0\) as in (146) and choosing \(C_0:=C_2(\delta _0,d,\lambda )\) for \(\delta _0\) defined in (147), whenever (142) and (143) are satisfied for the constant \(C_0\) and \(r_1=\theta _0R\),

(148)

Since the excess is defined, for each \(R>0\), by

inequality (148) implies that

(149)

However, because the left hand side of inequality (149) is invariant with respect to the addition of an arbitrary a-caloric gradient \((\xi +\nabla \phi _\xi )\) in the sense that, with (149), for every \(\xi \in {\mathbb {R}}^d\),

(150)

taking an infimum on the right hand side with respect to \(\xi \in {\mathbb {R}}^d\) yields

$$\begin{aligned} \text {Exc}(u;r_1)\le \left( \frac{r_1}{R}\right) ^{2\alpha }\text {Exc}(u;R), \end{aligned}$$
(151)

which completes the proof of (144), and the argument’s third step.

Step 4: The final step completes the proof using (151) and an iteration argument. Fix \(r_1<R\) such that, for \(C_0>0\) defined following (147), both (142) and (143) are satisfied for the constant \(C_0\) for every \(r\in [r_1,R]\). It will be shown that, in this case,

$$\begin{aligned} \text {Exc}(u;r)\lesssim \left( \frac{r}{R}\right) ^{2\alpha }\text {Exc}(u;R). \end{aligned}$$
(152)

Fix \(\theta _0\) as defined in (146). If \(r\ge R\theta _0\), then using the definition of the excess, for \(C=C(\theta _0)>0\),

$$\begin{aligned} \begin{aligned} \mathrm {Exc}(u;r)&\le \left( \frac{R}{r}\right) ^d\mathrm {Exc}(u;R)=\left( \frac{R}{r}\right) ^{d+2\alpha }\left( \frac{r}{R}\right) ^{2\alpha }\mathrm {Exc}(u;R) \\&\le \theta _0^{-(d+2\alpha )}\left( \frac{r}{R}\right) ^{2\alpha }\mathrm {Exc}(u;R) \le C\left( \frac{r}{R}\right) ^{2\alpha }\mathrm {Exc}(u;R). \end{aligned} \end{aligned}$$
(153)

If \(r<\theta _0R\), then let n be the unique positive integer satisfying \(\theta _0^{n+1}R \le r < \theta _0^nR\). Proceeding inductively, and relying upon the fact that (151) is an exact inequality, for constants \(C=C(\theta _0)>0\) which can change between inequalities,

$$\begin{aligned} \begin{aligned} \mathrm {Exc}(u;r) \le C \mathrm {Exc}\left( u;\theta _0^n R\right)&\le C (\theta _0^{n})^{2\alpha } \mathrm {Exc}(u;R) \\&= C\theta _0^{-2\alpha } \left( \theta _0^{n+1}\right) ^{2\alpha } \mathrm {Exc}(u;R) \le C\left( \frac{r}{R} \right) ^{2\alpha } \mathrm {Exc}(u;R). \end{aligned} \end{aligned}$$
(154)
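The induction behind (154) can be made explicit as follows, under the assumption that (151) may be applied with R replaced by \(\theta _0^{k-1}R\) whenever \(\theta _0^kR\ge r_1\), which is how the hypotheses on \([r_1,R]\) are read in this sketch. Because (151) carries no multiplicative constant, applying it k times gives

$$\begin{aligned} \mathrm {Exc}\left( u;\theta _0^kR\right) \le \theta _0^{2\alpha }\,\mathrm {Exc}\left( u;\theta _0^{k-1}R\right) \le \cdots \le \theta _0^{2k\alpha }\,\mathrm {Exc}(u;R). \end{aligned}$$

If (151) instead held only up to a constant \(C>1\), the same iteration would accumulate the factor \(C^k\), and the resulting decay would have an exponent strictly smaller than \(2\alpha \).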

In combination, (153) and (154) prove (152) and complete the proof of Proposition 2.

6 The proof of Lemma 2

Fix a coefficient field a satisfying (5). Fix \(R>0\) and suppose that u is a distributional solution of

$$\begin{aligned} u_t=\nabla \cdot a\nabla u\;\;\text {in}\;\;{\mathcal {C}}_R. \end{aligned}$$
(155)

Let \(c\in {\mathbb {R}}\) and \(\rho \in (0,\frac{R}{2})\) be arbitrary. The Caccioppoli inequality is obtained by testing Eq. (155) with \(\eta ^2(u-c)\) for an appropriately chosen cutoff function \(\eta \).

Precisely, fix \(\eta \in \mathcal {C}^\infty _c({\mathbb {R}}^{d+1})\) satisfying \(0\le \eta \le 1\) and, for \(x\in {\mathbb {R}}^d\) and \(t\in {\mathbb {R}}\),

$$\begin{aligned} \eta (x,t)=\left\{ \begin{array}{ll} 1 &{} \quad \text {if}\;\;(x,t)\in \overline{B}_{R-\rho }\times [\rho ^2-R^2,0] \\ 0 &{}\quad \text {if}\;\;(x,t)\in {\mathbb {R}}^{d+1}{\setminus } {\mathcal {C}}_R.\end{array}\right. \end{aligned}$$

Furthermore, choose \(\eta \) satisfying

$$\begin{aligned} \left| \eta _t\right| \lesssim \frac{1}{\rho ^2}\;\;\text {and}\;\;\left| \nabla \eta \right| \lesssim \frac{1}{\rho }\;\;\text {on}\;\;{\mathbb {R}}^{d+1}. \end{aligned}$$

Test Eq. (155) against \(\eta ^2(u-c)\) and use the definition of \(\eta \) and the identity

$$\begin{aligned} \nabla \left( \eta ^2 (u-c)\right) \cdot a\nabla u=\eta ^2 \nabla u \cdot a\nabla u+2\eta (u-c) \nabla \eta \cdot a\nabla u \end{aligned}$$

to obtain

$$\begin{aligned} \int _{{\mathcal {C}}_R}\eta ^2\nabla u\cdot a\nabla u\lesssim \frac{1}{2}\int _{{\mathcal {C}}_R}(u-c)^2\partial _t\eta +\int _{{\mathcal {C}}_R}\eta \left| (u-c)\right| \left| \nabla \eta \right| \left| \nabla u\right| . \end{aligned}$$
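The time-derivative term on the right-hand side can be traced to the following pointwise identity, stated here only formally, since a distributional solution requires a regularization in time that is omitted in this sketch:

$$\begin{aligned} \eta ^2(u-c)\,\partial _tu=\frac{1}{2}\partial _t\left( \eta ^2(u-c)^2\right) -(u-c)^2\,\eta \,\partial _t\eta . \end{aligned}$$

Assuming \({\mathcal {C}}_R=B_R\times (-R^2,0]\), which is consistent with the definition of \(\eta \) above but is not restated in this section, integrating the first term over \({\mathcal {C}}_R\) leaves only the boundary contribution at \(t=0\), which is nonnegative and can be dropped, while the second term is controlled using \(0\le \eta \le 1\).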

Therefore, following applications of Hölder’s inequality and Young’s inequality, and after using the definition of \(\eta \) and the uniform ellipticity of a, it follows that

$$\begin{aligned} \int _{{\mathcal {C}}_{R-\rho }}\left| \nabla u\right| ^2\lesssim \frac{1}{\rho ^2}\int _{{\mathcal {C}}_R{\setminus }{\mathcal {C}}_{R-\rho }}(u-c)^2, \end{aligned}$$

which completes the proof.
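The absorption step can be sketched as follows, assuming that (5) provides the lower bound \(\lambda \left| \xi \right| ^2\le \xi \cdot a\xi \) for every \(\xi \in {\mathbb {R}}^d\). Young’s inequality yields, for every \(\varepsilon >0\),

$$\begin{aligned} \int _{{\mathcal {C}}_R}\eta \left| u-c\right| \left| \nabla \eta \right| \left| \nabla u\right| \le \varepsilon \int _{{\mathcal {C}}_R}\eta ^2\left| \nabla u\right| ^2+\frac{1}{4\varepsilon }\int _{{\mathcal {C}}_R}(u-c)^2\left| \nabla \eta \right| ^2, \end{aligned}$$

while the assumed ellipticity gives \(\lambda \int _{{\mathcal {C}}_R}\eta ^2\left| \nabla u\right| ^2\le \int _{{\mathcal {C}}_R}\eta ^2\nabla u\cdot a\nabla u\). Choosing \(\varepsilon \) to be a sufficiently small multiple of \(\lambda \) absorbs the gradient term into the left-hand side. The bounds \(\left| \eta _t\right| \lesssim \rho ^{-2}\) and \(\left| \nabla \eta \right| ^2\lesssim \rho ^{-2}\), together with the fact that both derivatives vanish where \(\eta =1\), then produce the right-hand side of the final estimate.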