1 Introduction

E. Wigner’s vision of the ubiquity of random matrix spectral statistics in quantum systems posed a major challenge to mathematics. The basic conjecture is that the distribution of the eigenvalue gaps of a large self-adjoint matrix with sufficient disorder is universal in the sense that it is independent of the details of the system and depends only on the symmetry type of the model. These universal statistics were computed by Dyson, Gaudin and Mehta for the Gaussian Unitary and Orthogonal Ensembles (GUE/GOE) in the limit as the dimension of the matrix goes to infinity. GUE and GOE are the simplest mean field random matrix models in their respective symmetry classes. They have centered Gaussian entries that are identically distributed and independent (modulo the hermitian symmetry). The celebrated Wigner–Dyson–Mehta (WDM) universality conjecture, as formulated in the classical book of Mehta [44], asserts that the same gap statistics holds if the matrix elements are independent and have an arbitrary common distribution (such matrices are called Wigner ensembles). The WDM conjecture has recently been proved in increasing generality in a series of papers [18, 21, 24, 25] for both the real symmetric and complex hermitian symmetry classes via the Dyson Brownian motion. An alternative approach, based on the four-moment comparison theorem, was presented in [49, 50, 52]. In this paper we only discuss universality in the bulk of the spectrum, but we remark that a similar development has taken place for edge universality.

The next step towards Wigner’s vision is to drop the assumption of identical distribution in the WDM conjecture while still maintaining the mean field character of the model by requiring uniform lower and upper bounds on the variances of the matrix elements. This generalization has been achieved in two steps. If the matrix of variances is stochastic, then universality was proved in [18, 27, 29], in parallel with the proof of the original WDM conjecture for Wigner ensembles. Without the stochasticity condition on the variances the limiting eigenvalue density is no longer the Wigner semicircle; the correct density was analyzed in [1, 2] and universality was proved in [4]. We remark that one may also depart from the semicircle law by adding a large diagonal component to Wigner matrices; universality for such deformed Wigner matrices was obtained in [43]. Finally, we mention a separate direction of generalizing the original WDM conjecture that aims at relaxing the mean field condition: bulk universality for general band matrices with a band width comparable to the matrix size was proved in [11]; see also [48] for Gaussian block-band matrices.

In this paper we drop the third key condition in the original WDM conjecture, the independence of the matrix elements; i.e., we consider matrices with correlated entries. Correlations come in many different forms, and if they are extremely strong and long ranged, universality may even fail. We therefore consider random matrix models with a suitable decay of correlations. These models still carry sufficiently many random degrees of freedom for Wigner’s vision to hold and, indeed, our main result yields spectral universality for such matrices.

We now describe the key points of the current work. Our main result is the local law for the resolvent

$$\begin{aligned} \begin{aligned} {\mathbf{G}}(\zeta ) := ({\mathbf{H}}-\zeta {\mathbf{1}})^{-1}, \end{aligned} \end{aligned}$$
(1.1)

of the random matrix \({\mathbf{H}}={\mathbf{H}}^* \in \mathbb {C}^{N \times N}\) with the spectral parameter \( \zeta \) in the complex upper half plane \(\mathbb {H}:=\{ \zeta \in \mathbb {C}\; : \; {{\mathrm{Im}}}\, \zeta >0\}\) that lies very close to the real axis. We show that, as the size N of the random matrix tends to infinity, \({\mathbf{G}}={\mathbf{G}}(\zeta )\) is well approximated by a deterministic matrix \({\mathbf{M}}={\mathbf{M}}(\zeta )\) that satisfies a nonlinear matrix equation of the form

$$\begin{aligned} \begin{aligned} {\mathbf{1}}+(\zeta {\mathbf{1}}-{\mathbf{A}}+\mathcal {S}[{\mathbf{M}}]){\mathbf{M}}\,=\,{\mathbf{0}}\,. \end{aligned} \end{aligned}$$
(1.2)

Here the self-adjoint matrix \({\mathbf{A}}\) and the operator \(\mathcal {S}: \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) on the space of matrices are determined by the first two moments of the random matrix

$$\begin{aligned} \begin{aligned} {\mathbf{A}}\,:= \, \mathbb {E}{\mathbf{H}}, \qquad \mathcal {S}[{\mathbf{R}}] := \mathbb {E}\,({\mathbf{H}}-{\mathbf{A}}) {\mathbf{R}}({\mathbf{H}}-{\mathbf{A}}). \end{aligned} \end{aligned}$$
(1.3)

The central role of (1.2) in the context of random matrices has been recognized by several authors [33, 38, 45, 53]. We will call (1.2) the Matrix Dyson Equation (MDE), since the analogous equation for the resolvent is sometimes called the Dyson equation in perturbation theory.

Local laws have become a cornerstone in the analysis of spectral properties of large random matrices [4, 8, 20, 23, 29, 35, 37, 51]. In its simplest form, a local law concerns the normalized trace \( \frac{1}{N} {{\mathrm{Tr\,}}}{\mathbf{G}}(\zeta )\) of the resolvent. Viewed as a Stieltjes transform, it describes the empirical density of eigenvalues on the scale determined by \(\eta ={{\mathrm{Im}}}\, \zeta \). Assuming a normalization such that the spectrum of \({\mathbf{H}}\) remains bounded as \(N\rightarrow \infty \), the typical eigenvalue spacing in the bulk is of order 1/N. The local law asserts that this normalized trace approaches a deterministic function \(m(\zeta )\) as the size N of the matrix tends to infinity, and that this convergence holds uniformly even if \(\eta =\eta _N\) depends on N, as long as \(\eta \gg 1/N\). Equivalently, the empirical density of the eigenvalues converges, on any scale slightly above 1/N, to a deterministic limit measure on \(\mathbb {R}\) with Stieltjes transform \(m(\zeta )\).
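This mechanism is easy to test numerically. The following sketch is our illustration, not part of the paper: the GUE normalization, the seed and the spectral parameter are our choices. It samples a GUE matrix and compares the normalized trace of its resolvent with the semicircle Stieltjes transform \(m_{\mathrm{sc}}(\zeta )=\frac{1}{2}(-\zeta +\sqrt{\zeta ^2-4})\):

```python
import numpy as np

# Numerical illustration: for a GUE matrix normalized so that its spectrum
# stays in [-2, 2], the normalized trace of the resolvent is close to the
# Stieltjes transform of the semicircle law once eta = Im(zeta) >> 1/N.
rng = np.random.default_rng(0)
N = 1000
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (X + X.conj().T) / (2 * np.sqrt(N))      # GUE, off-diagonal variance 1/N

zeta = 0.5 + 0.05j                           # eta = 0.05, well above 1/N
G = np.linalg.inv(H - zeta * np.eye(N))      # resolvent G(zeta)
m_emp = np.trace(G) / N                      # empirical Stieltjes transform

# Semicircle Stieltjes transform, branch with Im m_sc > 0:
m_sc = (-zeta + np.sqrt(zeta**2 - 4)) / 2
assert m_sc.imag > 0
assert abs(m_emp - m_sc) < 0.1               # local-law-sized error on this scale
```

The observed deviation is of the expected order \(1/(N\eta )\); shrinking \(\eta \) towards 1/N is exactly the regime the local law controls.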

Since \({\mathbf{G}}\) is asymptotically close to \({\mathbf{M}}\), the deterministic limit of the Stieltjes transform of the empirical spectral measure is given by \(m(\zeta )= \frac{1}{N}{{\mathrm{Tr\,}}}{\mathbf{M}}(\zeta )\). Already for random matrices with centered independent entries (Wigner-type matrices) the limiting measure \(\rho (\mathrm{d} \omega )\) and its Stieltjes transform \(m(\zeta )\) typically depend on the entire matrix of variances \(s_{xy}:= \mathbb {E}|h_{xy}|^2\), and the only known way to determine \(\rho \) is to solve (1.2). However, in this setting the problem simplifies considerably because the off-diagonal elements of \({\mathbf{G}}\) tend to zero, \({\mathbf{M}}\) is a diagonal matrix and (1.2) reduces to a vector equation for its diagonal elements. If the variance matrix is doubly stochastic, \(\sum _y s_{xy}=1\) (generalized Wigner matrix), the problem simplifies yet again, leading to \({\mathbf{M}} = m_{\mathrm{sc}}{\mathbf{1}}\), where \(m_{\mathrm{sc}}=m_{\mathrm{sc}}(\zeta )\) is the Stieltjes transform of the celebrated semicircle law.
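In this doubly stochastic case (1.2) collapses to the scalar equation \(-1/m = \zeta + m\), i.e. \(m^2+\zeta m+1=0\), whose upper half plane root is \(m_{\mathrm{sc}}\). A quick numerical sketch (ours, not taken from the paper) recovers it by fixed-point iteration and checks it against the closed form:

```python
import numpy as np

# The MDE for generalized Wigner matrices collapses to -1/m = zeta + m;
# the root in the upper half plane is the semicircle Stieltjes transform.
def semicircle_stieltjes(zeta, iters=500):
    m = 1j                        # start in the upper half plane
    for _ in range(iters):        # the iteration contracts for Im zeta > 0
        m = -1.0 / (zeta + m)
    return m

zeta = 0.3 + 0.1j
m_iter = semicircle_stieltjes(zeta)
m_closed = (-zeta + np.sqrt(zeta**2 - 4)) / 2        # branch with Im m > 0
assert abs(m_iter - m_closed) < 1e-10
assert abs(1.0 + (zeta + m_iter) * m_iter) < 1e-10   # solves m^2 + zeta*m + 1 = 0
```

The map \(m \mapsto -1/(\zeta +m)\) preserves the upper half plane, which is why the iteration stays in the correct branch.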

The main novelty of this work is to handle general correlations that do not allow any simplification of (1.2). The off-diagonal matrix elements \(G_{x y}\), \(x\ne y\), do not vanish in general, even in the \(N\rightarrow \infty \) limit. The proof of the local law consists of two major parts. First, we derive an approximate equation

$$\begin{aligned} \begin{aligned} {\mathbf{1}}+(\zeta {\mathbf{1}}-{\mathbf{A}}+\mathcal {S}[{\mathbf{G}}]){\mathbf{G}}\,\approx \,{\mathbf{0}}, \end{aligned} \end{aligned}$$
(1.4)

for the resolvent of \({\mathbf{H}}\). To avoid confusion we stress that the expectation over the random matrix \({\mathbf{H}}\) in (1.3) is only used to define the deterministic operator \(\mathcal {S}\). If the argument \({\mathbf{G}}\) of \(\mathcal {S}\) itself is random as in (1.4), then \(\mathcal {S}[{\mathbf{G}}]\) is still random and we have \(\mathcal {S}[{\mathbf{G}}]=\widetilde{\mathbb {E}}(\widetilde{{\mathbf{H}}}-{\mathbf{A}}){\mathbf{G}}(\widetilde{{\mathbf{H}}}-{\mathbf{A}})\), where the expectation \(\widetilde{\mathbb {E}} \) acts only on an independent copy \(\widetilde{{\mathbf{H}}}\) of \({\mathbf{H}}\).

Second, we show that the matrix Dyson equation (1.2) is stable under small perturbations, which allows us to conclude that \({\mathbf{G}} \approx {\mathbf{M}}\). The nontrivial correlations and the noncommutativity of the matrix structure in the Dyson equation pose major difficulties compared to the uncorrelated case.

Local laws are the first step of a general three step strategy developed in [24, 25, 27, 29] for proving universality. The second step is to add a tiny independent Gaussian component and prove universality for this slightly deformed model via analyzing the fast convergence of the Dyson Brownian motion (DBM) to local equilibrium. Finally, the third step is a perturbation argument showing that the tiny Gaussian component does not alter the local statistics.

In fact, the second and the third steps are very robust arguments and they easily extend to the correlated case. They do not use any properties of the original ensemble other than the a priori bounds encoded in the local laws, provided that the variances of the matrix elements have a positive lower bound (see [12, 26, 41, 42]). Therefore our work focuses on the first step, establishing the stability of (1.2) and thus obtaining a local law.

Prior to the current paper, bulk universality had already been established for several random matrix models that carry some specific correlation from their construction. These include sample covariance matrices [25], adjacency matrices of large regular graphs [7] and invariant \(\beta \)-ensembles at various levels of generality [9, 10, 16, 17, 30, 34, 47]. However, none of these papers aimed at understanding the effect of general correlations, nor were their methods suitable to deal with them. Universality for Gaussian matrices with a translation invariant covariance structure was established in [3]. For general distributions of the matrix entries, but with a specific two-scale finite range correlation structure that is smooth on the large scale and translation invariant on the short scale, universality was proved in [14], independently of the current work.

Finally, we mention that there exists an extensive literature on the limiting eigenvalue distribution of random matrices with correlated entries on the global scale (see e.g. [5, 6, 13, 33, 36, 46] and references therein); however, these works either dealt with Gaussian random matrices or with more specific correlation structures that allow one to effectively reduce (1.2) to a vector or scalar equation. While the matrix Dyson equation in full generality was introduced for the analysis on the global scale before us, we are not aware of a proof that the empirical density of states converges to the deterministic density given by the solution of the MDE for a class of models as broad as the one considered in this paper. This convergence is expressed by the fact that \(\frac{1}{N} {{\mathrm{Tr\,}}}{\mathbf{G}}(\zeta ) \approx \frac{1}{N} {{\mathrm{Tr\,}}}{\mathbf{M}}(\zeta )\) holds for any fixed \(\zeta \in \mathbb {H}\). We thus believe that our identification of the limiting eigenvalue distribution is a new result even on the global scale for ensembles with general short range correlations and non-Gaussian distribution.

We present the stability of the MDE and its application to random matrices with correlated entries separately. Our findings on the MDE are given in Sect. 2.1, while Sect. 2.2 contains the results about random matrices with correlated entries. These sections can be read independently of each other, with the latter relying on the former only through some basic definitions. In Sect. 3 we prove the local law for random matrices with correlations. The proof relies on the results stated in Sect. 2.1. These results concerning the MDE are established in Sect. 4, which can be read independently of any other section. Besides the results from Sect. 2.1 on the MDE, the main ingredients of the proof in Sect. 3 are (i) estimates on the random error term appearing in the approximate MDE (1.4) and (ii) the fluctuation averaging mechanism for this error term. These two inputs (Lemma 3.4 and Proposition 3.5) are established in Sects. 5 and 6, respectively. However, Sect. 3 can be understood without reading the ensuing sections, taking these inputs for granted. Finally, we apply the local law to establish the rigidity of eigenvalues and bulk universality in Sect. 7. The Appendix collects the proofs of auxiliary results of a generic nature that are not directly concerned with either the MDE or the random matrices.

2 Main results

2.1 The matrix Dyson equation

In this section we present our main results on the Matrix Dyson Equation and its stability. The corresponding proofs are carried out in Sect. 4. We consider the linear space \( \mathbb {C}^{N \times N}\) of \(N\times N\) complex matrices \({\mathbf{R}}=(r_{xy})_{x,y =1}^N\), and make it a Hilbert space by equipping it with the standard normalized scalar product

$$\begin{aligned} \begin{aligned} \langle {{\mathbf{R}}} , {{\mathbf{T}}}\rangle := \frac{1}{N}{{\mathrm{Tr\,}}}{\mathbf{R}}^{*}{\mathbf{T}}. \end{aligned} \end{aligned}$$
(2.1)

We denote the cone of strictly positive definite matrices by

$$\begin{aligned} \mathscr {C}_+\,:=\,\{{\mathbf{R}}\in \mathbb {C}^{N\times N}:\, {\mathbf{R}}> {\mathbf{0}}\}, \end{aligned}$$

and by \(\overline{\mathscr {C}}_+\) its closure, the cone of positive semidefinite matrices.

Let \({\mathbf{A}}={\mathbf{A}}^{*} \in \mathbb {C}^{N \times N}\) be a self-adjoint matrix. We will refer to \({\mathbf{A}}\) as the bare matrix. Furthermore, let \(\mathcal {S}:\mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) be a linear operator that is

  • self-adjoint w.r.t. the scalar product (2.1): \({{\mathrm{Tr\,}}}{\mathbf{R}}^{*}\mathcal {S}[{\mathbf{T}}]={{\mathrm{Tr\,}}}\mathcal {S}[{\mathbf{R}}]^*{\mathbf{T}}\) for any \({\mathbf{R}},{\mathbf{T}} \in \mathbb {C}^{N \times N}\);

  • positivity preserving: \(\mathcal {S}[{\mathbf{R}}]\geqslant {\mathbf{0}}\) for any \({\mathbf{R}}\geqslant {\mathbf{0}}\).

Note that in particular \(\mathcal {S}\) commutes with taking the adjoint, \(\mathcal {S}[{\mathbf{R}}]^*=\mathcal {S}[{\mathbf{R}}^{*}]\), and hence it is real symmetric, \({{\mathrm{Tr\,}}}{\mathbf{R}}\mathcal {S}[{\mathbf{T}}]={{\mathrm{Tr\,}}}\mathcal {S}[{\mathbf{R}}]{\mathbf{T}}\), for all \({\mathbf{R}},{\mathbf{T}}\in \mathbb {C}^{N \times N}\). We will refer to \(\mathcal {S}\) as the self-energy operator.

We call a pair \(({\mathbf{A}}, \mathcal {S})\) consisting of a bare matrix and a self-energy operator with the properties above a data pair. For a given data pair \(({\mathbf{A}}, \mathcal {S})\) and a spectral parameter \(\zeta \in \mathbb {H}\) in the upper half plane we consider the associated Matrix Dyson Equation (MDE),

$$\begin{aligned} \begin{aligned} -{\mathbf{M}}(\zeta )^{-1} \,=\, \zeta {\mathbf{1}}-{\mathbf{A}}+ \mathcal {S}[{\mathbf{M}}(\zeta )], \end{aligned} \end{aligned}$$
(2.2)

for a solution matrix \({\mathbf{M}}={\mathbf{M}}(\zeta ) \in \mathbb {C}^{N\times N}\) with positive definite imaginary part,

$$\begin{aligned} \begin{aligned} {{\mathrm{Im}}}\, {\mathbf{M}} := \frac{1}{2\mathrm {i}}({\mathbf{M}}-{\mathbf{M}}^*) \in \mathscr {C}_+ . \end{aligned} \end{aligned}$$
(2.3)

The question of existence and uniqueness of solutions to (2.2) with the constraint (2.3) has been answered in [38]. The MDE has a unique solution matrix \({\mathbf{M}}(\zeta )\) for any spectral parameter \(\zeta \in \mathbb {H}\) and these matrices constitute a holomorphic function \({\mathbf{M}}: \mathbb {H}\rightarrow \mathbb {C}^{N \times N}\).

On the space of matrices \(\mathbb {C}^{N \times N}\) we consider three norms. For \({\mathbf{R}}\in \mathbb {C}^{N \times N}\) we denote by \(\Vert {\mathbf{R}} \Vert \) the operator norm induced by the standard Euclidean norm \(\Vert \cdot \Vert \) on \(\mathbb {C}^N\), by \(\Vert {\mathbf{R}} \Vert _{\mathrm{hs}}:=\sqrt{\langle {{\mathbf{R}}} , {{\mathbf{R}}}\rangle }\) the norm associated with the scalar product (2.1) and by

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{R}} \Vert _{\mathrm{max}} :=\max _{\,x,y=1}^N|r_{xy} | , \end{aligned} \end{aligned}$$
(2.4)

the entrywise maximum norm on \(\mathbb {C}^{N \times N}\). We also denote the normalized trace of \({\mathbf{R}}\) by \(\langle {\mathbf{R}} \rangle :=\langle {{\mathbf{1}}} , {{\mathbf{R}}}\rangle \).

For linear operators \(\mathcal {T}:\mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) we denote by \(\Vert \mathcal {T} \Vert \) the operator norm induced by the norm \(\Vert \cdot \Vert \) on \(\mathbb {C}^{N \times N}\) and by \(\Vert \mathcal {T} \Vert _{\mathrm{sp}}\) the operator norm induced by \(\Vert \cdot \Vert _{\mathrm{hs}}\).

The following proposition provides a representation of the solution \({\mathbf{M}}\) as the Stieltjes transform of a measure with values in \(\overline{\mathscr {C}}_+\). This is a standard result for matrix-valued Nevanlinna functions (see e.g. [32]). For the convenience of the reader we provide a proof, which also gives an effective bound on the support of this matrix-valued measure.

Proposition 2.1

(Stieltjes transform representation) Let \({\mathbf{M}}: \mathbb {H}\rightarrow \mathbb {C}^{N \times N}\) be the unique solution of (2.2) with \({{\mathrm{Im}}}\, {\mathbf{M}}\in \mathscr {C}_+\). Then \({\mathbf{M}}\) admits a Stieltjes transform representation,

$$\begin{aligned} \begin{aligned} m_{xy}(\zeta )\,=\,\int _\mathbb {R}\frac{v_{xy}(\mathrm {d}\tau )}{\tau -\zeta }, \qquad \zeta \in \mathbb {H},\;x, y=1,\ldots ,N . \end{aligned} \end{aligned}$$
(2.5)

The measure \({\mathbf{V}}(\mathrm {d}\tau )=(v_{x y}(\mathrm {d}\tau ))_{x,y=1}^N\) on the real line with values in positive semidefinite matrices is unique. It satisfies the normalization \({\mathbf{V}}(\mathbb {R})={\mathbf{1}}\) and has support in the interval \([-\,\kappa ,\kappa ]\), where

$$\begin{aligned} \begin{aligned} \kappa \,:=\, \Vert {\mathbf{A}} \Vert +2\Vert \mathcal {S} \Vert ^{1/2}. \end{aligned} \end{aligned}$$
(2.6)

We will now make additional quantitative assumptions on the data pair \(({\mathbf{A}},\mathcal {S})\) that ensure a certain regularity of the measure \({\mathbf{V}}(\mathrm {d}\tau )\). Our assumptions, labeled A1 and A2, always come together with sets of model parameters \(\mathscr {P}_1\) and \(\mathscr {P}_2\), respectively, that control them effectively. Estimates will typically be uniform in all data pairs that satisfy these assumptions with the given set of model parameters. In particular, they are uniform in the size N of the matrix, which is of great importance in the application to random matrix theory.

A1 :

Flatness. Let \(\mathscr {P}_1=(p_1,P_1)\) with \(p_1,P_1>0\). The self-energy operator \(\mathcal {S}\) is called flat (with model parameters \(\mathscr {P}_1\)) if it satisfies the lower and upper bounds

$$\begin{aligned} \begin{aligned} p_1\langle {\mathbf{R}} \rangle \,{\mathbf{1}}\,\leqslant \, \mathcal {S}[{\mathbf{R}}]\,\leqslant \, P_1\langle {\mathbf{R}} \rangle \,{\mathbf{1}},\qquad {\mathbf{R}} \in \overline{\mathscr {C}}_+. \end{aligned} \end{aligned}$$
(2.7)

Proposition 2.2

(Regularity of self-consistent density of states) Assume that \( \mathcal {S}\) is flat, i.e. it satisfies A1 with some model parameters \(\mathscr {P}_1\), and that the bare matrix has bounded spectral norm,

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{A}} \Vert \,\leqslant \, P_0 , \end{aligned} \end{aligned}$$
(2.8)

for some constant \(P_0>0\). Then the holomorphic function \(\langle {\mathbf{M}} \rangle : \mathbb {H}\rightarrow \mathbb {H}\) is the Stieltjes transform of a Hölder-continuous probability density \(\rho \) with respect to the Lebesgue measure,

$$\begin{aligned} \begin{aligned} \langle {\mathbf{V}}(\mathrm {d}\tau ) \rangle \,=\, \rho (\tau )\mathrm {d}\tau . \end{aligned} \end{aligned}$$
(2.9)

More precisely,

$$\begin{aligned} |\rho (\tau _1)-\rho (\tau _2) |\,\leqslant \, C|\tau _1-\tau _2 |^{c},\qquad \tau _1,\tau _2 \in \mathbb {R}, \end{aligned}$$

where \(c>0\) is a universal constant and the constant \(C>0\) depends only on the model parameters \(\mathscr {P}_1\) and \(P_0\). Furthermore, \(\rho \) is real analytic on the open set \( \{ {\tau \in \mathbb {R}: \rho (\tau )>0} \} \).

Definition 2.3

(Self-consistent density of states) Assuming a flat self-energy operator, the probability density \(\rho : \mathbb {R}\rightarrow [0,\infty )\), defined through (2.9), is called the self-consistent density of states (of the MDE with data pair \(({\mathbf{A}},\mathcal {S})\)). We denote by \({{\mathrm{supp\,}}}\rho \subseteq \mathbb {R}\) its support on the real line and call it the self-consistent spectrum. With a slight abuse of notation we also denote by

$$\begin{aligned} \begin{aligned} \rho (\zeta )\,:=\, \frac{1}{\pi } {{\mathrm{Im}}} \langle {\mathbf{M}}(\zeta ) \rangle ,\qquad \zeta \in \mathbb {H}, \end{aligned} \end{aligned}$$
(2.10)

the harmonic extension of \(\rho \) to the complex upper half plane.
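As an illustration of Definition 2.3 (ours, not from the paper; the toy data pair, the damping and the spectral parameter are our choices), one can solve the MDE (2.2) numerically and evaluate the harmonic extension (2.10). Here \(\mathcal {S}[{\mathbf{R}}]=\langle {\mathbf{R}} \rangle {\mathbf{1}}\) is the flat averaging self-energy, which satisfies A1 with \(p_1=P_1=1\):

```python
import numpy as np

# Toy data pair: diagonal bare matrix A and the flat self-energy
# S[R] = <R> 1, which satisfies the flatness condition A1 with p1 = P1 = 1.
N = 4
A = np.diag([-1.0, -0.5, 0.5, 1.0])
def S(R):
    return (np.trace(R) / N) * np.eye(N)

def solve_mde(zeta, iters=2000):
    """Damped fixed-point iteration M -> (A - zeta 1 - S[M])^{-1}."""
    M = 1j * np.eye(N)                        # initial guess with Im M > 0
    for _ in range(iters):
        M = 0.5 * M + 0.5 * np.linalg.inv(A - zeta * np.eye(N) - S(M))
    return M

zeta = 0.2 + 1.0j
M = solve_mde(zeta)
# Residual of the MDE in the form 1 + (zeta 1 - A + S[M]) M = 0, cf. (1.2):
residual = np.eye(N) + (zeta * np.eye(N) - A + S(M)) @ M
assert np.linalg.norm(residual) < 1e-8
rho = np.trace(M).imag / (np.pi * N)          # harmonic extension (2.10)
assert rho > 0
```

Letting \({{\mathrm{Im}}}\,\zeta \downarrow 0\) in such a scheme traces out the self-consistent density of states itself, although close to the real axis the iteration needs more care.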

The second set of assumptions describes the decay properties of the data pair \(({\mathbf{A}},\mathcal {S})\). To formulate them, we need to equip the index set \(\{1,\ldots ,N\}\) with a concept of distance. Recall that a pseudometric d on a set A is a symmetric function \( d : A \times A \rightarrow [0,\infty ] \) such that \( d(x,y) \leqslant d(x,z) + d(z,y) \) for all \( x,y,z \in A\). We say that the pseudometric space (A, d) with a finite set A has sub-P-dimensional volume, for some constant \( P>0 \), if the metric balls \( B_\tau (x):=\{y\,:\, d(x,y)\leqslant \tau \} \) satisfy

$$\begin{aligned} \begin{aligned} |B_\tau (x)|\,\leqslant \, \tau ^P, \qquad \tau \geqslant 2\;,\,x\in A . \end{aligned} \end{aligned}$$
(2.11)
A2 :

Faster than power law decay. Let \(\mathscr {P}_2=(P,\underline{\pi } \!\,_1,\underline{\pi } \!\,_2)\), where \(P>0\) is a constant and \(\underline{\pi } \!\,_k=(\pi _k(\nu ))_{\nu =0}^\infty \), \(k=1,2\), are sequences of positive constants. The data pair \(({\mathbf{A}},\mathcal {S})\) is said to have faster than power law decay (with model parameters \(\mathscr {P}_2\)) if there exists a pseudometric d on the index space \( \{1,\ldots ,N\} \) such that the pseudometric space \(\mathbb {X}=(\{1,\ldots ,N\},d)\) has sub-P-dimensional volume (cf. (2.11)) and

$$\begin{aligned} |a_{xy} | \,\leqslant \, \frac{\pi _1(\nu )}{(1+d(x,y))^\nu }+\frac{\pi _1(0)}{N}, \end{aligned}$$
(2.12)
$$\begin{aligned} |\mathcal {S}[{\mathbf{R}}]_{xy} | \,\leqslant \, \biggl (\frac{\pi _2(\nu )}{(1+d(x,y))^\nu }+\frac{\pi _2(0)}{N}\,\biggr )\,\Vert {\mathbf{R}} \Vert _{\mathrm{max}} ,\qquad {\mathbf{R}} \in \mathbb {C}^{N\times N}, \end{aligned}$$
(2.13)

holds for any \(\nu \in \mathbb {N}\) and \(x,y \in \mathbb {X}\).

In order to state bounds of the form (2.12) and (2.13) more conveniently we introduce the following matrix norms.

Definition 2.4

(Faster than power law decay) Given a pseudometric d on \(\{1,\ldots ,N\}\) and a sequence \(\underline{\pi } \!\, = (\pi (\nu ))_{\nu =0}^\infty \) of positive constants, we define:

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{R}} \Vert _{\underline{\pi } \!\,} \,:=\, \sup _{\nu \in \mathbb {N}}\,\max _{x,y =1}^N \,\biggl (\frac{\pi (\nu )}{(1+d(x,y))^\nu }+\frac{\pi (0)}{N}\biggr )^{\!-1}|r_{xy} | ,\qquad {\mathbf{R}} \in \mathbb {C}^{N \times N} . \end{aligned} \end{aligned}$$
(2.14)

If \(\Vert {\mathbf{R}} \Vert _{\underline{\pi } \!\,} \leqslant 1 \), for some sequence \( \underline{\pi } \!\, \), we say that \( {\mathbf{R}} \) has faster than power law decay (up to level \(\frac{1}{N}\)) in the pseudometric space \( \mathbb {X} := ( \{ {1,\ldots ,N} \} ,d) \).

This norm expresses the typical behavior of many matrices in this paper: they exhibit off-diagonal decay faster than any power of the distance, up to a possible mean-field term of order 1/N. Using this norm, the bounds (2.12) and (2.13) take the simple forms:

$$\begin{aligned} \qquad \Vert {\mathbf{A}} \Vert _{\underline{\pi } \!\,_1} \leqslant 1, \qquad \Vert \mathcal {S}[{\mathbf{R}}] \Vert _{\underline{\pi } \!\,_2} \leqslant \Vert {\mathbf{R}} \Vert _{\mathrm{max}} . \end{aligned}$$
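As a concrete check (our own sketch; the helper `power_law_norm`, the choice \(d(x,y)=|x-y|\) and the truncation of the supremum over \(\nu \) are not from the paper), a matrix with exponential off-diagonal decay satisfies \(\Vert {\mathbf{R}} \Vert _{\underline{\pi } \!\,}\leqslant 1\) for a suitably growing sequence \(\underline{\pi } \!\,\):

```python
import numpy as np

# Truncated evaluation of the norm (2.14) on {1,...,N} with d(x,y) = |x-y|.
def power_law_norm(R, pi, nu_max=20):
    N = R.shape[0]
    d = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
    norm = 0.0
    for nu in range(nu_max + 1):
        weight = pi[nu] / (1.0 + d) ** nu + pi[0] / N
        norm = max(norm, np.max(np.abs(R) / weight))
    return norm

N = 50
d = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
R = np.exp(-d)                                # exponential off-diagonal decay
pi = [4.0 ** nu for nu in range(21)]          # one admissible sequence pi(nu)
assert power_law_norm(R, pi) <= 1.0           # R has faster than power law decay
```

Exponential decay beats every fixed power \((1+d)^{-\nu }\) near the diagonal, and far from it the mean-field term \(\pi (0)/N\) absorbs the remaining entries.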

Our main result, the stability of the MDE, holds uniformly for all spectral parameters that are either away from the self-consistent spectrum \({{\mathrm{supp\,}}}\rho \) or lie where the self-consistent density of states is positive. Therefore, for any \(\delta >0\) we set

$$\begin{aligned} \mathbb {D}_\delta \,:=\, \bigl \{ {\zeta \in \mathbb {H}: \rho ( \zeta )+{{\mathrm{dist}}}(\zeta , {{\mathrm{supp\,}}}\rho )> \delta \,} \bigr \} . \end{aligned}$$

Theorem 2.5

(Faster than power law decay of solution) Assume A1 and A2 and let \(\delta >0\). Then there exists a positive sequence \(\underline{\gamma } \!\,\) such that

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}(\zeta ) \Vert _{\underline{\gamma } \!\,}\,\leqslant \, 1, \qquad \zeta \in \mathbb {D}_\delta . \end{aligned} \end{aligned}$$
(2.15)

The sequence \(\underline{\gamma } \!\,\) depends only on \(\delta \) and the model parameters \(\mathscr {P}_1\) and \(\mathscr {P}_2\).

Our main result on the MDE is its stability with respect to the entrywise maximum norm on \(\mathbb {C}^{N \times N}\), see (2.4). The choice of this norm is especially useful for applications in random matrix theory, since the matrix valued error terms are typically controlled in this norm. We denote by

$$\begin{aligned} B^{\mathrm{max}}_\tau ({\mathbf{R}})\,:=\, \bigl \{ {{\mathbf{Q}} \in \mathbb {C}^{N \times N}: \Vert {\mathbf{Q}}-{\mathbf{R}} \Vert _{\mathrm{max}} \leqslant \tau \,} \bigr \} , \end{aligned}$$

the ball of radius \( \tau > 0 \) around \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) w.r.t. the entrywise maximum norm.

Theorem 2.6

(Stability) Assume A1 and A2, let \(\delta >0\) and \(\zeta \in \mathbb {D}_\delta \). Then there exist constants \(c_1,c_2>0\) and a unique function \(\varvec{\mathfrak {G}}=\varvec{\mathfrak {G}}_\zeta : B^{\mathrm{max}}_{c_1}({\mathbf{0}}) \rightarrow B^{\mathrm{max}}_{c_2}({\mathbf{M}})\) such that

$$\begin{aligned} \begin{aligned} -\,{\mathbf{1}}\,=\, (\zeta {\mathbf{1}}-{\mathbf{A}}+\mathcal {S}[\varvec{\mathfrak {G}}({\mathbf{D}})])\varvec{\mathfrak {G}}({\mathbf{D}})+{\mathbf{D}}, \end{aligned} \end{aligned}$$
(2.16)

for all \({\mathbf{D}} \in B^{\mathrm{max}}_{c_1}({\mathbf{0}})\), where \({\mathbf{M}}={\mathbf{M}}(\zeta )\). The function \(\varvec{\mathfrak {G}}\) is analytic. In particular, there exists a constant \(C>0\) such that

$$\begin{aligned} \begin{aligned} \Vert \varvec{\mathfrak {G}}({\mathbf{D}}_1)-\varvec{\mathfrak {G}}({\mathbf{D}}_2) \Vert _\mathrm{max}\,\leqslant \, C\Vert {\mathbf{D}}_1-{\mathbf{D}}_2 \Vert _\mathrm{max}, \end{aligned} \end{aligned}$$
(2.17)

for all \({\mathbf{D}}_1,{\mathbf{D}}_2 \in B^{\mathrm{max}}_{c_1/2}({\mathbf{0}})\).

Furthermore, there is a sequence \(\underline{\gamma } \!\,\) of positive constants, and a linear operator \(\mathcal {Z}=\mathcal {Z}_\zeta : \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) such that the derivative of \(\varvec{\mathfrak {G}}\), evaluated at \({\mathbf{D}}={\mathbf{0}}\), has the form

$$\begin{aligned} \begin{aligned} \nabla \varvec{\mathfrak {G}}({\mathbf{0}})\,=\, \mathcal {Z}+{\mathbf{M}}\,\mathrm{Id}, \end{aligned} \end{aligned}$$
(2.18)

and \(\mathcal {Z}\), as well as its adjoint \(\mathcal {Z}^*\) with respect to the scalar product (2.1), satisfy

$$\begin{aligned} \begin{aligned} \qquad \Vert \mathcal {Z}[{\mathbf{R}}] \Vert _{\underline{\gamma } \!\,}+\Vert \mathcal {Z}^*[{\mathbf{R}}] \Vert _{\underline{\gamma } \!\,}\,\leqslant \, \Vert {\mathbf{R}} \Vert _{\mathrm{max}} , \qquad \forall {\mathbf{R}} \in \mathbb {C}^{N \times N} . \end{aligned} \end{aligned}$$
(2.19)

Here \(c_1, c_2, C \) and \(\underline{\gamma } \!\,\) depend only on \(\delta \) and the model parameters \(\mathscr {P}_1\), \(\mathscr {P}_2\) from assumptions A1 and A2.

Theorem 2.6 states quantitative regularity properties of the analytic map \(\varvec{\mathfrak {G}}\). These estimates yield strong stability properties of the MDE. For a concrete application in the proof of the local law, see Corollary 3.2 below.

2.2 Random matrices with correlations

In this section we present our results on local eigenvalue statistics of random matrices with correlations. Let \({\mathbf{H}}=(h_{x,y})_{x,y=1}^N \in \mathbb {C}^{N \times N}\) be a self-adjoint random matrix. For a spectral parameter \(\zeta \in \mathbb {H}\) we consider the associated Matrix Dyson Equation (MDE),

$$\begin{aligned} \begin{aligned} -\,{\mathbf{M}}(\zeta )^{-1}\,&=\,\zeta {\mathbf{1}}-{\mathbf{A}} +\mathcal {S}[{\mathbf{M}}(\zeta )] , \\ \qquad {\mathbf{A}}\,:=\, \mathbb {E}{\mathbf{H}},\qquad&\mathcal {S}[{\mathbf{R}}]\,:=\, \mathbb {E} {\mathbf{H}} {\mathbf{R}} {\mathbf{H}}-{\mathbf{A}} {\mathbf{R}} {\mathbf{A}}, \end{aligned} \end{aligned}$$
(2.20)

for a solution matrix \({\mathbf{M}}(\zeta )\) with positive definite imaginary part [cf. (2.3)]. The linear self-energy operator \(\mathcal {S}: \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) from (2.20) preserves the cone \(\overline{\mathscr {C}}_+\) of positive semidefinite matrices, and the MDE therefore has a unique solution [38] whose properties have been presented in Sect. 2.1.

Our main result states that under natural assumptions on the correlations of the entries within the random matrix \({\mathbf{H}}\), the resolvent

$$\begin{aligned} \begin{aligned} {\mathbf{G}}(\zeta )\,:=\, ({\mathbf{H}}- \zeta {\mathbf{1}})^{-1}, \end{aligned} \end{aligned}$$
(2.21)

is close to the non-random solution \({\mathbf{M}}(\zeta )\) of the MDE (2.20), provided N is large enough. In order to list these assumptions, we write \({\mathbf{H}}\) as a sum of its expectation and fluctuation

$$\begin{aligned} \begin{aligned} {\mathbf{H}} \,=\, {\mathbf{A}} + \frac{1}{\sqrt{N}}{\mathbf{W}}. \end{aligned} \end{aligned}$$
(2.22)

Here, the bare matrix \({\mathbf{A}}\) is a non-random self-adjoint matrix and \({\mathbf{W}}\) is a self-adjoint random matrix with centered entries, \(\mathbb {E}{\mathbf{W}}={\mathbf{0}}\). Since the entries of the fluctuation matrix \({\mathbf{W}}\) are of typical size of order one, the normalization factor \( N^{-1/2} \) in (2.22) ensures that the spectrum of \({\mathbf{H}}\) remains bounded.

In the following we will assume that there exists some pseudometric d on the index set \(\{1, \ldots ,N\}\), such that the resulting pseudometric space

$$\begin{aligned} \mathbb {X}\,=\, (\{1, \ldots ,N\},d), \end{aligned}$$

has sub-P-dimensional volume for some constant \(P>0\), i.e. d satisfies (2.11), and that the bare and fluctuation matrices satisfy the following assumptions:

B1 :

Existence of moments. Moments of all orders of \({\mathbf{W}}\) exist, i.e., there is a sequence of positive constants \(\underline{\kappa } \!\,_1=(\kappa _1(\nu ))_{\nu \in \mathbb {N}}\) such that

$$\begin{aligned} \begin{aligned} \qquad \mathbb {E}\,|w_{xy} |^\nu \,\leqslant \, \kappa _1(\nu ) , \end{aligned} \end{aligned}$$
(2.23)

for all \(x,y\in \mathbb {X}\) and \(\nu \in \mathbb {N}\).

B2 :

Decay of expectation. The entries \(a_{xy}\) of the bare matrix \({\mathbf{A}}\) decay in the distance between the indices x and y, i.e., there is a sequence of positive constants \(\underline{\kappa } \!\,_2=(\kappa _2(\nu ))_{\nu \in \mathbb {N}}\) such that

$$\begin{aligned} \begin{aligned} |a_{xy} |\,\leqslant \, \frac{\kappa _2(\nu )}{(1+d(x,y))^\nu }, \end{aligned} \end{aligned}$$
(2.24)

for all \(x,y \in \mathbb {X}\) and \(\nu \in \mathbb {N}\).

B3 :

Decay of correlations. The correlations in \( {\mathbf{W}} \) decay fast, i.e., there is a sequence of positive constants \(\underline{\kappa } \!\,_3=(\kappa _3(\nu ))_{\nu \in \mathbb {N}}\) such that for all symmetric sets \(A,B \subseteq \mathbb {X}^2\) (A is symmetric if \((x,y) \in A\) implies \((y,x) \in A\)) and all smooth functions \(\phi : \mathbb {C}^{|A |} \rightarrow \mathbb {R}\) and \(\psi : \mathbb {C}^{|B |} \rightarrow \mathbb {R}\), we have

$$\begin{aligned} \begin{aligned} |\mathrm {Cov}(\phi (\mathrm{W}_{\!A}),\psi (\mathrm{W}_{\!B})) | \,\leqslant \, \kappa _3(\nu )\frac{\Vert \nabla \phi \Vert _\infty \Vert \nabla \psi \Vert _\infty }{(1+d_2(A,B))^{\nu }}, \qquad \forall \nu \in \mathbb {N}. \end{aligned} \end{aligned}$$
(2.25)

Here, \( \mathrm {Cov}(Z_1,Z_2):=\mathbb {E}Z_1Z_2-\mathbb {E}Z_1\,\mathbb {E}Z_2\) is the covariance, \( \mathrm{W}_{\!A} := (w_{xy})_{(x,y)\in A} \), and

$$\begin{aligned} d_2(A,B) \,:= \min \bigl \{{\max \bigl \{{d(x_1,x_2),d(y_1,y_2)}\bigr \}:(x_1,y_1) \in A,\; (x_2,y_2) \in B}\bigr \}, \end{aligned}$$

is the distance between A and B in the product metric on \(\mathbb {X}\). The supremum norm on vector valued functions \(\Phi =(\phi _i)_i\) is \(\Vert \Phi \Vert _\infty :=\sup _{Y}\max _{i}| \phi _i(Y)|\).

B4 :

Flatness. There is a positive constant \(\kappa _4\) such that for any two deterministic vectors \({\mathbf{u}},{\mathbf{v}} \in \mathbb {C}^N\) we have

$$\begin{aligned} \begin{aligned} \mathbb {E}\,| {\mathbf{u}}^*{\mathbf{W}}{\mathbf{v}} |^2\,\geqslant \, \kappa _4\, \Vert {\mathbf{u}} \Vert ^2\Vert {\mathbf{v}} \Vert ^2, \end{aligned} \end{aligned}$$
(2.26)

where \(\Vert \cdot \Vert \) denotes the standard Euclidean norm on \(\mathbb {C}^N\).

We consider the constants

$$\begin{aligned} \begin{aligned} \mathscr {K}\,:=\,(P,\underline{\kappa } \!\,_1,\underline{\kappa } \!\,_2, \underline{\kappa } \!\,_3,\kappa _4), \end{aligned} \end{aligned}$$
(2.27)

appearing in the above assumptions (2.11) and (2.23)–(2.26), as model parameters. These parameters are regarded as fixed and our statements are uniform in the ensemble of all correlated random matrices of all dimensions N satisfying B1–B4 with given \(\mathscr {K}\).

Under the assumptions B1–B4 the function \(\rho : \mathbb {H}\rightarrow [0,\infty )\), given in terms of the solution \({\mathbf{M}}\) to (2.20) by

$$\begin{aligned} \rho (\zeta )\,=\, \frac{1}{\pi N}{{\mathrm{Im}}}\, {{\mathrm{Tr\,}}}{\mathbf{M}}(\zeta ), \end{aligned}$$

is the harmonic extension of a Hölder-continuous probability density \(\rho : \mathbb {R}\rightarrow [0,\infty )\) (cf. Proposition 2.2), which is called the self-consistent density of states (cf. Definition 2.3).
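As a purely illustrative aside (not part of the paper's argument), the MDE can be solved numerically by a damped fixed-point iteration applied to \({\mathbf{M}}=({\mathbf{A}}-\zeta {\mathbf{1}}-\mathcal {S}[{\mathbf{M}}])^{-1}\). In the uncorrelated Wigner special case, \({\mathbf{A}}={\mathbf{0}}\) and \(\mathcal {S}[{\mathbf{R}}]=\bigl (\frac{1}{N}{{\mathrm{Tr\,}}}{\mathbf{R}}\bigr ){\mathbf{1}}\), the equation reduces to the scalar relation \(m=1/(-\zeta -m)\) and \(\rho \) recovers the semicircle density, which with this normalization equals \(1/\pi \) at the origin. The function name and the damping scheme below are illustrative choices:

```python
import numpy as np

def solve_mde(A, S, zeta, tol=1e-12, max_iter=5000):
    """Damped fixed-point iteration for the MDE  M = (A - zeta*1 - S[M])^{-1}."""
    N = A.shape[0]
    I = np.eye(N)
    M = -I / zeta                      # large-|zeta| asymptotics as initial guess
    for _ in range(max_iter):
        M_new = np.linalg.inv(A - zeta * I - S(M))
        if np.max(np.abs(M_new - M)) < tol:
            return M_new
        M = (M + M_new) / 2            # damping stabilizes the map near the real axis
    return M

# Wigner case: A = 0 and S[R] = (Tr R / N) * 1 reduce the MDE to m = 1/(-zeta - m),
# the Stieltjes transform of the semicircle law on [-2, 2].
N = 20
A = np.zeros((N, N))
S = lambda R: (np.trace(R) / N) * np.eye(N)
M = solve_mde(A, S, zeta=1e-3j)
rho = np.imag(np.trace(M)) / (np.pi * N)  # harmonic extension of the density at 0
```

The undamped iteration has a contraction factor approaching one as \({{\mathrm{Im}}}\, \zeta \downarrow 0\); the averaging step is a standard practical remedy.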

Theorem 2.7

(Local law for correlated random matrices) Let \({\mathbf{G}}\) be the resolvent of a random matrix \({\mathbf{H}}\) written in the form (2.22) that satisfies B1–B4. For all \(\delta ,\varepsilon >0\) and \(\nu \in \mathbb {N}\) there exists a positive constant C such that in the bulk,

$$\begin{aligned} \begin{aligned}&\mathbb {P}\Biggl [\, \exists \zeta \in \mathbb {H}\,\text { s.t. }\rho (\zeta )\geqslant \delta ,\; {{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon },\, \max _{x,y=1}^N|G_{xy}(\zeta )-m_{xy}(\zeta ) |\\&\quad \geqslant \frac{N^\varepsilon }{\sqrt{N\, {{\mathrm{Im}}}\, \zeta }} \Biggr ] \leqslant \frac{C}{N^{\nu }\!}. \end{aligned} \end{aligned}$$
(2.28)

Furthermore, the normalized trace converges with the improved rate

$$\begin{aligned} \begin{aligned}&\mathbb {P}\Biggl [\, \exists \zeta \in \mathbb {H}\,\text { s.t. }\rho (\zeta )\geqslant \delta ,\; {{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon },\, \left|\frac{1}{N}\!{{\mathrm{Tr\,}}}{\mathbf{G}}(\zeta )-\frac{1}{N}\!{{\mathrm{Tr\,}}}{\mathbf{M}}(\zeta ) \right|\\&\quad \geqslant \frac{N^\varepsilon }{{N\, {{\mathrm{Im}}}\, \zeta }} \Biggr ] \leqslant \frac{C}{N^{\nu }\!}. \end{aligned} \end{aligned}$$
(2.29)

The constant C depends only on the model parameters \(\mathscr {K}\) in addition to \(\delta \), \(\varepsilon \) and \(\nu \).

In Sect. 3 we present the proof of Theorem 2.7, which is based on the results from Sect. 2.1 about the Matrix Dyson Equation. As a standard consequence of the local law (2.28) and the uniform boundedness of \({{\mathrm{Im}}}\, m_{xx}\) from Theorem 2.5, the eigenvectors of \({\mathbf{H}}\) in the bulk are completely delocalized. This follows directly from the uniform boundedness of \({{\mathrm{Im}}}\, G_{xx}(\zeta )\) and the spectral decomposition of the resolvent (see e.g. [20]).

Corollary 2.8

(Delocalization of eigenvectors) Pick any \(\delta ,\varepsilon ,\nu >0\) and let \({\mathbf{u}}\) be a normalized, \(\Vert {\mathbf{u}} \Vert =1\), eigenvector of \({\mathbf{H}}\), corresponding to an eigenvalue \(\lambda \in \mathbb {R}\) in the bulk, i.e., \(\rho (\lambda )\geqslant \delta \). Then

$$\begin{aligned} \mathbb {P}\biggl [ \max _{x =1}^N|u_x|\geqslant \frac{N^\varepsilon }{\!\sqrt{N}} \biggr ] \,\leqslant \, \frac{C}{N^\nu \!} \;, \end{aligned}$$

for a positive constant C, depending only on the model parameters \(\mathscr {K}\) in addition to \(\delta \), \(\varepsilon \) and \(\nu \).
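The \(N^{-1/2}\) scale in Corollary 2.8 is easy to observe numerically in the simplest special case, a GOE-type Wigner matrix (which trivially satisfies B1–B4). The following sketch, with illustrative parameter choices, samples one matrix and measures the largest entry of a bulk eigenvector, rescaled so that complete delocalization corresponds to an order-one (up to \(N^\varepsilon \) factors) value:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
X = rng.standard_normal((N, N))
H = (X + X.T) / np.sqrt(2 * N)         # GOE-type Wigner matrix, spectrum ~ [-2, 2]
_, evecs = np.linalg.eigh(H)           # eigh returns orthonormal eigenvectors
u = evecs[:, N // 2]                   # eigenvector in the middle of the bulk
peak = np.sqrt(N) * np.max(np.abs(u))  # O(1) up to N^eps factors if delocalized
```

Normalization forces \(\max _x |u_x| \geqslant N^{-1/2}\), so `peak` is always at least one; delocalization says it is not much larger.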

The averaged local law (2.29) directly implies the rigidity of the eigenvalues in the bulk. For any \(\tau \in \mathbb {R}\), we define

$$\begin{aligned} \begin{aligned} i(\tau ) \,:=\, \bigg \lceil N \!\int _{-\infty }^\tau \rho (\omega )\mathrm {d}\omega \bigg \rceil . \end{aligned} \end{aligned}$$
(2.30)

This is the index of an eigenvalue that is typically close to a spectral parameter \(\tau \) in the bulk. Then the standard argument presented in Sect. 7.1 proves the following result.

Corollary 2.9

(Rigidity) For any \(\delta ,\varepsilon ,\nu >0\) we have

$$\begin{aligned} \begin{aligned} \mathbb {P}\biggl [{ \sup \big \{\,|\lambda _{i(\tau )} - \tau |\; : \, \tau \in \mathbb {R}, \;\rho (\tau )\geqslant \delta \big \} \geqslant \frac{N^\varepsilon \!}{N}\, }\biggr ] \,\leqslant \frac{C}{N^\nu \!} \;, \end{aligned} \end{aligned}$$
(2.31)

for a positive constant C, depending only on the model parameters \(\mathscr {K}\) in addition to \(\delta \), \(\varepsilon \) and \(\nu \).
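Rigidity can likewise be illustrated in the GOE special case, where the self-consistent density of states is the semicircle law and the index (2.30) has a closed form through the integrated density. The helper name and parameters below are illustrative, not from the text:

```python
import numpy as np

def i_tau(tau, N):
    """Index (2.30) for the semicircle density rho(x) = sqrt(4 - x^2)/(2*pi)."""
    # integrated density: F(tau) = (tau*sqrt(4-tau^2)/2 + 2*arcsin(tau/2) + pi)/(2*pi)
    F = (tau * np.sqrt(4 - tau**2) / 2 + 2 * np.arcsin(tau / 2) + np.pi) / (2 * np.pi)
    return int(np.ceil(N * F))

rng = np.random.default_rng(1)
N = 1000
X = rng.standard_normal((N, N))
H = (X + X.T) / np.sqrt(2 * N)
evals = np.linalg.eigvalsh(H)                      # sorted ascending
tau = 0.5
gap_to_tau = abs(evals[i_tau(tau, N) - 1] - tau)   # rigidity: ~ N^eps / N in the bulk
```

The observed deviation is of order \(1/N\) up to logarithmic factors, consistent with (2.31).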

Another consequence of Theorem 2.7 is the universality of the local eigenvalue statistics in the bulk of the spectrum of \({\mathbf{H}}\) both in the sense of averaged correlation functions and in the sense of gap universality. For the universality statement we make the following additional assumption that is stronger than B4:

B5 :

Fullness. We say that \({\mathbf{H}}\) is \(\beta =1\)-full (\(\beta =2\)-full) if \({\mathbf{H}}\in \mathbb {R}^{N \times N}\) is real symmetric (\({\mathbf{H}}\in \mathbb {C}^{N \times N}\) is complex hermitian) and there is a positive constant \(\kappa _5\) such that

$$\begin{aligned} \mathbb {E}|{{\mathrm{Tr\,}}}{\mathbf{R}}{\mathbf{W}} |^2\,\geqslant \, \kappa _5 {{\mathrm{Tr\,}}}{\mathbf{R}}^2 , \end{aligned}$$

for any real symmetric \({\mathbf{R}}\in \mathbb {R}^{N \times N}\) (any complex hermitian \({\mathbf{R}}\in \mathbb {C}^{N \times N}\)).

When B5 is assumed we consider \(\kappa _5\) as an additional model parameter.

The first formulation of the bulk universality states that the k-point correlation functions \(\rho _k\) of the eigenvalues of \( {\mathbf{H}} \), rescaled around an energy parameter \(\omega \) in the bulk, converge weakly to those of the GUE/GOE. The latter are given by the correlation functions of well known determinantal processes. The precise statement is the following:

Corollary 2.10

(Correlation function bulk universality) Let \({\mathbf{H}}\) satisfy B1–B3 and B5 with \(\beta =1\) (\(\beta =2\)). Pick any \( \delta > 0 \) and choose any \(\omega \in \mathbb {R}\) with \(\rho (\omega ) \geqslant \delta \). Fix \( k \in \mathbb {N}\) and \( \varepsilon \in (0,1/2) \). Then for any smooth, compactly supported test function \( \Phi : \mathbb {R}^k \rightarrow \mathbb {R}\) the k-point local correlation functions \( \rho _k : \mathbb {R}^k \rightarrow [0,\infty ) \) of the eigenvalues of \( {\mathbf{H}}\) converge to the k-point correlation function \( \Upsilon _k: \mathbb {R}^k \rightarrow [0,\infty )\) of the GOE (GUE) determinantal point process,

$$\begin{aligned} \Biggl |\; \int _{\mathbb {R}^k} \Phi ({\varvec{\tau }})\, \Biggl [\, \frac{1}{\rho (\omega )^k\!} \rho _k\bigg (\omega + \frac{\tau _1}{N\rho (\omega )},\ldots ,\omega + \frac{\tau _k}{N \rho (\omega )}\bigg )-\Upsilon _{\!k}({\varvec{\tau }})\,\Biggr ]\, \mathrm {d}{\varvec{\tau }} \,\Biggr | \,\leqslant \, \frac{C}{N^c\!}, \end{aligned}$$

where \( {\varvec{\tau }} = (\tau _1,\ldots ,\tau _k) \), and the positive constants Cc depend only on \(\delta \), \(\Phi \) and the model parameters.

The second formulation compares the joint distributions of gaps between consecutive eigenvalues of \( {\mathbf{H}}\) in the bulk with those of the GUE/GOE. The proofs of Corollaries 2.10 and 2.11 are presented in Sect. 7.2.

Corollary 2.11

(Gap universality in bulk) Let \({\mathbf{H}}\) satisfy B1–B3 and B5 with \(\beta =1\) (\(\beta =2\)). Pick any \(\delta >0\), an energy \(\tau \) in the bulk, i.e. \(\rho (\tau ) \geqslant \delta \), and let \(i=i(\tau )\) be the corresponding index defined in (2.30). Then for all \( n \in \mathbb {N}\) and all smooth compactly supported observables \( \Phi :\mathbb {R}^n \rightarrow \mathbb {R}\), there are two positive constants C and c, depending on n, \( \delta \), \( \Phi \) and the model parameters, such that the local eigenvalue distribution is universal,

$$\begin{aligned}&\left|\,\mathbb {E}\Phi \Bigl ( \bigl (N\rho (\lambda _i)(\lambda _{i+j}-\lambda _i)\bigr )_{j=1}^n \Bigr ) - \mathbb {E}_\mathrm{G}\Phi \Bigl ( \bigl (N\rho _\mathrm{sc}(0)(\lambda _{\lceil {N/2} \rceil +j}-\lambda _{\lceil {N/2} \rceil })\bigr )_{j=1}^n \Bigr ) \right| \\&\quad \leqslant \frac{C}{N^c\!}. \end{aligned}$$

Here the second expectation \(\mathbb {E}_\mathrm{G}\) is with respect to GUE and GOE in the cases of complex Hermitian and real symmetric \({\mathbf{H}}\), respectively, and \(\rho _\mathrm{sc}(0)=1/(2\pi )\) is the value of Wigner’s semicircle law at the origin.
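The rescaling by \(N\rho \) in Corollary 2.11 makes the typical gap between consecutive bulk eigenvalues of order one. This can be checked numerically in the GOE special case; the window size and seed below are illustrative choices, and the density formula refers to the semicircle normalization used in this snippet:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
X = rng.standard_normal((N, N))
H = (X + X.T) / np.sqrt(2 * N)                    # GOE-type matrix, spectrum ~ [-2, 2]
evals = np.linalg.eigvalsh(H)
i = N // 2                                        # bulk indices around the middle
gaps = np.diff(evals[i - 100:i + 100])            # consecutive eigenvalue gaps
rho0 = np.sqrt(4 - evals[i] ** 2) / (2 * np.pi)   # semicircle density at the window center
mean_gap = N * rho0 * np.mean(gaps)               # rescaled mean gap, ~ 1 in the bulk
```

The full gap universality statement concerns the whole joint distribution of these rescaled gaps, not just their mean; the mean being close to one is a consequence of rigidity alone.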

During the final preparation of this manuscript and after announcing our theorems, we learned that a similar universality result but with a special correlation structure was proved independently in [14]. The covariances in [14] have a specific finite range and translation invariant structure,

$$\begin{aligned} \mathbb {E}\, h_{xy} h_{uv} = \psi \Big ( \frac{x}{N}, \frac{y}{N}, u-x, v-y\Big ), \end{aligned}$$
(2.32)

where \(\psi \) is a piecewise Lipschitz function with finite support in the third and fourth variables. The short-scale translation invariance in (2.32) allows one to use a partial Fourier transform after effectively decoupling the slow variables from the fast ones. This turns the matrix equation (1.2) into a vector equation for \(N^2\) variables, and the necessary stability result directly follows from [1]. The main difference between the current work and [14] is that here we analyze (1.2) as a genuine matrix equation without relying on translation invariance, and thus arbitrary short range correlations are allowed.

3 Local law for random matrices with correlations

In this section we show how the stability of the MDE, Theorem 2.6, can be combined with probabilistic estimates for random matrices with correlated entries to obtain a conceptually simple proof of the local law, Theorem 2.7. We state these probabilistic estimates in Lemma 3.4 and Proposition 3.5 below before applying them to establish the local law. Their proofs are postponed to Sects. 5 and 6, respectively.

Consider any self-adjoint random matrix \( {\mathbf{H}} \), and let \(({\mathbf{A}},\mathcal {S})\) be the data pair for the MDE generated by the first two moments of \( {\mathbf{H}} \) through (2.20). Clearly, the self-energy operator \( \mathcal {S}: \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N} \) generated by (2.20) is self-adjoint with respect to the scalar product (2.1), and preserves the cone of positive semidefinite matrices. The next lemma, whose proof is postponed to the end of this section, shows that the remaining assumptions required for our MDE results in Sect. 2.1 are also satisfied for the random matrices considered in Sect. 2.2.

Lemma 3.1

(MDE data generated by random matrices) If \({\mathbf{H}}\) satisfies B1–B4, then the data pair \(({\mathbf{A}},\mathcal {S})\) generated through (2.20) satisfies A1 and A2. The corresponding model parameters \(\mathscr {P}_1\) and \(\mathscr {P}_2\) depend only on \(\mathscr {K}\).

In order to apply the stability of the MDE, we first write the defining equation (2.21) for the resolvent of \( {\mathbf{H}} \), namely \(-{\mathbf{1}}=(\zeta {\mathbf{1}}-{\mathbf{H}}){\mathbf{G}}(\zeta )\), in the form

$$\begin{aligned} \begin{aligned} -\,{\mathbf{1}}\,=\, (\zeta {\mathbf{1}}-{\mathbf{A}}+\mathcal {S}[{\mathbf{G}}(\zeta )]){\mathbf{G}}(\zeta ) + {\mathbf{D}}(\zeta ), \end{aligned} \end{aligned}$$
(3.1a)

a perturbed version of the MDE (2.2). Here the error matrix \({\mathbf{D}} : \mathbb {H}\rightarrow \mathbb {C}^{N\times N} \) is given by

$$\begin{aligned} \begin{aligned} {\mathbf{D}}(\zeta )\,:=\, -(\mathcal {S}[{\mathbf{G}}(\zeta )]+{\mathbf{H}}-{\mathbf{A}}){\mathbf{G}}(\zeta ) . \end{aligned} \end{aligned}$$
(3.1b)
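The definitions (3.1a) and (3.1b) fit together by a one-line computation: substituting (3.1b) into (3.1a), the self-energy and bare-matrix terms cancel and exactly the defining relation \(-{\mathbf{1}}=(\zeta {\mathbf{1}}-{\mathbf{H}}){\mathbf{G}}(\zeta )\) remains,

```latex
(\zeta\,\mathbf{1}-\mathbf{A}+\mathcal{S}[\mathbf{G}])\,\mathbf{G}+\mathbf{D}
  \,=\,(\zeta\,\mathbf{1}-\mathbf{A}+\mathcal{S}[\mathbf{G}])\,\mathbf{G}
      -(\mathcal{S}[\mathbf{G}]+\mathbf{H}-\mathbf{A})\,\mathbf{G}
  \,=\,(\zeta\,\mathbf{1}-\mathbf{H})\,\mathbf{G}
  \,=\,-\mathbf{1}.
```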

We view the resolvent \( {\mathbf{G}}(\zeta ) \) as a perturbation of the deterministic matrix \( {\mathbf{M}}(\zeta ) \) induced by the random perturbation \( {\mathbf{D}}(\zeta )\). Using the notation \(\varvec{\mathfrak {G}}= \varvec{\mathfrak {G}}_\zeta \) from Theorem 2.6, we identify from (2.2) and (3.1a), \( {\mathbf{M}}(\zeta ) = \varvec{\mathfrak {G}}_\zeta ({\mathbf{0}}) \) and \( {\mathbf{G}}(\zeta ) = \varvec{\mathfrak {G}}_\zeta ({\mathbf{D}}(\zeta )) \). Thus Theorem 2.6 yields the following:

Corollary 3.2

(Stability for local laws) Assume \({\mathbf{H}} \) satisfies B1–B4, fix \(\delta >0\) and \( \zeta \in \mathbb {D}_\delta \). There exist constants \( c,C > 0 \), depending only on the model parameters and \(\delta \), such that on the event where the a-priori bound

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta ) \Vert _{\mathrm{max}} \leqslant c , \end{aligned} \end{aligned}$$
(3.2)

holds, the difference \( {\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta ) \) is bounded in terms of the perturbation \( {\mathbf{D}}(\zeta ) \) by the two estimates:

$$\begin{aligned} \Vert {\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta ) \Vert _{\mathrm{max}}&\leqslant \, C\Vert {\mathbf{D}}(\zeta ) \Vert _{\mathrm{max}} \end{aligned}$$
(3.3)
$$\begin{aligned} {\textstyle \frac{1}{N}}|{{\mathrm{Tr\,}}}({\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta )) | \,&\leqslant \, |\langle {\mathbf{J}}(\zeta ),{\mathbf{D}}(\zeta ) \rangle |+C\Vert {\mathbf{D}}(\zeta ) \Vert _{\mathrm{max}}^2 , \end{aligned}$$
(3.4)

for some non-random \( {\mathbf{J}}(\zeta )\in \mathbb {C}^{N\times N} \) with fast decay, \(\Vert {\mathbf{J}}(\zeta ) \Vert _{\underline{\gamma } \!\,} \leqslant C \), where the sequence \( \underline{\gamma } \!\, \) is from Theorem 2.6.

Proof

By Lemma 3.1 assumptions A1 and A2 are satisfied for the data pair \(({\mathbf{A}}, \mathcal {S})\). Hence, the first bound (3.3) follows directly from (2.17) with \({\mathbf{D}}_1={\mathbf{0}}\) and \({\mathbf{D}}_2={\mathbf{D}}(\zeta )\). For the second bound (3.4) we first write \(\frac{1}{N}{{\mathrm{Tr\,}}}[{\mathbf{G}}-{\mathbf{M}}]= \langle {\mathbf{1}}, \varvec{\mathfrak {G}}({\mathbf{D}})-\varvec{\mathfrak {G}}({\mathbf{0}}) \rangle \), then use the analyticity of \( \varvec{\mathfrak {G}} \) and the representation (2.18) of its derivative to obtain

$$\begin{aligned}&\langle {\mathbf{1}}, \varvec{\mathfrak {G}}({\mathbf{D}})-\varvec{\mathfrak {G}}({\mathbf{0}}) \rangle \,=\, \langle {{\mathbf{1}}} , {\mathcal {Z}[{\mathbf{D}}]+{\mathbf{M}}{\mathbf{D}}}\rangle + \mathcal {O}\bigl (\Vert {\mathbf{D}} \Vert _{\mathrm{max}}^2\bigr )\\&\quad = \langle {\mathcal {Z}^*[{\mathbf{1}}]+{\mathbf{M}}^*\!} , {{\mathbf{D}}}\rangle + \mathcal {O}\bigl (\Vert {\mathbf{D}} \Vert _{\mathrm{max}}^2\bigr ) . \end{aligned}$$

Identifying \( {\mathbf{J}}:= \mathcal {Z}^*[{\mathbf{1}}] + {\mathbf{M}}^*\) yields (3.4). The fast off-diagonal decay of the entries of \({\mathbf{J}}\) follows from (2.15) and (2.19). \(\square \)

Corollary 3.2 shows that, on the event where the rough a-priori bound (3.2) holds, the proof of the local laws (2.28) and (2.29) reduces to bounding the error \( {\mathbf{D}} \) on the right hand sides of (3.3) and (3.4) by \( (N\,{{\mathrm{Im}}}\, \zeta )^{-1/2} \) and \( (N\,{{\mathrm{Im}}}\, \zeta )^{-1} \), respectively. In order to state such estimates for the error matrix we use the notion of stochastic domination, first introduced in [20], which is designed to compare random variables up to \(N^\varepsilon \)-factors on very high probability sets.

Definition 3.3

(Stochastic domination) Let \(X=X^{(N)}\), \(Y=Y^{(N)}\) be sequences of non-negative random variables. We say X is stochastically dominated by Y if

$$\begin{aligned} \mathbb {P}\bigl [X > N^\varepsilon Y\bigr ]\,\leqslant \, C(\varepsilon , \nu )N^{-{\nu }}, \qquad N \in \mathbb {N}, \end{aligned}$$

for any \(\varepsilon >0\), \({\nu } \in \mathbb {N}\) and some (N-independent) family of positive constants C. In this case we write \(X\prec Y\).
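As a basic example of this notion (our illustration, not a statement from the text): under B1 alone, each entry of the fluctuation matrix satisfies \(|w_{xy}| \prec 1\). Indeed, by Markov's inequality with a sufficiently high moment,

```latex
\mathbb{P}\bigl[\, |w_{xy}| > N^{\varepsilon} \,\bigr]
  \;\leqslant\; \frac{\mathbb{E}\,|w_{xy}|^{p}}{N^{\varepsilon p}}
  \;\leqslant\; \kappa_1(p)\, N^{-\nu},
  \qquad p := \lceil \nu/\varepsilon \rceil ,
```

so that, after the \(N^{-1/2}\) rescaling in (2.22), the entries of \({\mathbf{H}}-{\mathbf{A}}\) are stochastically dominated by \(N^{-1/2}\).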

In this paper the family C of constants in Definition 3.3 will always be an explicit function of the model parameters (2.27) and possibly some additional parameters that are considered fixed and apparent from the context. However, the constants are always uniform in the spectral parameter \(\zeta \) on the domain under consideration and in the indices x, y in case \(X=r_{xy}\) is an element of a matrix \({\mathbf{R}}=(r_{xy})_{x,y}\). To use the notion of stochastic domination, we will think of \({\mathbf{H}}={\mathbf{H}}^{(N)}\) as embedded into a sequence of random matrices with the same model parameters.

The following lemma asserts that the error matrix \({\mathbf{D}}\) from (3.1b) converges to zero as the size N of the random matrix grows to infinity.

Lemma 3.4

(Smallness of perturbation in max-norm) Let \(C>0\) and \(\delta , \varepsilon >0\) be fixed. Away from the real axis the error matrix \({\mathbf{D}}\) is small regardless of any a-priori bound on \( {\mathbf{G}}-{\mathbf{M}} \):

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}}(\tau +\mathrm {i}\eta ) \Vert _{\mathrm{max}}\,\prec \,\frac{1}{\sqrt{N}},\qquad \tau \in [-C,C],\; \eta \in [1, C]. \end{aligned} \end{aligned}$$
(3.5)

Near the real axis and in the regime where the harmonic extension of the self-consistent density of states is bounded away from zero, we have

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}}(\zeta ) \Vert _{\mathrm{max}}\,\mathbbm {1}\bigl (\Vert {\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta ) \Vert _{\mathrm{max}} \,\leqslant \, N^{-\varepsilon }\bigr )\,\prec \,\frac{1}{\sqrt{N\, {{\mathrm{Im}}}\, \zeta }}, \end{aligned} \end{aligned}$$
(3.6)

for all \(\zeta \in \mathbb {H}\) with \(\rho (\zeta )\geqslant \delta \) and \({{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon }\).

The proof of this key technical result is postponed to Sect. 5. In order to bound the first term on the right hand side of (3.4) we use the following fluctuation averaging mechanism (introduced in [28] for Wigner matrices) to improve the bound (3.6) to a better bound for the inner product \( \langle {\mathbf{J}},{\mathbf{D}} \rangle \), given a version of the entry-wise local law.

Proposition 3.5

(Fluctuation averaging) Assume B1–B4, and let \( [\kappa _-,\kappa _+] \) be the convex hull of \( {{\mathrm{supp\,}}}\rho \). Let \(\delta ,C>0\) and \(\zeta \in \mathbb {H}\) with \(\delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])+\rho (\zeta )\leqslant \delta ^{-1}\) and \({{\mathrm{dist}}}(\zeta , {{\mathrm{Spec}}}({\mathbf{H}}^B))^{-1}\prec N^{C}\) for all \(B \subsetneq \mathbb {X} \). Let \(\varepsilon \in (0,1/2)\) be a constant and \(\Psi \) a non-random control parameter with \(N^{-1/2}\leqslant \Psi \leqslant N^{-\varepsilon }\). Suppose that the entrywise local law holds in the form

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta ) \Vert _{\mathrm{max}}\,\prec \, \Psi . \end{aligned} \end{aligned}$$
(3.7)

Then the error matrix \({\mathbf{D}}\), defined in (3.1b), satisfies

$$\begin{aligned} \begin{aligned} |\langle {\mathbf{R}},{\mathbf{D}}(\zeta ) \rangle |\,\prec \, \Psi ^2, \end{aligned} \end{aligned}$$
(3.8)

for every non-random \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) with faster than power law decay.

Note that Proposition 3.5 is stated on a slightly larger domain of spectral parameters than Theorem 2.7, as it allows \(\zeta \) to lie away from the convex hull of \({{\mathrm{supp\,}}}\rho \) even if \(\rho (\zeta )\) is not bounded away from zero. This slight extension will be needed in Sect. 7. The proof of Proposition 3.5 is carried out in Sect. 6. We have now stated all the results needed to prove the local law. In order to keep formulas short, we will use the notation:

$$\begin{aligned} \begin{aligned} \Lambda (\zeta )\,:=\, \Vert {\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta ) \Vert _{\mathrm{max}} . \end{aligned} \end{aligned}$$
(3.9)

Proof of Theorem 2.7

We start with the proof of (2.28). By the Stieltjes transform representation (2.5) of \({\mathbf{M}}\) and the trivial bound \(\Vert {\mathbf{G}}(\zeta ) \Vert \leqslant \frac{1}{{{\mathrm{Im}}}\, \zeta }\), the max-norm difference \( \Lambda (\zeta ) \) converges to zero as \( \zeta \) moves further away from the real axis. In particular, the a-priori bound (3.2) needed for Corollary 3.2 holds automatically for sufficiently large \( {{\mathrm{Im}}}\, \zeta \). Thus, combining the corollary with the unconditional error bound (3.5), the estimate (3.3) takes the form

$$\begin{aligned} \begin{aligned} \Lambda (\tau + \mathrm {i}\eta _*)\,\prec \, \frac{1}{\sqrt{N}},\qquad \tau \in [-C_1,C_1], \end{aligned} \end{aligned}$$
(3.10)

for any fixed constant \( C_1>0 \) and sufficiently large \(\eta _*\).

Now let \(\tau \in \mathbb {R}\), \(\eta _0 \in [N^{-1+\varepsilon },\eta _*]\) and \(\zeta _0=\tau +\mathrm {i}\eta _0 \in \mathbb {H}\) such that \(\rho (\zeta _0)\geqslant \delta \) for some \(\delta \in (0,1]\). Note that \(\rho (\zeta _0)\geqslant \delta \) and \(\eta _0 \leqslant \eta _*\) imply \(\tau \in [-\,C_1,C_1]\) for some positive constant \(C_1\) because \(\rho \) is the harmonic extension of the self-consistent density of states with compact support in \([-\kappa ,\kappa ]\) (Proposition 2.1). Since in addition the self-consistent density of states is uniformly Hölder continuous (cf. Proposition 2.2), there is a constant \(c_1\), depending on \(\delta \) and \(\mathscr {P}\), such that \( \inf _{\eta \in [\eta _0,\eta _*]}\rho (\tau +\mathrm {i}\eta )\geqslant c_1 \). Therefore, by (3.6) and (2.17) we infer that

$$\begin{aligned} \begin{aligned} \Lambda (\zeta )\,\mathbbm {1}(\Lambda (\zeta )\leqslant N^{-\varepsilon /4})\,\prec \, \frac{1}{\sqrt{N\,{{\mathrm{Im}}}\, \zeta }} , \qquad \zeta \in \tau + \mathrm {i}[\eta _0,\eta _*]. \end{aligned} \end{aligned}$$
(3.11)

Since \(N^{-\varepsilon /2} \geqslant (N\,{{\mathrm{Im}}}\, \zeta )^{-1/2}\), the inequality (3.11) establishes, on a high probability event, a gap in the set of possible values that \(\Lambda (\zeta )\) can take. The indicator function in (3.11) is absent for \(\zeta =\tau +\mathrm {i}\eta _*\) because of (3.10), i.e. at that point the value lies below the gap. From the Lipschitz continuity of \(\zeta \mapsto \Lambda (\zeta )\), with Lipschitz constant bounded by \(2N^2\) for \({{\mathrm{Im}}}\, \zeta \geqslant \frac{1}{N}\), and a standard continuity argument together with a union bound (e.g. Lemma A.1 in [4]), we conclude that \( \Lambda (\zeta ) \) lies below the gap for any \(\zeta \) with \({{\mathrm{Im}}}\,\zeta \in [\eta _0, \eta _*]\) with very high probability. Thus, by the definition of stochastic domination, \( \Lambda (\zeta ) \prec (N\,{{\mathrm{Im}}}\,\zeta )^{-1/2} \) uniformly for \(\zeta \in \mathbb {D}_\delta \), i.e., the entrywise local law (2.28) holds.

Now we prove (2.29). Let \(\zeta \in \mathbb {H}\) with \({{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon }\) and \(\rho (\zeta )\geqslant \delta \), so that the entrywise local law (3.7) holds at \( \zeta \) with \( \Psi := (N\,{{\mathrm{Im}}}\,\zeta )^{-1/2} \). Applying the fluctuation averaging (Proposition 3.5) with this \( \Psi \) and \( {\mathbf{R}} := {\mathbf{J}}(\zeta ) \) yields \( |\langle {\mathbf{J}},{\mathbf{D}} \rangle | \prec (N\,{{\mathrm{Im}}}\,\zeta )^{-1} \). Plugging this estimate into the first term on the right hand side of (3.4), and recalling the definition of stochastic domination, yields (2.29). This finishes the proof of Theorem 2.7. \(\square \)

Proof of Lemma 3.1

The condition (2.12) on the bare matrix \({\mathbf{A}}\) is clearly satisfied by (2.24). The lower bound on \(\mathcal {S}\) in (2.7) follows from (2.26). To show this, let \({\mathbf{R}}=\sum _{i} \varrho _i {\mathbf{r}}_i{\mathbf{r}}_i^* \in \overline{\mathscr {C}} \!\,_+\), where the sum is over an orthonormal basis \( ({\mathbf{r}}_i)_{i=1}^N \) of eigenvectors of \({\mathbf{R}}\). Then \( {\mathbf{v}}^*\mathcal {S}[{\mathbf{R}}]{\mathbf{v}} = \sum _{i}\varrho _i \mathbb {E} |{\mathbf{r}}_i^*{\mathbf{W}}{\mathbf{v}} |^2 \geqslant \kappa _4\sum _{i} \varrho _i \), for any normalized vector \({\mathbf{v}}\in \mathbb {C}^N\).

We will now verify the upper bounds on \(\mathcal {S}\) in (2.7) and (2.13). Both bounds follow from the decay of covariances

$$\begin{aligned} \begin{aligned} |\mathbb {E} w_{xu}w_{vy} |\,\leqslant \, \kappa _3(2\nu )\,\left( { (q_{xy}q_{uv})^{\nu } + (q_{xv}q_{uy})^{\nu } }\right) ,\qquad q_{xy}\,:=\, {\textstyle \frac{1}{1+d(x,y)}} , \qquad \nu \in \mathbb {N}, \end{aligned} \end{aligned}$$
(3.12)

which is an immediate consequence of (2.25) with the choices \(W_A=(w_{xu},w_{ux})\), \(W_B=(w_{vy},w_{yv})\), \(\phi (\xi _1,\xi _2)=\overline{\xi } \!\,_1\) and \(\psi (\xi _1,\xi _2)={\xi }_1\).

Indeed, to see the upper bound in (2.7) it suffices to show

$$\begin{aligned} \begin{aligned} \Vert \mathcal {S}[{\mathbf{r}}{\mathbf{r}}^*] \Vert \,\leqslant \, CN^{-1}, \end{aligned} \end{aligned}$$
(3.13)

for a constant \(C>0\), depending on \(\mathscr {K}\), and any normalized vector \({\mathbf{r}} \in \mathbb {C}^N\), because we can use for any \({\mathbf{R}} \in \overline{\mathscr {C}} \!\,_+\) the spectral decomposition \({\mathbf{R}}=\sum _i \varrho _i {\mathbf{r}}_i{\mathbf{r}}_i^* \) as above. The estimate (3.12) yields

$$\begin{aligned} \begin{aligned} \Vert \mathcal {S}[{\mathbf{r}}{\mathbf{r}}^*] \Vert \,\leqslant \, \frac{\kappa _3(2 {\nu })}{N} \left( \, \Vert {\mathbf{Q}}^{({\nu })} \Vert |{\mathbf{r}} |^*{\mathbf{Q}}^{({\nu })}|{\mathbf{r}} | \,+\,\big \Vert ({\mathbf{Q}}^{({\nu })}|{\mathbf{r}} |)({\mathbf{Q}}^{({\nu })}|{\mathbf{r}} |)^*\,\big \Vert \right) ,\qquad \end{aligned} \end{aligned}$$
(3.14)

where we defined the matrix \({\mathbf{Q}}^{({\nu })}\) with entries \(q_{xy}^{(\nu )}:=q_{xy}^{\nu }\), and \(|{\mathbf{r}} |:=(|r_x |)_{x\in \mathbb {X}}\). Since

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{Q}}^{({\nu })} \Vert \,\leqslant \, \max _x \sum _y q_{xy}^{{\nu }}, \end{aligned} \end{aligned}$$
(3.15)

the inequality (3.13) follows from the sub-P-dimensional volume (2.11) by choosing \({\nu }\) sufficiently large.
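To make this last step explicit under one natural reading of the volume condition (which is not restated here), assume (2.11) takes the form \(|\{y : d(x,y) \leqslant r\}| \leqslant C(1+r)^P\). Then a dyadic decomposition of the index set gives, for any \(\nu > P\),

```latex
\sum_{y} q_{xy}^{\nu}
  \,=\, \sum_{k \geqslant 0} \sum_{y\,:\; 2^{k} \leqslant 1+d(x,y) < 2^{k+1}} (1+d(x,y))^{-\nu}
  \;\leqslant\; \sum_{k \geqslant 0} C\, 2^{(k+1)P}\, 2^{-k\nu}
  \;=\; \frac{C\,2^{P}}{1-2^{P-\nu}} ,
```

so the right hand side of (3.15) is bounded by a constant depending only on P, uniformly in N and x.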

To show (2.13) we fix any \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) and estimate the entries \(\mathcal {S}_{xy}[{\mathbf{R}}]\) of \(\mathcal {S}[{\mathbf{R}}]\) by using (3.12),

$$\begin{aligned} |\mathcal {S}_{xy}[{\mathbf{R}}] |\,\leqslant \, \kappa _3(2 \nu ) \left( \left( \frac{1}{N}\sum _{u,v}q_{uv}^{\nu }\right) q_{xy}^{\nu }+\frac{1}{N}\left( \sum _{v} q_{xv}^{\nu }\right) \left( \sum _u q_{uy}^{\nu }\right) \right) \Vert {\mathbf{R}} \Vert _{\mathrm{max}} \,. \end{aligned}$$

The bound (2.13) follows because the right hand side of (3.15) is bounded uniformly in N, again by the volume condition (2.11) for \(\nu \) sufficiently large. \(\square \)

4 The matrix Dyson equation

This section is dedicated to the analysis of the MDE (2.2); in particular, it is independent of the probabilistic results established in Sects. 5, 6 and 7. In Sect. 4.1 we establish a variety of properties of the solution \({\mathbf{M}}\) to the MDE. The section starts with the proof of Proposition 2.1 and ends with the proof of Theorem 2.5. In Sect. 4.2 we prove Proposition 2.2 and the stability of the MDE, Theorem 2.6.

4.1 The solution of the matrix Dyson equation

Most of the inequalities in this and the following section are uniform in the data pair \(({\mathbf{A}},\mathcal {S})\) that determines the MDE and its solution, given a fixed set of model parameters \(\mathscr {P}_k\) corresponding to the assumptions Ak. We therefore introduce a convention for inequalities up to constants, depending only on the model parameters.

Convention 4.1

(Comparison relation and constants) Suppose a set of model parameters \(\mathscr {P}\) is given. Within the proofs we will write C and c for generic positive constants, depending on \(\mathscr {P}\). In particular, C and c may change their values from inequality to inequality. If Cc depend on additional parameters \(\mathscr {L}\), we will indicate this by writing \(C(\mathscr {L}),c(\mathscr {L})\). We also use the comparison relation \(\alpha \lesssim \beta \) or \(\beta \gtrsim \alpha \) for any positive \(\alpha \) and \(\beta \) if there exists a constant \(C>0\) that depends only on \(\mathscr {P}\), but is otherwise uniform in the data pair \(({\mathbf{A}},\mathcal {S})\), such that \(\alpha \leqslant C \beta \). In particular, C does not depend on the dimension N or the spectral parameter \(\zeta \). In case \(\alpha \lesssim \beta \lesssim \alpha \) we write \(\alpha \sim \beta \). For two matrices \({\mathbf{R}},{\mathbf{T}}\in \overline{\mathscr {C}}_+\) we similarly write \({\mathbf{R}}\lesssim {\mathbf{T}}\) if the inequality \({\mathbf{R}}\leqslant C {\mathbf{T}}\) in the sense of quadratic forms holds with a constant \(C>0\) depending only on the model parameters.

In the upcoming analysis many quantities depend on the spectral parameter \(\zeta \). We will often suppress this dependence in our notation and write e.g. \({\mathbf{M}}={\mathbf{M}}(\zeta )\), \(\rho =\rho (\zeta )\), etc.

Proof of Proposition 2.1

In this proof we will generalize the proof of Proposition 2.1 from [2] to our matrix setup. By taking the imaginary part of both sides of the MDE and using \({{\mathrm{Im}}}\, {\mathbf{M}}\geqslant {\mathbf{0}}\) and \({\mathbf{A}}={\mathbf{A}}^{\!*}\) we see that

$$\begin{aligned} -{{\mathrm{Im}}}\bigl [{{\mathbf{M}}(\zeta )^{-1}}\bigr ]\,=\, {\mathbf{M}}^*(\zeta )^{-1}{{\mathrm{Im}}}\, {\mathbf{M}}(\zeta ){\mathbf{M}}(\zeta )^{-1} \,\geqslant \, {{\mathrm{Im}}}\, \zeta \,{\mathbf{1}}. \end{aligned}$$

In particular, this implies the trivial bound on the solution to the MDE,

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}(\zeta ) \Vert \,\leqslant \, \frac{1}{{{\mathrm{Im}}}\zeta }, \qquad \zeta \in \mathbb {H}. \end{aligned} \end{aligned}$$
(4.1)

Let \({\mathbf{w}}\in \mathbb {C}^{N}\) be normalized, \({\mathbf{w}}^*{\mathbf{w}}=1\). Since \({\mathbf{M}}(\zeta )\) has positive imaginary part, the analytic function \(\zeta \mapsto {\mathbf{w}}^*{\mathbf{M}}(\zeta ){\mathbf{w}}\) takes values in \(\mathbb {H}\). From the trivial upper bound (4.1) and the MDE itself, we infer the asymptotics \( \mathrm {i}\eta {\mathbf{w}}^*{\mathbf{M}}(\mathrm {i}\eta ){\mathbf{w}}\rightarrow -1 \) as \(\eta \rightarrow \infty \). By the characterization of Stieltjes transforms of probability measures on the complex upper half plane (cf. Theorem 3.5 in [31]), we infer

$$\begin{aligned} {\mathbf{w}}^*{\mathbf{M}}(\zeta ){\mathbf{w}}\,=\, \int \frac{{{v}}_{{\mathbf{w}}}(\mathrm {d}\tau )}{\tau -\zeta }, \end{aligned}$$

where \(v_{{\mathbf{w}}}\) is a probability measure on the real line. By polarization, we find the general representation (2.5).

We now show that \({{\mathrm{supp\,}}}{\mathbf{V}} \subseteq [-\kappa ,\kappa ]\), where \(\kappa =\Vert {\mathbf{A}} \Vert +2\Vert \mathcal {S} \Vert ^{1/2}\) [cf. (2.6)]. Note that A1 implies \(\Vert \mathcal {S} \Vert \lesssim 1\). Indeed, letting \((\cdot )_\pm \) denote the positive and negative parts, we find

$$\begin{aligned} \Vert \mathcal {S}[{\mathbf{R}}] \Vert \,&\leqslant \, P_1\bigl (\,\langle ({{\mathrm{Re\,}}}{\mathbf{R}})_+ \rangle {+}\langle ({{\mathrm{Re\,}}}{\mathbf{R}})_- \rangle {+}\langle ({{\mathrm{Im}}}\, {\mathbf{R}})_+ \rangle {+}\langle ({{\mathrm{Im}}}\, {\mathbf{R}})_- \rangle \,\bigr )\nonumber \\&\,\leqslant \, 2P_1\Vert {\mathbf{R}} \Vert _{\mathrm{hs}}, \end{aligned}$$
(4.2)

for any \( {\mathbf{R}} \in \mathbb {C}^{N\times N}\). Since \(\Vert {\mathbf{R}} \Vert _{\mathrm{hs}} \leqslant \Vert {\mathbf{R}} \Vert \) the bound \(\Vert \mathcal {S} \Vert \lesssim 1\) follows. The following argument will prove that \(\Vert {{\mathrm{Im}}}{\mathbf{M}}(\zeta ) \Vert \rightarrow 0\) as \({{\mathrm{Im}}}\, \zeta \downarrow 0\) locally uniformly for all \(\zeta \in \mathbb {H}\) with \( |\zeta |>\kappa \). This implies \({{\mathrm{supp\,}}}{\mathbf{V}} \subseteq [-\kappa ,\kappa ]\).

Let us fix \(\zeta \in \mathbb {H}\) with \( |\zeta |>\kappa \) and suppose that \(\Vert {\mathbf{M}} \Vert \) satisfies the upper bound

$$\begin{aligned} \Vert {\mathbf{M}} \Vert \,<\, \frac{|\zeta |-\Vert {\mathbf{A}} \Vert }{2 \Vert \mathcal {S} \Vert }. \end{aligned}$$
(4.3)

Then by taking the inverse and then the norm on both sides of (2.2) we conclude that

$$\begin{aligned} \Vert {\mathbf{M}} \Vert \,\leqslant \, \frac{1}{|\zeta |-\Vert {\mathbf{A}} \Vert -\Vert \mathcal {S} \Vert \Vert {\mathbf{M}} \Vert }\,\leqslant \, \frac{2}{|\zeta |-\Vert {\mathbf{A}} \Vert }. \end{aligned}$$
(4.4)

Therefore, (4.3) implies (4.4). Since for \( |\zeta |>\kappa \) we have \((|\zeta |-\Vert {\mathbf{A}} \Vert )^2>4\Vert \mathcal {S} \Vert \), the interval below is nonempty and we see that there is a gap in the possible values of \(\Vert {\mathbf{M}} \Vert \), namely

$$\begin{aligned} \textstyle \Vert {\mathbf{M}}(\zeta ) \Vert \not \in \left( {\frac{2}{|\zeta |-\Vert {\mathbf{A}} \Vert },\frac{|\zeta |-\Vert {\mathbf{A}} \Vert }{2 \Vert \mathcal {S} \Vert }}\right) \qquad \text {for} \quad |\zeta |>\kappa . \end{aligned}$$

Since \(\zeta \mapsto \Vert {\mathbf{M}}(\zeta ) \Vert \) is a continuous function and for large \({{\mathrm{Im}}}\, \zeta \) the values of this function lie below the gap by the trivial bound (4.1), we infer

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}(\zeta ) \Vert \,\leqslant \, \frac{2}{|\zeta |-\Vert {\mathbf{A}} \Vert }\qquad \text {for} \quad |\zeta |>\kappa . \end{aligned} \end{aligned}$$
(4.5)

Let us now take the imaginary part of the MDE and multiply it with \({\mathbf{M}}^*\) from the left and with \({\mathbf{M}}\) from the right,

$$\begin{aligned} \begin{aligned} {{\mathrm{Im}}}\, {\mathbf{M}}\,=\, ({{\mathrm{Im}}}\, \zeta )\, {\mathbf{M}}^*{\mathbf{M}} +{\mathbf{M}}^*\mathcal {S}[{{\mathrm{Im}}}\, {\mathbf{M}}]{\mathbf{M}}. \end{aligned} \end{aligned}$$
(4.6)

By taking the norm on both sides of (4.6), using a trivial estimate on the right hand side and rearranging the resulting terms, we get

$$\begin{aligned} \Vert {{\mathrm{Im}}}\, {\mathbf{M}} \Vert \,\leqslant \,\frac{{ {{\mathrm{Im}}}\, \zeta }\Vert {\mathbf{M}} \Vert ^2}{1-\Vert {\mathbf{M}} \Vert ^2\Vert \mathcal {S} \Vert }. \end{aligned}$$
(4.7)

Here we used \(\Vert {\mathbf{M}} \Vert ^2\Vert \mathcal {S} \Vert <1\), which is satisfied by (4.5) for \( |\zeta |>\kappa \). We may estimate the right hand side of (4.7) further by applying (4.5). Thus we find

$$\begin{aligned} \Vert {{\mathrm{Im}}}\, {\mathbf{M}} \Vert \,\leqslant \,\frac{4{ {{\mathrm{Im}}}\, \zeta }}{(|\zeta |-\Vert {\mathbf{A}} \Vert )^2-4\Vert \mathcal {S} \Vert } \,=\, \frac{4{ {{\mathrm{Im}}}\, \zeta }}{(|\zeta |-\kappa + 2 \Vert \mathcal {S} \Vert ^{1/2})^2-4\Vert \mathcal {S} \Vert }. \end{aligned}$$
(4.8)

The right hand side of (4.8) converges to zero locally uniformly for all \(\zeta \in \mathbb {H}\) with \( |\zeta |>\kappa \) as \({{\mathrm{Im}}}\, \zeta \downarrow 0\). This finishes the proof of Proposition 2.1. \(\square \)
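The gap argument can be checked numerically in the illustrative flat model \(\mathcal {S}[{\mathbf{R}}]=\langle {\mathbf{R}} \rangle {\mathbf{1}}\) (for which \(\Vert \mathcal {S} \Vert =1\)) with diagonal \({\mathbf{A}}\): for \(|\zeta |>\kappa \) the solution obeys the bound (4.5) and its imaginary part obeys (4.8), so it vanishes as \({{\mathrm{Im}}}\,\zeta \downarrow 0\). A minimal sketch, with an ad hoc choice of parameters:

```python
import numpy as np

def solve_mde(z, a, iters=2000):
    # flat self-energy S[R] = <R>1 and A = diag(a): M stays diagonal
    m = np.zeros(len(a), dtype=complex)
    for _ in range(iters):
        m = -1.0 / (z - a + np.mean(m))
    return m

rng = np.random.default_rng(1)
a = rng.uniform(-1.0, 1.0, 50)
normA = np.max(np.abs(a))
normS = 1.0                       # for S[R] = <R>1: ||S[R]|| = |<R>| <= ||R||
kappa = normA + 2.0 * np.sqrt(normS)

z = 4.0 + 0.01j                   # |z| > kappa, i.e. outside [-kappa, kappa]
m = solve_mde(z, a)
res = np.max(np.abs(m + 1.0 / (z - a + np.mean(m))))

norm_M = np.max(np.abs(m))
norm_ImM = np.max(m.imag)

bound_M = 2.0 / (abs(z) - normA)                              # (4.5)
bound_Im = 4.0 * z.imag / ((abs(z) - normA)**2 - 4.0 * normS)  # (4.8)
```

Here `bound_Im` is proportional to \({{\mathrm{Im}}}\,\zeta \), reflecting that \(\Vert {{\mathrm{Im}}}\,{\mathbf{M}}(\zeta ) \Vert \rightarrow 0\) locally uniformly outside \([-\kappa ,\kappa ]\).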

The following proposition collects bounds on \( {\mathbf{M}} \) which, together with the properties stated in Sect. 2.1, are the only facts about \( {\mathbf{M}}\) that we need outside this section.

Proposition 4.2

(Properties of the solution) Assume A1 and that \(\Vert {\mathbf{A}} \Vert \leqslant P_0\) for some constant \(P_0>0\). Then uniformly for all spectral parameters \(\zeta \in \mathbb {H}\) the following bounds hold:

  (i)

    The solution is bounded in the spectral norm,

    $$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}(\zeta ) \Vert \,\lesssim \, \frac{1}{\rho (\zeta )+{{\mathrm{dist}}}(\zeta , {{\mathrm{supp\,}}}\rho )}. \end{aligned} \end{aligned}$$
    (4.9)
  (ii)

    The inverse of the solution is bounded in the spectral norm,

    $$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}(\zeta )^{-1} \Vert \,\lesssim \,1+|\zeta |. \end{aligned} \end{aligned}$$
    (4.10)
  (iii)

    The imaginary part of \({\mathbf{M}}\) is comparable to the harmonic extension of the self-consistent density of states,

    $$\begin{aligned} \begin{aligned} \rho (\zeta ){\mathbf{1}}\,\lesssim \, {{\mathrm{Im}}}{\mathbf{M}}(\zeta )\,\lesssim \, (1+|\zeta |^2)\Vert {\mathbf{M}}(\zeta ) \Vert ^2\rho (\zeta ){\mathbf{1}}. \end{aligned} \end{aligned}$$
    (4.11)

Proof

The inequalities (4.9) and (4.10) provide upper and lower bounds on the singular values of the solution, respectively. Before proving these bounds we show that \({\mathbf{M}}\) has a bounded normalized Hilbert–Schmidt norm,

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}(\zeta ) \Vert _{\mathrm{hs}}\,\lesssim \, 1, \qquad \forall \zeta \in \mathbb {H}. \end{aligned} \end{aligned}$$
(4.12)

For this purpose we take the imaginary part of (2.2) [cf. (4.6)] and find \( {{\mathrm{Im}}}\, {\mathbf{M}} \geqslant {\mathbf{M}}^*\mathcal {S}[{{\mathrm{Im}}}\, {\mathbf{M}}]{\mathbf{M}} \), where \( {\mathbf{M}} = {\mathbf{M}}(\zeta ) \). The lower bound on \(\mathcal {S}\) from (2.7) implies

$$\begin{aligned} \begin{aligned} {{\mathrm{Im}}}\, {\mathbf{M}} \,\gtrsim \, \rho \,{\mathbf{M}}^*{\mathbf{M}}, \end{aligned} \end{aligned}$$
(4.13)

where we used the definition of \(\rho \) in (2.10). Taking the normalized trace on both sides of (4.13) shows (4.12).

Proof of (ii) Taking the norm on both sides of (2.2) yields

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}^{-1} \Vert \,\leqslant \, |\zeta |+\Vert {\mathbf{A}} \Vert +\Vert \mathcal {S} \Vert _{\mathrm{hs} \rightarrow \Vert \cdot \Vert }\Vert {\mathbf{M}} \Vert _{\mathrm{hs}} \,\lesssim \, 1+|\zeta |, \end{aligned} \end{aligned}$$
(4.14)

where \(\Vert \mathcal {S} \Vert _{\mathrm{hs} \rightarrow \Vert \cdot \Vert }\) denotes the norm of \(\mathcal {S}\) from \(\mathbb {C}^{N\times N}\) equipped with the norm \(\Vert \cdot \Vert _{\mathrm{hs}}\) to \(\mathbb {C}^{N\times N}\) equipped with \(\Vert \cdot \Vert \). For the last inequality in (4.14) we used (4.12) and that by A1 we have \(\Vert \mathcal {S} \Vert _{\mathrm{hs} \rightarrow \Vert \cdot \Vert }\lesssim 1\) (cf. (4.2)).

Proof of (iii) First we treat the simple case of large spectral parameters, \(|\zeta |\geqslant 1+\kappa \), where \(\kappa \) was defined in (2.6). Recall that the matrix valued measure \({\mathbf{V}}(\mathrm {d}\tau )\) (cf. (2.5)) is supported in \([-\kappa ,\kappa ]\) by Proposition 2.1. The normalization \({\mathbf{V}}(\mathbb {R})={\mathbf{1}}\) implies that for any vector \({\mathbf{u}}\in \mathbb {C}^N\) with \(\Vert {\mathbf{u}} \Vert =1\) the function \(\zeta \mapsto {\mathbf{u}}^*{\mathbf{M}}(\zeta ){\mathbf{u}}\) is the Stieltjes transform of a probability measure supported in \([-\kappa ,\kappa ]\); hence it behaves as \(-\zeta ^{-1}\) and its imaginary part as \({{\mathrm{Im}}}\, \zeta /|\zeta |^{2}\) for large \(|\zeta |\). We conclude that \( {{\mathrm{Im}}}{\mathbf{M}}(\zeta )\sim \rho (\zeta )\sim |\zeta |^{-2}{{\mathrm{Im}}}\, \zeta \), for \(|\zeta |\geqslant 1+\kappa \). Since for these \(\zeta \) we also have \(\Vert {\mathbf{M}}(\zeta ) \Vert \sim |\zeta |^{-1}\) by the Stieltjes transform representation (2.5), we conclude that (4.11) holds in this regime.

Now we consider \(\zeta \in \mathbb {H}\) with \(|\zeta |\leqslant 1+\kappa \). We start with the lower bound on \({{\mathrm{Im}}}\, {\mathbf{M}}\). From (4.13) we see that \( {{\mathrm{Im}}}\, {\mathbf{M}} \,\gtrsim \, \rho \,\Vert {\mathbf{M}}^{-1} \Vert ^{-2}{\mathbf{1}} \), and since \(\Vert {\mathbf{M}}^{-1} \Vert \lesssim 1\) by (ii), the lower bound in (4.11) is proven.

For the upper bound, taking the imaginary part of the MDE [cf. (4.6)] and using A1 and that \({{\mathrm{Im}}}{\mathbf{M}} \gtrsim {{\mathrm{Im}}}\, \zeta {\mathbf{1}}\) by the Stieltjes transform representation (2.5), we get

$$\begin{aligned} {{\mathrm{Im}}}\, {\mathbf{M}} \,=\, {{\mathrm{Im}}}\, \zeta \, {\mathbf{M}}^*{\mathbf{M}} + {\mathbf{M}}^*\mathcal {S}[{{\mathrm{Im}}}\, {\mathbf{M}}] {\mathbf{M}} \,\lesssim \, ({{\mathrm{Im}}}\, \zeta +\rho ){\mathbf{M}}^*{\mathbf{M}} \,\lesssim \, \rho \,\Vert {\mathbf{M}} \Vert ^2{\mathbf{1}}. \end{aligned}$$

Proof of (i) In the regime \(|\zeta |\geqslant 1+\kappa \) the bound (4.9) follows from the Stieltjes transform representation (2.5). Thus we consider \(|\zeta |\leqslant 1+\kappa \). We take the imaginary part on both sides of (2.2) and use the lower bound in (4.11) and \(\mathcal {S}[{\mathbf{1}}]\gtrsim {\mathbf{1}}\) to get

$$\begin{aligned} -{{\mathrm{Im}}}\, {\mathbf{M}}(\zeta )^{-1}\,\geqslant \, \mathcal {S}[{{\mathrm{Im}}}\, {\mathbf{M}}(\zeta )]\,\gtrsim \,\rho (\zeta ){\mathbf{1}}. \end{aligned}$$

Since in general \({{\mathrm{Im}}}\, {\mathbf{R}}^{-1}\geqslant {\mathbf{1}}\) implies \(\Vert {\mathbf{R}} \Vert \leqslant 1\) for any \({\mathbf{R}} \in \mathbb {C}^{N \times N}\), we infer that \(\Vert {\mathbf{M}}(\zeta ) \Vert \lesssim \rho (\zeta )^{-1}\). On the other hand, \( \Vert {\mathbf{M}}(\zeta ) \Vert \lesssim {{\mathrm{dist}}}(\zeta , {{\mathrm{supp\,}}}\rho )^{-1} \) follows from (2.5) again. \(\square \)
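The interplay between the imaginary part of the MDE (4.6) and the comparison (4.11) can be seen concretely in the illustrative flat model \(\mathcal {S}[{\mathbf{R}}]=\langle {\mathbf{R}} \rangle {\mathbf{1}}\), where (4.6) becomes the exact identity \({{\mathrm{Im}}}\,{\mathbf{M}}=({{\mathrm{Im}}}\,\zeta +\pi \rho )\,{\mathbf{M}}^*{\mathbf{M}}\), assuming the normalization \(\rho =\pi ^{-1}\langle {{\mathrm{Im}}}\,{\mathbf{M}} \rangle \) from (2.10). A numerical sketch with a random Hermitian \({\mathbf{A}}\):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40
# random Hermitian data matrix A; flat self-energy S[R] = <R>1 (assumption)
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (X + X.conj().T) / (2.0 * np.sqrt(N))

lam, U = np.linalg.eigh(A)
z = 0.5 + 0.5j

# since S[M] = <M>1 is scalar, M = -(z - A + <M>)^{-1} is a function of A;
# iterate on the eigenvalues of A
m = np.zeros(N, dtype=complex)
for _ in range(3000):
    m = -1.0 / (z - lam + np.mean(m))
M = U @ np.diag(m) @ U.conj().T

rho = np.mean(m.imag) / np.pi              # rho = <Im M>/pi
ImM = (M - M.conj().T) / 2j

# exact form of (4.6) in this model; as Im z -> 0 it gives Im M ~ rho M^*M,
# the mechanism behind the two-sided bound (4.11)
rhs = (z.imag + np.pi * rho) * (M.conj().T @ M)
err = np.max(np.abs(ImM - rhs))
```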

In order to show the fast decay of off-diagonal entries of \( {\mathbf{M}} \), Theorem 2.5, we rely on the following general result on matrices with decaying off-diagonal entries.

Lemma 4.3

(Perturbed Combes–Thomas estimate) Let \({\mathbf{R}}\in \mathbb {C}^{N \times N}\) be such that

$$\begin{aligned} |r_{xy} |\,\leqslant \, \frac{\beta (\nu )}{(1+d(x,y))^{\nu }}+\frac{\beta (0)}{N} ,\qquad \forall x,y \in \mathbb {X},\;\forall {\nu } \in \mathbb {N}, \end{aligned}$$

with some positive sequence \(\underline{\beta } \!\,=(\beta ({\nu }))_{{\nu }=0}^\infty \), and \(\Vert {\mathbf{R}}^{-1} \Vert \leqslant 1\).

Then there exists a sequence \(\underline{\alpha } \!\,=(\alpha (\nu ))_{\nu =0}^\infty \), depending only on \(\underline{\beta } \!\,\) and P (cf. (2.11)), such that

$$\begin{aligned} \begin{aligned} |({\mathbf{R}}^{-1})_{xy} |\,\leqslant \, \frac{\alpha (\nu )}{(1+d(x,y))^{\nu }}+\frac{\alpha (0)}{N} ,\qquad \forall x,y \in \mathbb {X},\;\forall {\nu } \in \mathbb {N}. \end{aligned} \end{aligned}$$
(4.15)

This lemma is reminiscent of a standard Combes–Thomas estimate: an off-diagonal decay of the entries of a matrix \({\mathbf{R}}\) implies a similar decay for its inverse, \({\mathbf{R}}^{-1}\), provided the smallest singular value is bounded away from zero. Indeed, in the case \(\alpha (0)=\beta (0)=0\) the proof of this lemma directly follows from the standard strategy for establishing Combes–Thomas estimates, see e.g. Proposition 13.3.1 in [45]; we omit the details. We now explain how to extend this standard result to our case, where Lemma 4.3 allows for a nondecaying component. The detailed proof is given in the “Appendix”; here we only present the basic idea.

Write \({\mathbf{R}}={\mathbf{S}}+{\mathbf{T}}\), where \({\mathbf{S}}\) has a fast off-diagonal decay and \({\mathbf{T}}\) has entries of size \(|t_{xy}|\lesssim N^{-1}\). Note that \({\mathbf{T}}\) cannot simply be considered as a small perturbation since its norm can be of order one, i.e. comparable with that of \({\mathbf{S}}\). Instead, the proof relies on showing that \({\mathbf{S}}\) inherits the lower bound on its singular values from \({\mathbf{R}}\) and then applying the standard \(\alpha (0)=\beta (0)=0\) version of the Combes–Thomas estimate to \({\mathbf{S}}\) to generate the decaying component of \({\mathbf{R}}^{-1}\). The point is that \({\mathbf{T}}\) can potentially change only finitely many singular values by a significant amount since \(\Vert {\mathbf{T}} \Vert _{\mathrm{max}} \lesssim N^{-1}\). If these few singular values were close to zero, then they would necessarily be isolated, hence the corresponding singular vectors would be strongly localized. However, because of its small entries, \({\mathbf{T}}\) acts trivially on localized vectors which implies that isolated singular values are essentially stable under adding or subtracting \({\mathbf{T}}\). This argument excludes the creation of singular values close to zero by subtracting \({\mathbf{T}}\) from \({\mathbf{R}}\). The details are found in the “Appendix”. Putting all these ingredients together, we can now complete the proof of Theorem 2.5.
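The mechanism behind Lemma 4.3 (not its proof) can be illustrated numerically with ad hoc matrices, taking \(d(x,y)=|x-y|\): a banded \({\mathbf{S}}\) whose singular values are bounded below has an exponentially decaying inverse, and adding a mean field \({\mathbf{T}}\) with entries of size \(N^{-1}\) only contributes an \(O(N^{-1})\) background to the inverse:

```python
import numpy as np

N = 200
# S: tridiagonal with diagonal 3; Gershgorin gives Spec(S) in [1, 5],
# so the smallest singular value is >= 1 and ||S^{-1}|| <= 1
S = 3.0 * np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
# T: mean-field component with entries of size O(1/N); its norm is O(1)
T = 0.5 / N * np.ones((N, N))

dist = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
far = dist >= 10

inv_S = np.linalg.inv(S)
inv_R = np.linalg.inv(S + T)   # T is PSD, so S + T >= 1 and ||(S+T)^{-1}|| <= 1

# entries of S^{-1} decay exponentially away from the diagonal ...
decay_S = np.max(np.abs(inv_S[far]))
# ... while (S+T)^{-1} keeps the decaying part plus an O(1/N) background
decay_R = np.max(np.abs(inv_R[far]))
```

The key point of the lemma, that \({\mathbf{T}}\) does not destroy the lower bound on the singular values nor the decaying structure of the inverse, is visible in `decay_R` staying within \(O(N^{-1})\) of `decay_S`.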

Proof of Theorem 2.5

Recall the model parameters \(\underline{\pi } \!\,_1,\underline{\pi } \!\,_2\) from A2. We consider the MDE (2.2) entrywise and see that

$$\begin{aligned} |({\mathbf{M}}^{-1})_{xy} | \;&\leqslant \; |\zeta |\delta _{xy}\,+ \frac{\pi _1({\nu })+\pi _2(\nu )\Vert {\mathbf{M}} \Vert }{(1+d(x,y))^{\nu }} \,+\, \frac{\pi _1(0)+\pi _2(0)\Vert {\mathbf{M}} \Vert }{N}, \end{aligned}$$

where we used the assumptions (2.12) and (2.13), as well as \(\Vert {\mathbf{M}} \Vert _{\mathrm{max}} \leqslant \Vert {\mathbf{M}} \Vert \). By (4.9) and \(\zeta \in \mathbb {D}_\delta \), we have \( \Vert {\mathbf{M}} \Vert \lesssim \delta ^{-1}\). Furthermore, for large \(|\zeta |\) we also have \(\Vert {\mathbf{M}}(\zeta ) \Vert \lesssim |\zeta |^{-1}\). We can now apply Lemma 4.3 with the choice \({\mathbf{R}}:=\Vert {\mathbf{M}} \Vert {\mathbf{M}}^{-1}\) to see the existence of a positive sequence \(\underline{\gamma } \!\,\) such that (2.15) holds. This finishes the proof of Theorem 2.5. \(\square \)

4.2 Stability of the matrix Dyson equation

The goal of this section is to prove Proposition 2.2 and Theorem 2.6. The main technical result, which is needed for these proofs, is the linear stability of the MDE. For its statement we introduce for any \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) the sandwiching operator \(\mathcal {C}_{{\mathbf{R}}}: \mathbb {C}^{N \times N}\rightarrow \mathbb {C}^{N \times N}\) by

$$\begin{aligned} \begin{aligned} \mathcal {C}_{{\mathbf{R}}}[{\mathbf{T}}]\,:=\, {\mathbf{R}}{\mathbf{T}}{\mathbf{R}}. \end{aligned} \end{aligned}$$
(4.16)

Note that \(\mathcal {C}_{{\mathbf{R}}}^{-1}=\mathcal {C}_{{\mathbf{R}}^{-1}}\) and \(\mathcal {C}_{{\mathbf{R}}}^*=\mathcal {C}_{{\mathbf{R}}^*}\) for any \({\mathbf{R}} \in \mathbb {C}^{N \times N}\), where \(\mathcal {C}_{{\mathbf{R}}}^*\) denotes the adjoint with respect to the scalar product (2.1).

Proposition 4.4

(Linear stability) Assume A1 and \(\Vert {\mathbf{A}} \Vert \leqslant P_0\) for some constant \(P_0>0\) (cf. (2.8)). There exists a universal numerical constant \(C>0\) such that uniformly for all \(\zeta \in \mathbb {H}\):

$$\begin{aligned} \begin{aligned} \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}(\zeta )}\mathcal {S})^{-1} \Vert _{\mathrm{sp}} \,\lesssim \; 1\,+\frac{1}{(\rho (\zeta )+{{\mathrm{dist}}}(\zeta ,{{\mathrm{supp\,}}}\rho ))^C} . \end{aligned} \end{aligned}$$
(4.17)

Before we show a few technical results that prepare the proof of Proposition 4.4, we give a heuristic argument that explains how the operator \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) on the left hand side of (4.17) is connected to the stability of the MDE (2.2), written in the form \(-{\mathbf{1}} = ( \zeta {\mathbf{1}}-{\mathbf{A}}+ \mathcal {S}[{\mathbf{M}}]){\mathbf{M}}\), with respect to perturbations. Suppose that the perturbed MDE

$$\begin{aligned} \begin{aligned} -\,{\mathbf{1}} \,=\, (\zeta {\mathbf{1}}-{\mathbf{A}}+ \mathcal {S}[\varvec{\mathfrak {G}}({\mathbf{D}})])\varvec{\mathfrak {G}}({\mathbf{D}}) + {\mathbf{D}}, \end{aligned} \end{aligned}$$
(4.18)

with perturbation matrix \({\mathbf{D}}\) has a unique solution \(\varvec{\mathfrak {G}}({\mathbf{D}})\), depending differentiably on \({\mathbf{D}}\). Then by differentiating on both sides of (4.18) with respect to \({\mathbf{D}}\), setting \({\mathbf{D}}={\mathbf{0}}\) and using the MDE for \({\mathbf{M}}(\zeta )=\varvec{\mathfrak {G}}({\mathbf{0}})\), we find

$$\begin{aligned} \begin{aligned} {\mathbf{0}}\,=\,-\,{\mathbf{M}}(\zeta )^{-1} \nabla _{{\mathbf{R}}}\varvec{\mathfrak {G}}({\mathbf{0}}) +\mathcal {S}[\nabla _{{\mathbf{R}}}\varvec{\mathfrak {G}}({\mathbf{0}})]{\mathbf{M}}(\zeta )+ {\mathbf{R}}, \end{aligned} \end{aligned}$$
(4.19)

where \( \nabla _{{\mathbf{R}}}\) denotes the directional derivative with respect to \({\mathbf{D}}\) in the direction \({\mathbf{R}} \in \mathbb {C}^{N \times N}\). Rearranging the terms in (4.19) and multiplying with \({\mathbf{M}}={\mathbf{M}}(\zeta )\) from the left yields

$$\begin{aligned} \begin{aligned} (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})\nabla _{{\mathbf{R}}}\varvec{\mathfrak {G}}({\mathbf{0}})\,=\,{\mathbf{M}}{\mathbf{R}}. \end{aligned} \end{aligned}$$
(4.20)

Thus \(\varvec{\mathfrak {G}}({\mathbf{D}})\) has a bounded derivative at \({\mathbf{D}}={\mathbf{0}}\), i.e., the MDE is stable with respect to the perturbation \({\mathbf{D}}\) to linear order, whenever the operator \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) is invertible and its inverse is bounded. In order to extend the linear stability to the full stability of the MDE for non-infinitesimal perturbations, the linear stability bound (4.17) is fed as an input into a quantitative implicit function theorem [cf. (b) of Lemma 4.10 and (4.55) below]. The implicit function theorem then yields the existence of the analytic map \( {\mathbf{D}} \mapsto \varvec{\mathfrak {G}}({\mathbf{D}}) \) appearing in Theorem 2.6.
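The linear response formula (4.20) can be verified exactly in the scalar reduction \(N=1\), \({\mathbf{A}}=0\), \(\mathcal {S}=\mathrm{Id}\) (purely illustrative): the perturbed equation \(-1=(\zeta +\mathfrak {g}(d))\mathfrak {g}(d)+d\) is a quadratic, \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) becomes the scalar \(1-m^2\), and (4.20) predicts \(\mathfrak {g}'(0)=m/(1-m^2)\):

```python
import numpy as np

def g(d, z):
    """Root of the perturbed scalar MDE  g^2 + z g + 1 + d = 0
    (the case A = 0, S = Id, N = 1) with positive imaginary part."""
    s = np.sqrt(z * z - 4.0 * (1.0 + d))
    r1, r2 = (-z + s) / 2.0, (-z - s) / 2.0
    return r1 if r1.imag > 0 else r2

z = 1.0 + 2.0j
m = g(0.0, z)

# sanity: m solves the unperturbed equation -1 = (z + m) m and Im m > 0
res = abs((z + m) * m + 1.0)

# linear stability (4.20) in the scalar case: (1 - m^2) g'(0) = m
h = 1e-6
fd = (g(h, z) - g(-h, z)) / (2.0 * h)   # numerical derivative in D
pred = m / (1.0 - m * m)                # from inverting Id - C_M S
```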

The following definition will play a crucial role in the upcoming analysis.

Definition 4.5

(Saturated self-energy operator) Let \({\mathbf{M}}={\mathbf{M}}(\zeta )\) be the solution of the MDE at some spectral parameter \(\zeta \in \mathbb {H}\). We define the linear operator \(\mathcal {F}=\mathcal {F}(\zeta ): \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) by

$$\begin{aligned} \begin{aligned} \mathcal {F}\,:=\, \mathcal {C}_{{\mathbf{W}}}\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\,\mathcal {S}\,\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\,\mathcal {C}_{{\mathbf{W}}} , \end{aligned} \end{aligned}$$
(4.21a)

where we have introduced an auxiliary matrix

$$\begin{aligned} \begin{aligned} {\mathbf{W}} \,:=\, \left( {\,{\mathbf{1}}+({\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{-1}[{{\mathrm{Re\,}}}{\mathbf{M}}]})^{2}}\right) ^{1/4}. \end{aligned} \end{aligned}$$
(4.21b)

We call \(\mathcal {F}\) the saturated self-energy operator or the saturation of \(\mathcal {S}\) for short.

The operator \(\mathcal {F}\) inherits the self-adjointness with respect to (2.1) and the property of mapping \(\overline{\mathscr {C}}_+\) to itself from the self-energy operator \(\mathcal S\). We will now briefly discuss the reason for introducing \(\mathcal {F}\). In order to invert \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) in (4.20) we have to show that \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) is dominated by \(\mathrm{Id}\) in some sense. Neither \(\mathcal {S}\) nor \({\mathbf{M}}\) can be directly related to the identity operator, but their specific combination \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) can. We extract this delicate information from the MDE via a Perron–Frobenius argument. The key observation is that as \({{\mathrm{Im}}}\, \zeta \downarrow 0\) the imaginary part of the MDE (4.6) becomes an eigenvalue equation for the operator \({\mathbf{R}} \mapsto {\mathbf{M}}^*\mathcal {S}[{\mathbf{R}}]{\mathbf{M}}\) with eigenvalue 1 and corresponding eigenmatrix \({{\mathrm{Im}}}\, {\mathbf{M}}\). Since this operator is positivity preserving and \({{\mathrm{Im}}}\, {\mathbf{M}}\in {\mathscr {C}}_+\), its spectral radius is 1. Naively speaking, through the replacement of \({\mathbf{M}}^*\) by \({\mathbf{M}}\), the operator \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) gains an additional phase which reduces the spectral radius further and thus guarantees the invertibility of \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\). However, the non-selfadjointness of the aforementioned operators makes it hard to turn control on their spectral radii into norm-estimates. It is therefore essential to find an appropriate symmetrization of these operators before Perron–Frobenius is applied. A similar problem appeared in a simpler commutative setting in [2]. There, \({\mathbf{M}}=\mathrm{diag}({\mathbf{m}})\) was a diagonal matrix and the MDE became a vector equation. 
In this case the problem of inverting \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) reduces to inverting a matrix \({\mathbf{1}}-\mathrm{diag}({\mathbf{m}})^2{\mathbf{S}}\), where \({\mathbf{S}} \in \mathbb {R}^{N \times N}\) is a matrix with non-negative entries that plays the role of the self-energy operator \(\mathcal {S}\) in the current setup. The idea in [2] was to write

$$\begin{aligned} \begin{aligned} {\mathbf{1}}-\mathrm{diag}({\mathbf{m}})^2{\mathbf{S}}\,=\,{\mathbf{R}}({\mathbf{U}}-{\mathbf{F}}){\mathbf{T}}, \end{aligned} \end{aligned}$$
(4.22)

with invertible diagonal matrices \({\mathbf{R}}\) and \({\mathbf{T}}\), a diagonal unitary matrix \({\mathbf{U}}\) and a self-adjoint matrix \({\mathbf{F}}\) with positive entries satisfying \(\Vert {\mathbf{F}} \Vert \leqslant 1\), which plays the role of the operator \( \mathcal {F} \). It is then possible to see that \({\mathbf{U}}-{\mathbf{F}}\) is invertible as long as \({\mathbf{U}}\) does not leave the Perron–Frobenius eigenvector of \({\mathbf{F}}\) invariant. In this commutative setting it is possible to choose \({\mathbf{F}}=\mathrm{diag}(|{\mathbf{m}} |){\mathbf{S}}\mathrm{diag}(|{\mathbf{m}} |)\), where the absolute value is taken in each component. In our current setting we will achieve a decomposition similar to (4.22) on the level of operators acting on \(\mathbb {C}^{N \times N}\) [cf. (4.39) below]. The definition (4.21) ensures that the saturation \(\mathcal {F}\) is self-adjoint, positivity-preserving and satisfies \(\Vert \mathcal {F} \Vert \leqslant 1\), as we will establish later.
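The commutative saturation \({\mathbf{F}}=\mathrm{diag}(|{\mathbf{m}} |){\mathbf{S}}\mathrm{diag}(|{\mathbf{m}} |)\) can be sketched numerically for the vector Dyson equation of [2]. Taking the imaginary part of \(m_x=-1/(\zeta -a_x+({\mathbf{S}}{\mathbf{m}})_x)\) gives \(w={{\mathrm{Im}}}\,\zeta \,|{\mathbf{m}}|+{\mathbf{F}}w\) for \(w={{\mathrm{Im}}}\,{\mathbf{m}}/|{\mathbf{m}}|\), and since \(w>0\), Perron–Frobenius forces \(\Vert {\mathbf{F}} \Vert <1\). A minimal sketch with ad hoc parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 60
# symmetric variance matrix with strictly positive entries (role of S);
# the 1/N scaling keeps its row sums of order one
S = rng.uniform(0.5, 1.5, (N, N))
S = (S + S.T) / (2.0 * N)
a = rng.uniform(-1.0, 1.0, N)
z = 0.2 + 0.3j

# vector Dyson equation m_x = -1/(z - a_x + (S m)_x), solved by iteration
m = -1.0 / (z - a + 0j)
for _ in range(5000):
    m = -1.0 / (z - a + S @ m)

absm = np.abs(m)
F = absm[:, None] * S * absm[None, :]    # F = diag(|m|) S diag(|m|)

# imaginary part of the equation: with w = Im m / |m|,
#     w = (Im z) |m| + F w   and   w > 0,
# hence the spectral radius (= 2-norm, F symmetric) of F is below 1
w = m.imag / absm
resid = np.max(np.abs(w - z.imag * absm - F @ w))
normF = np.linalg.norm(F, 2)
```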

Lemma 4.6

(Bounds on \({\mathbf{W}}\)) Assume A1 and \(\Vert {\mathbf{A}} \Vert \leqslant P_0\) for some constant \(P_0>0\). Then uniformly for all spectral parameters \(\zeta \in \mathbb {H}\) with \(|\zeta |\leqslant 3(1+\kappa )\) the matrix \({\mathbf{W}}={\mathbf{W}}(\zeta ) \in {\mathscr {C}}_+\), defined in (4.21b), fulfills the bounds

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}} \Vert ^{-1}{\mathbf{1}}\,\lesssim \, \rho ^{1/2}{\mathbf{W}}\,\lesssim \, \Vert {\mathbf{M}} \Vert ^{1/2}{\mathbf{1}}. \end{aligned} \end{aligned}$$
(4.23)

Proof

We write \({\mathbf{W}}^{4}\) in a form that follows immediately from its definition (4.21b),

$$\begin{aligned} {\mathbf{W}}^4\,=\, \mathcal {C}_{\sqrt{{{\mathrm{Im}}}{\mathbf{M}}}}^{-1}(\mathcal {C}_{{{\mathrm{Im}}}\, {\mathbf{M}}}+\mathcal {C}_{{{\mathrm{Re\,}}}{\mathbf{M}}})[({{\mathrm{Im}}}\,{\mathbf{M}})^{-1}]. \end{aligned}$$

We estimate \(({{\mathrm{Im}}}\,{\mathbf{M}})^{-1}\) from above and below by employing (4.11) in the regime \(|\zeta |\lesssim 1\),

$$\begin{aligned} \frac{1}{\rho \Vert {\mathbf{M}} \Vert ^2}\,\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{\,-1}[{\mathbf{M}}^*{\mathbf{M}}+{\mathbf{M}}{\mathbf{M}}^*] \,\lesssim \, {\mathbf{W}}^{4}\,\lesssim \, \frac{1}{\rho }\,\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{\,-1}[{\mathbf{M}}^*{\mathbf{M}}+{\mathbf{M}}{\mathbf{M}}^*]. \end{aligned}$$

Using the trivial bounds \(2\Vert {\mathbf{M}}^{-1} \Vert ^{-2}{\mathbf{1}}\leqslant {\mathbf{M}}^*{\mathbf{M}}+{\mathbf{M}}{\mathbf{M}}^* \leqslant 2\Vert {\mathbf{M}} \Vert ^2{\mathbf{1}}\) and \(\Vert {\mathbf{M}}^{-1} \Vert \lesssim 1\) from (4.10) as well as (4.11) again, we find \( \Vert {\mathbf{M}} \Vert ^{-4}\rho ^{-2}\lesssim {\mathbf{W}}^4\lesssim \Vert {\mathbf{M}} \Vert ^2\rho ^{-2} \). This is equivalent to (4.23). \(\square \)

Lemma 4.7

(Spectrum of \(\mathcal {F}\)) Assume A1 and \(\Vert {\mathbf{A}} \Vert \leqslant P_0\) for some constant \(P_0>0\). Then the saturated self-energy operator \(\mathcal {F}=\mathcal {F}(\zeta )\), defined in (4.21), has a unique normalized, \(\Vert {\mathbf{F}} \Vert _{\mathrm{hs}}=1\), eigenmatrix \({\mathbf{F}}={\mathbf{F}}(\zeta ) \in \mathscr {C}_+\), corresponding to its largest eigenvalue, \( \mathcal {F}[{\mathbf{F}}] = \Vert \mathcal {F} \Vert _{\mathrm{sp}}{\mathbf{F}}\). Furthermore, the following properties hold uniformly for all spectral parameters \(\zeta \in \mathbb {H}\) such that \(|\zeta |\leqslant 3(1+\kappa )\) and \(\Vert \mathcal {F}(\zeta ) \Vert _{\mathrm{sp}}\geqslant 1/2\).

  (i)

    The spectral radius of \(\mathcal {F}\) is given by

    $$\begin{aligned} \begin{aligned} \Vert \mathcal {F} \Vert _{\mathrm{sp}} \,=\, 1\,- \frac{\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\,{\mathbf{M}}]}\rangle }{\langle {\mathbf{F}},\!{\mathbf{W}}^{-2} \rangle } {{\mathrm{Im}}}\, \zeta . \end{aligned} \end{aligned}$$
    (4.24)
  (ii)

    The eigenmatrix \({\mathbf{F}}\) is controlled by the solution of the MDE:

    $$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}} \Vert ^{-7}{\mathbf{1}}\,\lesssim \,{\mathbf{F}}\,\lesssim \,\Vert {\mathbf{M}} \Vert ^{6}{\mathbf{1}}. \end{aligned} \end{aligned}$$
    (4.25)
  (iii)

    The operator \(\mathcal {F}\) has the uniform spectral gap \( \vartheta \gtrsim \Vert {\mathbf{M}} \Vert ^{-42}\), i.e.,

    $$\begin{aligned} \begin{aligned} {{\mathrm{Spec}}}\bigl (\mathcal {F}/\Vert \mathcal {F} \Vert _{\mathrm{sp}}\bigr )\,\subseteq \, [-1+\vartheta , 1-\vartheta ]\cup \{1\}. \end{aligned} \end{aligned}$$
    (4.26)

Proof

Since \(\mathcal {F}\) preserves the cone \(\overline{\mathscr {C}}_+\) of positive semidefinite matrices, a version of the Perron–Frobenius theorem for cone preserving operators implies that there exists a normalized \({\mathbf{F}} \in \overline{\mathscr {C}}_+\) such that \( \mathcal {F}[{\mathbf{F}}] = \Vert \mathcal {F} \Vert _{\mathrm{sp}}{\mathbf{F}}\). We will show uniqueness of this eigenmatrix later in the proof. First we will prove that (4.24) holds for any such \({\mathbf{F}}\).

Proof of (i) We define for any matrix \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) the operator \(\mathcal {K}_{{\mathbf{R}}}:\mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) via

$$\begin{aligned} \begin{aligned} \mathcal {K}_{{\mathbf{R}}}[{\mathbf{T}}]\,:=\, {\mathbf{R}}^{*}{\mathbf{T}}{\mathbf{R}} . \end{aligned} \end{aligned}$$
(4.27)

Note that for self-adjoint \({\mathbf{R}}\in \mathbb {C}^{N\times N}\) we have \(\mathcal {K}_{{\mathbf{R}}}=\mathcal {C}_{{\mathbf{R}}}\) [cf. (4.16)]. Using definition (4.27), the imaginary part of the MDE (4.6) can be written in the form

$$\begin{aligned} \begin{aligned} {{\mathrm{Im}}}\, {\mathbf{M}}\,=\, ({{\mathrm{Im}}}\, \zeta ) \mathcal {K}_{{\mathbf{M}}}[{\mathbf{1}}]+\mathcal {K}_{{\mathbf{M}}}\mathcal {S}[{{\mathrm{Im}}}\, {\mathbf{M}}]. \end{aligned} \end{aligned}$$
(4.28)

We now rewrite Eq. (4.28) in terms of \({{\mathrm{Im}}}\, {\mathbf{M}}\), \(\mathcal {F}\) and \({\mathbf{W}}\). In order to express \({\mathbf{M}}\) in terms of \({\mathbf{W}}\), we introduce the unitary matrix

$$\begin{aligned} \begin{aligned} {\mathbf{U}} \,:=\, \frac{ \mathcal {C}_{\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{-1}[{{\mathrm{Re\,}}}{\mathbf{M}}]-\mathrm {i}{\mathbf{1}} }{ \left|\mathcal {C}_{\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{-1}[{{\mathrm{Re\,}}}{\mathbf{M}}]-\mathrm {i}{\mathbf{1}} \right|}, \end{aligned} \end{aligned}$$
(4.29)

via the spectral calculus of the self-adjoint matrix \(\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{-1}[{{\mathrm{Re\,}}}{\mathbf{M}}]\). With (4.29) and the definition of \({\mathbf{W}}\) from (4.21b) we may write \({\mathbf{M}}\) as

$$\begin{aligned} \begin{aligned} {\mathbf{M}}\,=\,\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\mathcal {C}_{{\mathbf{W}}}[{\mathbf{U}}^*]. \end{aligned} \end{aligned}$$
(4.30)

Here, the matrices \({\mathbf{W}}\) and \({\mathbf{U}}\) commute. The identity (4.30) should be viewed as a balanced polar decomposition. Instead of having unitary matrices \(\mathbf {U}_1\) or \(\mathbf {U}_2\) on the left or right of the decompositions \(\mathbf {M} = \mathbf {U}_{1}\mathbf {Q}_{1}\) or \(\mathbf {M} = \mathbf {Q}_{2}\mathbf {U}_{2}\), respectively, the unitary matrix \(\mathbf {U}^{*}\) appears in the middle of \(\mathbf {M} = \mathbf {Q}^{*}\mathbf {U}^{*}\mathbf {Q}\) with \(\mathbf {Q} = \mathbf {W}\sqrt{{{{\mathrm{Im}}}}\,\mathbf {M}}\). Using (4.30) we also find an expression for \(\mathcal {K}_{{\mathbf{M}}}\), namely

$$\begin{aligned} \begin{aligned} \mathcal {K}_{{\mathbf{M}}}\,=\,\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\,\mathcal {C}_{{\mathbf{W}}}\mathcal {K}_{{\mathbf{U}}^*}\mathcal {C}_{{\mathbf{W}}}\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}. \end{aligned} \end{aligned}$$
(4.31)

Plugging (4.31) into (4.28) and applying the inverse of \(\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\mathcal {C}_{{\mathbf{W}}}\mathcal {K}_{{\mathbf{U}}^*}\) on both sides, yields

$$\begin{aligned} \begin{aligned} {\mathbf{W}}^{-2}\,=\, \mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\,{\mathbf{M}}] {{\mathrm{Im}}}\, \zeta +\mathcal {F}[{\mathbf{W}}^{-2}] , \end{aligned} \end{aligned}$$
(4.32)

where we used the definition of \(\mathcal {F}\) from (4.21) and \(\mathcal {K}_{{\mathbf{U}}^*}^{-1}[{\mathbf{W}}^{-2}]={\mathbf{W}}^{-2}\), which holds because \({\mathbf{U}}\) and \({\mathbf{W}}\) commute. We project both sides of (4.32) onto the eigenmatrix \(\mathbf {F}\) of \(\mathcal {F}\). Since \(\mathcal {F}\) is self-adjoint with respect to the scalar product (2.1) and by \( \mathcal {F}[{\mathbf{F}}] = \Vert \mathcal {F} \Vert _{\mathrm{sp}}{\mathbf{F}}\) we get

$$\begin{aligned} \langle {{\mathbf{F}}} , {\!{\mathbf{W}}^{-2}}\rangle \,=\, \langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\,{\mathbf{M}}]}\rangle {{\mathrm{Im}}}\, \zeta + \Vert \mathcal {F} \Vert _{\mathrm{sp}}\langle {{\mathbf{F}}} , {\!{\mathbf{W}}^{-2}}\rangle . \end{aligned}$$

Solving this identity for \(\Vert \mathcal {F} \Vert _{\mathrm{sp}}\) yields (4.24).
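The identity (4.24) can be sketched numerically in the commutative setting of [2], where \({\mathbf{W}}^{-2}\) corresponds to the vector \({{\mathrm{Im}}}\,{\mathbf{m}}/|{\mathbf{m}}|\), \(\mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\,{\mathbf{M}}]\) to \(|{\mathbf{m}}|\), and \(\mathcal {F}\) to \({\mathbf{F}}=\mathrm{diag}(|{\mathbf{m}} |){\mathbf{S}}\mathrm{diag}(|{\mathbf{m}} |)\) with Perron eigenvector \(f\); the analogue of (4.24) reads \(\Vert {\mathbf{F}} \Vert =1-{{\mathrm{Im}}}\,\zeta \,\langle f,|{\mathbf{m}}|\rangle /\langle f,w\rangle \). An illustrative check with ad hoc parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 60
S = rng.uniform(0.5, 1.5, (N, N))
S = (S + S.T) / (2.0 * N)          # symmetric, strictly positive entries
a = rng.uniform(-1.0, 1.0, N)
z = 0.2 + 0.3j

# solve the vector Dyson equation m_x = -1/(z - a_x + (S m)_x)
m = -1.0 / (z - a + 0j)
for _ in range(5000):
    m = -1.0 / (z - a + S @ m)

absm = np.abs(m)                    # plays the role of C_W[Im M]
w = m.imag / absm                   # plays the role of W^{-2}
F = absm[:, None] * S * absm[None, :]

vals, vecs = np.linalg.eigh(F)      # ascending eigenvalues
normF = vals[-1]                    # Perron eigenvalue = ||F||_sp
f = vecs[:, -1]
f = f if f.sum() > 0 else -f        # Perron eigenvector, entrywise positive

# commutative analogue of (4.24)
pred = 1.0 - z.imag * (f @ absm) / (f @ w)
err = abs(normF - pred)
```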

Proof of (ii) and (iii) Let \(\zeta \in \mathbb {H}\) with \(|\zeta |\leqslant 3(1+\kappa )\) and \(\Vert \mathcal {F}(\zeta ) \Vert _{\mathrm{sp}}\geqslant 1/2\). The bounds on the eigenmatrix (4.25) and on the spectral gap (4.26) are a consequence of the estimate

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}} \Vert ^{-4} \langle {\mathbf{R}} \rangle {\mathbf{1}} \,\lesssim \,\mathcal {F}[{\mathbf{R}}] \,\lesssim \, \Vert {\mathbf{M}} \Vert ^{6} \langle {\mathbf{R}} \rangle {\mathbf{1}} ,\qquad \forall {\mathbf{R}} \in \overline{\mathscr {C}}_+ . \end{aligned} \end{aligned}$$
(4.33)

We verify (4.33) below. Given (4.33), the remaining assertions, (4.25) and (4.26), of Lemma 4.7 are consequences of the following general result, Lemma 4.8, whose proof is given in the “Appendix”. It generalizes to a non-commutative setting the basic fact (cf. Lemma A.1) that symmetric matrices with strictly positive entries have a positive spectral gap.\(\square \)

Lemma 4.8

(Spectral gap) Let \(\mathcal {T}: \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) be a linear self-adjoint operator preserving the cone \(\overline{\mathscr {C}}_+\) of positive semidefinite matrices. Suppose \(\mathcal {T}\) is normalized, \(\Vert \mathcal {T} \Vert _{\mathrm{sp}}=1\), and

$$\begin{aligned} \begin{aligned} \gamma \,\langle {\mathbf{R}} \rangle {\mathbf{1}} \,\leqslant \, \mathcal {T}[{\mathbf{R}}] \,\leqslant \, \Gamma \,\langle {\mathbf{R}} \rangle {\mathbf{1}}, \qquad {\mathbf{R}} \in \overline{\mathscr {C}}_+, \end{aligned} \end{aligned}$$
(4.34)

for some positive constants \( \gamma \) and \( \Gamma \). Then \( \mathcal {T} \) has a spectral gap of size \( \theta := \frac{\gamma ^{6}}{2\Gamma ^{4}} \), i.e.,

$$\begin{aligned} \begin{aligned} {{\mathrm{Spec}}}\,{\mathcal {T}} \subseteq [-1+\theta ,1-\theta ] \cup \{ {1} \} . \end{aligned} \end{aligned}$$
(4.35)

Furthermore, the eigenvalue 1 is non-degenerate and the corresponding normalized, \(\Vert {\mathbf{T}} \Vert _{\mathrm{hs}}=1\), eigenmatrix \({\mathbf{T}} \in \mathscr {C}_+\) satisfies

$$\begin{aligned} \begin{aligned} {\textstyle \frac{ \gamma }{ \!\sqrt{\Gamma } }}\, {\mathbf{1}}\,\leqslant \, {\mathbf{T}} \,\leqslant \,\Gamma \, {\mathbf{1}}. \end{aligned} \end{aligned}$$
(4.36)

Lemma 4.8 shows the uniqueness of the eigenmatrix \({\mathbf{F}}\) as well. In the regime \(|\zeta |\geqslant 3(1+\kappa )\) the constants hidden in the comparison relation of (4.33) will depend on \(|\zeta |\), but otherwise the upcoming arguments are not affected. In particular the qualitative property of having a unique eigenmatrix \({\mathbf{F}}\) remains true even for large values of \(|\zeta |\).
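As a sanity check of Lemma 4.8 in the commutative special case (cf. Lemma A.1), the following minimal numerical sketch verifies the conclusions (4.35) and (4.36) for a \(2\times 2\) symmetric matrix with positive entries acting on nonnegative vectors; the matrix and the constants \(\gamma ,\Gamma \) are chosen purely for illustration and are not from the paper.

```python
import math

# Commutative toy model for Lemma 4.8: T = [[2, 1], [1, 2]] / 3 acts on
# vectors r >= 0 (the scalar analogue of the cone of positive semidefinite
# matrices), normalized so that its top eigenvalue equals 1.
a, b, c = 2/3, 1/3, 2/3

# Eigenvalues of the 2x2 symmetric matrix [[a, b], [b, c]].
disc = math.sqrt((a - c)**2 + 4 * b**2)
lam_top = (a + c + disc) / 2   # = 1 after the normalization
lam_sec = (a + c - disc) / 2   # = 1/3, the second eigenvalue

# Two-sided bound gamma * <r> <= (T r)_i <= Gamma * <r> for r >= 0, where
# <r> is the mean of the entries: extremizing (2*r1 + r2)/3 at fixed mean
# gives gamma = 2/3 and Gamma = 4/3.
gamma, Gamma = 2/3, 4/3

# Spectral gap predicted by Lemma 4.8 (a vast underestimate here, as is
# expected from a completely general bound).
theta = gamma**6 / (2 * Gamma**4)   # = 1/72

assert abs(lam_top - 1.0) < 1e-12
assert -1 + theta <= lam_sec <= 1 - theta   # analogue of (4.35)

# Perron eigenvector for eigenvalue 1 is (1, 1)/sqrt(2) (Euclidean
# normalization); its entries obey the analogue of (4.36).
t = 1 / math.sqrt(2)
assert gamma / math.sqrt(Gamma) <= t <= Gamma
```

The general bound \(\theta =\gamma ^6/(2\Gamma ^4)=1/72\) is of course much smaller than the true gap \(1-1/3=2/3\) in this example; the point of Lemma 4.8 is only that the gap is bounded below by the constants in (4.34).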

Proof of (4.33): The bounds in (4.33) are a consequence of Assumption A1 and the bounds (4.23) on \({\mathbf{W}}\) and (4.11) on \({{\mathrm{Im}}}\, {\mathbf{M}}\), respectively. Indeed, from A1 we have \(\mathcal {S}[{\mathbf{R}}]\sim \langle {\mathbf{R}} \rangle {\mathbf{1}}\) for positive semidefinite matrices \({\mathbf{R}}\). By the definition (4.21a) of \(\mathcal {F}\) this immediately yields

$$\begin{aligned} \mathcal {F}[{\mathbf{R}}] \;\sim \; \big \langle \mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\;\mathcal {C}_{{\mathbf{W}}}[{\mathbf{R}}] \big \rangle \, \mathcal {C}_{{\mathbf{W}}}\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}[{\mathbf{1}}] \;=\; \langle {\mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\, {\mathbf{M}}]} , {{\mathbf{R}}}\rangle \; \mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\, {\mathbf{M}}] . \end{aligned}$$

Since (4.23) and (4.11) imply \( \Vert {\mathbf{M}} \Vert ^{-2}{\mathbf{1}}\lesssim \mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}{\mathbf{M}}]\lesssim \Vert {\mathbf{M}} \Vert ^3{\mathbf{1}} \), we conclude that (4.33) holds. \(\square \)

Proof of Proposition 4.4

To show (4.17) we consider the regime of large and small values of \(|\zeta |\) separately. We start with the simpler regime, \(|\zeta |\geqslant 3(1+\kappa )\). In this case we apply the bound \( \Vert {\mathbf{M}}(\zeta ) \Vert \leqslant (|\zeta |-\kappa )^{-1} \), which is an immediate consequence of the Stieltjes transform representation (2.5) of \({\mathbf{M}}\). In particular,

$$\begin{aligned} \begin{aligned} \Vert \mathcal {C}_{{\mathbf{M}}(\zeta )}\mathcal {S} \Vert _{\mathrm{sp}}\,\leqslant \, \frac{\Vert \mathcal {S} \Vert _{\mathrm{sp}} }{(|\zeta |-\kappa )^2} \,\leqslant \, \frac{\Vert \mathcal {S} \Vert }{4(1+\kappa )^2}\,\leqslant \,\frac{1}{4}, \end{aligned} \end{aligned}$$
(4.37)

where we used \(|\zeta |\geqslant 3(1+\kappa )\) in the second inequality and \(\kappa \geqslant \Vert \mathcal {S} \Vert ^{1/2}\), i.e. \(\Vert \mathcal {S} \Vert \leqslant \kappa ^2\leqslant (1+\kappa )^2\), in the last one. We also used that \(\Vert \mathcal {T} \Vert _{\mathrm{sp}}\leqslant \Vert \mathcal {T} \Vert \) for any self-adjoint operator \(\mathcal {T}\) on \(\mathbb {C}^{N \times N}\). The claim (4.17) hence follows in the regime of large \(|\zeta |\).

Now we consider the regime \(|\zeta |\leqslant 3(1+\kappa )\). Here we will use the spectral properties of the saturated self-energy operator \(\mathcal {F}\), established in Lemma 4.7. First we rewrite \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}(\zeta )}\mathcal {S}\) in terms of \(\mathcal {F}\). For this purpose we recall the definition of \({\mathbf{U}}\) from (4.29). With the identity (4.30) we find

$$\begin{aligned} \begin{aligned} \mathcal {C}_{{\mathbf{M}}} \,=\,\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}} \,\mathcal {C}_{{\mathbf{W}}}\mathcal {C}_{{\mathbf{U}}^*}\mathcal {C}_{{\mathbf{W}}}\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}. \end{aligned} \end{aligned}$$
(4.38)

Combining (4.38) with the definition of \(\mathcal {F}\) from (4.21a) we verify

$$\begin{aligned} \begin{aligned} \mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S} \,=\, \mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\,\mathcal {C}_{{\mathbf{W}}}\mathcal {C}_{{\mathbf{U}}^*}(\mathcal {C}_{{\mathbf{U}}}-\mathcal {F}) \mathcal {C}_{{\mathbf{W}}}^{-1}\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{-1}. \end{aligned} \end{aligned}$$
(4.39)

The bounds (4.23) on \({\mathbf{W}}\) and (4.11) on \({{\mathrm{Im}}}\, {\mathbf{M}}\) imply bounds on \(\mathcal {C}_{{\mathbf{W}}}\) and \(\mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}\), respectively. In fact, in the regime of bounded \(|\zeta |\), we have

$$\begin{aligned} \begin{aligned} \Vert \mathcal {C}_{{\mathbf{W}}} \Vert \,\lesssim \, \frac{\Vert {\mathbf{M}} \Vert }{\rho }, \quad \Vert \mathcal {C}_{{\mathbf{W}}}^{-1} \Vert \,\lesssim \,\rho \Vert {\mathbf{M}} \Vert ^2, \quad \Vert \mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}} \Vert \,\lesssim \, \rho \Vert {\mathbf{M}} \Vert ^2, \quad \Vert \mathcal {C}_{\!\sqrt{{{\mathrm{Im}}}\, {\mathbf{M}}}}^{-1} \Vert \,\lesssim \,\frac{1}{\rho }. \end{aligned} \end{aligned}$$
(4.40)

Therefore, taking the inverse and then the norm \(\Vert \cdot \Vert _{\mathrm{sp}}\) on both sides of (4.39) and using (4.40) as well as \(\Vert \mathcal {C}_{{\mathbf{T}}} \Vert _{\mathrm{sp}}\leqslant \Vert \mathcal {C}_{{\mathbf{T}}} \Vert \) for self-adjoint \({\mathbf{T}} \in \mathbb {C}^{N \times N}\) yields

$$\begin{aligned} \begin{aligned} \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{sp}}\,\lesssim \, \Vert {\mathbf{M}} \Vert ^5\Vert (\mathcal {C}_{{\mathbf{U}}}-\mathcal {F})^{-1} \Vert _{\mathrm{sp}}. \end{aligned} \end{aligned}$$
(4.41)

Note that \(\mathcal {C}_{{\mathbf{U}}}\) and \(\mathcal {C}_{{\mathbf{U}}^*}\) are unitary operators on \(\mathbb {C}^{N \times N}\) and thus \(\Vert \mathcal {C}_{{\mathbf{U}}} \Vert _{\mathrm{sp}}=\Vert \mathcal {C}_{{\mathbf{U}}^*} \Vert _{\mathrm{sp}}=1\). We estimate the norm of the inverse of \(\mathcal {C}_{{\mathbf{U}}}-\mathcal {F}\). In case \(\Vert \mathcal {F} \Vert _{\mathrm{sp}}< 1/2\) we will simply use the bound \(\Vert (\mathcal {C}_{{\mathbf{U}}}-\mathcal {F})^{-1} \Vert _{\mathrm{sp}}\leqslant 2\) in (4.41) and (4.9) for estimating \(\Vert {\mathbf{M}} \Vert \), thus verifying (4.17) in this case.
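The bound in the case \(\Vert \mathcal {F} \Vert _{\mathrm{sp}}< 1/2\) is a Neumann series estimate: since \(\mathcal {C}_{{\mathbf{U}}}\) is unitary,

$$\begin{aligned} \Vert (\mathcal {C}_{{\mathbf{U}}}-\mathcal {F})^{-1} \Vert _{\mathrm{sp}} \,=\, \big \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{U}}}^{-1}\mathcal {F})^{-1}\mathcal {C}_{{\mathbf{U}}}^{-1} \big \Vert _{\mathrm{sp}} \,\leqslant \, \sum _{k=0}^{\infty }\Vert \mathcal {F} \Vert _{\mathrm{sp}}^{k} \,=\, \frac{1}{1-\Vert \mathcal {F} \Vert _{\mathrm{sp}}} \,<\, 2 . \end{aligned}$$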

If \(\Vert \mathcal {F} \Vert _{\mathrm{sp}}\geqslant 1/2\), we apply the following lemma, which was stated as Lemma 5.8 in [2].

Lemma 4.9

(Rotation–Inversion Lemma) Let \( \mathcal {T} \) be a self-adjoint and \(\mathcal {U}\) a unitary operator on \( \mathbb {C}^{N \times N} \). Suppose that \( \mathcal {T} \) has a spectral gap, i.e., there is a constant \( \theta > 0 \) such that

$$\begin{aligned} {{\mathrm{Spec}}}\,{\mathcal {T}} \,\subseteq \, \bigl [ -\Vert \mathcal {T} \Vert _{\mathrm{sp}} \!+ \theta , \Vert \mathcal {T} \Vert _{\mathrm{sp}} \!-\theta \,\bigr ] \cup \bigl \{ {\Vert \mathcal {T} \Vert _{\mathrm{sp}}} \bigl \} , \end{aligned}$$

with a non-degenerate largest eigenvalue \( \Vert \mathcal {T} \Vert _{\mathrm{sp}} \leqslant 1 \). Then there exists a universal positive constant C such that

$$\begin{aligned} \Vert (\mathcal {U}-\mathcal {T})^{-1} \Vert _{\mathrm{sp}} \,\leqslant \, \frac{C}{\theta }\,|1 - \Vert \mathcal {T} \Vert _{\mathrm{sp}}\langle {{\mathbf{T}}} , { \mathcal {U}[{\mathbf{T}}]}\rangle |^{-1}, \end{aligned}$$

where \( {\mathbf{T}} \) is the normalized, \(\Vert {\mathbf{T}} \Vert _{\mathrm{hs}}=1\), eigenmatrix of \(\mathcal {T}\), corresponding to \(\Vert \mathcal {T} \Vert _{\mathrm{sp}}\).

With the lower bound (4.26) on the spectral gap of \(\mathcal {F}\), we find

$$\begin{aligned} \begin{aligned} \Vert (\mathcal {C}_{{\mathbf{U}}}-\mathcal {F})^{-1} \Vert _{\mathrm{sp}} \,&\lesssim \, \frac{\Vert {\mathbf{M}} \Vert ^{42 }}{\max \bigl \{ {1-\Vert \mathcal {F} \Vert _{\mathrm{sp}},|1-\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{U}}}[{\mathbf{F}}]}\rangle |} \bigl \} } . \end{aligned} \end{aligned}$$
(4.42)

Plugging (4.42) into (4.41) and using (4.9) to estimate \(\Vert {\mathbf{M}} \Vert \), shows (4.17), provided the denominator on the right hand side of (4.42) satisfies

$$\begin{aligned} \begin{aligned} \max \bigl \{ {1-\Vert \mathcal {F} \Vert _{\mathrm{sp}},|1-\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{U}}}[{\mathbf{F}}]}\rangle |} \bigl \} \;\gtrsim \, (\rho (\zeta )+{{\mathrm{dist}}}(\zeta ,{{\mathrm{supp\,}}}\rho ))^{ C}, \end{aligned} \end{aligned}$$
(4.43)

for some universal constant \( C>0\).

In the remainder of this proof we will verify (4.43). We establish lower bounds on both arguments of the maximum in (4.43) and combine them afterwards. We start with a lower bound on \(1-\Vert \mathcal {F} \Vert _{\mathrm{sp}}\). Estimating the numerator of the fraction on the right hand side of (4.24) from below

$$\begin{aligned} \langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{W}}}[{{\mathrm{Im}}}\,{\mathbf{M}}]}\rangle \,\gtrsim \, \rho \, \langle {{\mathbf{F}}} , {\!{\mathbf{W}}^2}\rangle \,\gtrsim \,\Vert {\mathbf{M}} \Vert ^{-2}\langle {\mathbf{F}} \rangle , \end{aligned}$$

and its denominator from above, \( \langle {{\mathbf{F}}} , {\!{\mathbf{W}}^{-2}}\rangle \lesssim \rho \Vert {\mathbf{M}} \Vert ^2\langle {\mathbf{F}} \rangle , \) by applying the bounds from (4.23) and (4.11), we see that

$$\begin{aligned} \begin{aligned} 1-\Vert \mathcal {F}(\zeta ) \Vert _{\mathrm{sp}}\,\gtrsim \, \frac{{{\mathrm{Im}}}\, \zeta }{\rho (\zeta )\Vert {\mathbf{M}}(\zeta ) \Vert ^4}. \end{aligned} \end{aligned}$$
(4.44)

Since \(\rho (\zeta )\) is the harmonic extension of a probability density (namely the self-consistent density of states \(\rho \)), we have the trivial upper bound \(\rho (\zeta )\lesssim {{\mathrm{Im}}}\, \zeta /{{\mathrm{dist}}}(\zeta , {{\mathrm{supp\,}}}\rho )^2\). Continuing from (4.44) we find the lower bound

$$\begin{aligned} \begin{aligned} 1-\Vert \mathcal {F} \Vert _{\mathrm{sp}}&\,\gtrsim \, \Vert {\mathbf{M}} \Vert ^{-4}{{\mathrm{dist}}}(\zeta , {{\mathrm{supp\,}}}\rho )^2 \\&\,\gtrsim \, (\rho +{{\mathrm{dist}}}(\zeta ,{{\mathrm{supp\,}}}\rho ))^{4}{{\mathrm{dist}}}(\zeta , {{\mathrm{supp\,}}}\rho )^2 , \end{aligned} \end{aligned}$$
(4.45)

where we used (4.9) in the second inequality.

Now we estimate \(|1-\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{U}}}[{\mathbf{F}}]}\rangle |\) from below. We begin with

$$\begin{aligned} \begin{aligned} \!|1-\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{U}}}[{\mathbf{F}}]}\rangle |&\geqslant {{\mathrm{Re\,}}}\bigl [ 1-\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{U}}}[{\mathbf{F}}]}\rangle \bigr ] \,=\, 1-\big \langle {\mathbf{F}},(\mathcal {C}_{{{\mathrm{Re\,}}}{\mathbf{U}}}-\mathcal {C}_{{{\mathrm{Im}}}\, {\mathbf{U}}})[{\mathbf{F}}] \big \rangle \\&\geqslant \langle {{\mathbf{F}}} , {\mathcal {C}_{{{\mathrm{Im}}}\, {\mathbf{U}}}[{\mathbf{F}}]}\rangle , \end{aligned} \end{aligned}$$
(4.46)

where we used \(1- \langle {{\mathbf{F}}} , {\mathcal {C}_{{{\mathrm{Re\,}}}{\mathbf{U}}}[{\mathbf{F}}]}\rangle \geqslant 0\) in the last inequality; this holds because \({\mathbf{U}}\) is unitary and \(\Vert {\mathbf{F}} \Vert _{\mathrm{hs}}=1\). Since \({{\mathrm{Im}}}\, {\mathbf{U}} =- {\mathbf{W}}^{-2}\) [cf. (4.29) and (4.21)] and because of (4.23) we have \( -{{\mathrm{Im}}}\, {\mathbf{U}}\gtrsim \Vert {\mathbf{M}} \Vert ^{-2}\rho . \) Continuing from (4.46), using the normalization \(\Vert {\mathbf{F}} \Vert _{\mathrm{hs}}=1 \) and (4.9), we get the lower bound

$$\begin{aligned} |1-\langle {{\mathbf{F}}} , {\mathcal {C}_{{\mathbf{U}}}[{\mathbf{F}}]}\rangle |\,\gtrsim \, \rho ^{2}\Vert {\mathbf{M}} \Vert ^{-4} \,\gtrsim \; (\rho +{{\mathrm{dist}}}(\zeta ,{{\mathrm{supp\,}}}\rho ))^{4} \rho ^{2}. \end{aligned}$$

Combining this with (4.45) shows (4.43) and thus finishes the proof of Proposition 4.4. \(\square \)

Proof of Proposition 2.2

We show that the harmonic extension \(\rho (\zeta )\) of the self-consistent density of states [cf. (2.10)] is uniformly c-Hölder continuous on the entire complex upper half plane. Thus its unique continuous extension to the real line, the self-consistent density of states, inherits this regularity.

We differentiate both sides of the MDE with respect to \(\zeta \) and find the equation

$$\begin{aligned} \begin{aligned} (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})[\partial _\zeta {\mathbf{M}}] = {\mathbf{M}}^{2} . \end{aligned} \end{aligned}$$
(4.47)

Inverting the operator \(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) and taking the normalized Hilbert–Schmidt norm reveals a bound on the derivative of the solution to the MDE,

$$\begin{aligned} \begin{aligned} \Vert \partial _\zeta {\mathbf{M}} \Vert _{\mathrm{hs}} \,\leqslant \, \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{sp}} \Vert {\mathbf{M}} \Vert ^2. \end{aligned} \end{aligned}$$
(4.48)

Since \(\zeta \mapsto \langle {\mathbf{M}}(\zeta ) \rangle \) is an analytic function on \(\mathbb {H}\), we have the basic identity \(2\pi \mathrm {i}\partial _\zeta \rho =2\mathrm {i}\partial _\zeta \,{{\mathrm{Im}}}\langle {\mathbf{M}} \rangle \,=\, \partial _\zeta \langle {\mathbf{M}} \rangle \). Therefore, making use of (4.48), we get

$$\begin{aligned} \begin{aligned} |\partial _\zeta \rho | \,=\, {\textstyle \frac{1}{2\pi }}|\langle \partial _\zeta {\mathbf{M}} \rangle | \,\leqslant \, {\textstyle \frac{1}{2}}\Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{sp}} \Vert {\mathbf{M}} \Vert ^2 \,\lesssim \, \rho ^{-(C+2)}. \end{aligned} \end{aligned}$$
(4.49)

For the last inequality in (4.49) we employed the bound (4.9) and the linear stability, Proposition 4.4. The universal constant C stems from its statement (4.17). From (4.49) we read off that the harmonic extension \(\rho \) of the self-consistent density of states is \(\frac{1}{C+3}\)-Hölder continuous.
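In detail, (4.49) says that \(\rho ^{C+3}\) has a bounded derivative,

$$\begin{aligned} |\partial _\zeta \, \rho ^{C+3}| \,=\, (C+3)\, \rho ^{C+2}\, |\partial _\zeta \rho | \,\lesssim \, 1 , \end{aligned}$$

so \(\rho ^{C+3}\) is Lipschitz continuous on \(\mathbb {H}\). Together with the elementary inequality \(|a^{1/k}-b^{1/k}|\leqslant |a-b|^{1/k}\) for \(a,b\geqslant 0\) and \(k\geqslant 1\), this yields

$$\begin{aligned} |\rho (\zeta _1)-\rho (\zeta _2)| \,\leqslant \, \big |\rho (\zeta _1)^{C+3}-\rho (\zeta _2)^{C+3}\big |^{\frac{1}{C+3}} \,\lesssim \, |\zeta _1-\zeta _2|^{\frac{1}{C+3}}, \qquad \zeta _1,\zeta _2 \in \mathbb {H}. \end{aligned}$$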

It remains to prove that \( \rho \) is real analytic at any \(\tau _0\) with \(\rho (\tau _0)>0\). Since \(\rho \) is continuous, it is bounded away from zero in a neighborhood of \(\tau _0\). Using (4.47), (4.9) and (4.17) we conclude that \({\mathbf{M}}\) is uniformly continuous in the intersection of a small neighborhood of \(\tau _0\) in \(\mathbb {C}\) with the complex upper half plane. In particular, \({\mathbf{M}}\) has a unique continuous extension \({\mathbf{M}}(\tau _0)\) to \(\tau _0\). Furthermore, by differentiating (2.2) with respect to \(\zeta \) and by the uniqueness of the solution to (2.2) with positive imaginary part one verifies that \({\mathbf{M}}\) coincides with the solution \({\mathbf{Q}}\) to the holomorphic initial value problem

$$\begin{aligned} \partial _\omega {\mathbf{Q}}\,=\, (\mathrm{Id}-\mathcal {C}_{{\mathbf{Q}}}\mathcal {S})^{-1} [{\mathbf{Q}}^2],\qquad {\mathbf{Q}}(0)\,=\, {\mathbf{M}}(\tau _0), \end{aligned}$$

i.e. \({\mathbf{M}}(\tau _0+\omega )={\mathbf{Q}}(\omega )\) for any \(\omega \in \mathbb {H}\) with sufficiently small absolute value. Since the solution \({\mathbf{Q}}\) is analytic in a small neighborhood of zero, we conclude that \({\mathbf{M}}\) can be holomorphically extended to a neighborhood of \(\tau _0\) in \(\mathbb {C}\). By continuity (2.10) remains true for \(\zeta \in \mathbb {R}\) close to \(\tau _0\) and thus \(\rho \) is real analytic there. \(\square \)

In the proof of Theorem 2.6 we will often consider \(\mathcal T:(\mathbb {C}^{N \times N}, \Vert \cdot \Vert _A)\rightarrow (\mathbb {C}^{N \times N}, \Vert \cdot \Vert _B)\), i.e., \(\mathcal T\) is a linear operator on \(\mathbb {C}^{N \times N}\) equipped with two different norms. We indicate the norms in the notation of the corresponding induced operator norm \(\Vert \mathcal {T} \Vert _{A \rightarrow B}\). We will use \(A,B=\mathrm{hs}, \Vert \cdot \Vert , 1, \infty , \mathrm{max}\), etc. We still keep our convention that \( \Vert \mathcal {T} \Vert _{\mathrm{sp}}=\Vert \mathcal {T} \Vert _{\mathrm{hs}\rightarrow \mathrm{hs}} \) and \( \Vert \mathcal {T} \Vert =\Vert \mathcal {T} \Vert _{\Vert \cdot \Vert \rightarrow \Vert \cdot \Vert } \). Furthermore, we introduce the norms

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{R}} \Vert _1&:= \max _y{ \textstyle \sum _x}\,|r_{x y} | ,\quad \Vert {\mathbf{R}} \Vert _\infty := \max _x {\textstyle \sum _y}\, |r_{xy} | ,\quad \\ \Vert {\mathbf{R}} \Vert _{1\vee \infty }&:=\max \bigl \{ {\Vert {\mathbf{R}} \Vert _1,\Vert {\mathbf{R}} \Vert _\infty } \bigl \} . \end{aligned} \end{aligned}$$
(4.50)

Some of the norms on matrices \( {\mathbf{R}} \in \mathbb {C}^{N \times N}\) are ordered, e.g. \(\max \{\Vert {\mathbf{R}} \Vert _\mathrm{max},\Vert {\mathbf{R}} \Vert _{\mathrm{hs}}\}\leqslant \Vert {\mathbf{R}} \Vert \leqslant \Vert {\mathbf{R}} \Vert _{1\vee \infty }\). Note that if \(\Vert \cdot \Vert _{\widetilde{A}}\leqslant \Vert \cdot \Vert _{ A}\) and \(\Vert \cdot \Vert _{B}\leqslant \Vert \cdot \Vert _{ \widetilde{B}}\), then \(\Vert \cdot \Vert _{A \rightarrow B}\leqslant \Vert \cdot \Vert _{\widetilde{A} \rightarrow \widetilde{B}}\). In particular, for \(\mathcal T:\mathbb {C}^{N \times N}\rightarrow \mathbb {C}^{N \times N}\) we have e.g. \( \Vert \mathcal {T} \Vert _{\mathrm{max}\rightarrow \mathrm{hs}} \leqslant \Vert \mathcal {T} \Vert _{\mathrm{max}\rightarrow \Vert \cdot \Vert } \leqslant \Vert \mathcal {T} \Vert _{\mathrm{max}\rightarrow 1 \vee \infty } \).
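As a quick numerical illustration of these orderings (a sketch, not part of the proof; we assume the normalized Hilbert–Schmidt norm \(\Vert {\mathbf{R}} \Vert _{\mathrm{hs}}=(N^{-1}\sum _{x,y}|r_{xy}|^2)^{1/2}\), consistent with the normalization in (2.1)):

```python
import math

# Check max{||R||_max, ||R||_hs} <= ||R|| <= ||R||_{1 v oo} for a concrete
# 2x2 matrix R = [[1, 2], [0, 1]].
R = [[1.0, 2.0], [0.0, 1.0]]
N = 2

norm_max = max(abs(r) for row in R for r in row)                      # entrywise max
norm_hs = math.sqrt(sum(abs(r)**2 for row in R for r in row) / N)     # normalized HS
norm_1 = max(sum(abs(R[x][y]) for x in range(N)) for y in range(N))   # max column sum
norm_inf = max(sum(abs(R[x][y]) for y in range(N)) for x in range(N)) # max row sum
norm_1_or_inf = max(norm_1, norm_inf)

# Operator (spectral) norm: square root of the top eigenvalue of
# R^T R = [[1, 2], [2, 5]]; here it equals 1 + sqrt(2).
a, b, c = 1.0, 2.0, 5.0
top = (a + c + math.sqrt((a - c)**2 + 4 * b**2)) / 2
norm_op = math.sqrt(top)

assert max(norm_max, norm_hs) <= norm_op <= norm_1_or_inf
```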

In order to show the existence and properties of the map \( {\mathbf{D}} \mapsto \varvec{\mathfrak {G}}({\mathbf{D}}) \) from Theorem 2.6 we rely on an implicit function theorem, which we state here for reference purposes.

Lemma 4.10

(Quantitative implicit function theorem) Let \( T: \mathbb {C}^{A}\times \mathbb {C}^{D} \rightarrow \mathbb {C}^{A}\) be a continuously differentiable function with invertible derivative \(\nabla ^{(1)}T(0,0)\) at the origin with respect to the first argument and \({T}(0,0)=0\). Suppose \(\mathbb {C}^{A}\) and \(\mathbb {C}^{D}\) are equipped with norms that we both denote by \(\Vert \cdot \Vert \), and let the linear operators on these spaces be equipped with the corresponding induced operator norms. Let \(\delta > 0 \) and \( C_1,C_2 < \infty \) be constants, such that

  (a)

    \(\Vert (\nabla ^{(1)}T(0,0))^{-1} \Vert \leqslant C_1 \);

  (b)

    \(\big \Vert \,\mathrm{Id}_{\mathbb {C}^{A}}-(\nabla ^{(1)}T(0,0))^{-1}\nabla ^{(1)}T(a,d)\big \Vert \leqslant \frac{1}{2}\), for every \((a,d) \in B^A_\delta \times B^D_\delta \);

  (c)

    \(\Vert \nabla ^{(2)}T(a,d) \Vert \leqslant C_2 \), for every \( (a,d) \in B^A_\delta \times B^D_\delta \).

Here \(B_\delta ^\#\) is the \(\delta \)-ball around 0 with respect to \(\Vert \cdot \Vert \) in \(\mathbb {C}^\#\), and \(\nabla ^{(2)}\) denotes the derivative with respect to the second variable.

Then there exists a constant \( \varepsilon > 0 \), depending only on \(\delta \), \(C_1\) and \(C_2\), and a unique continuously differentiable function \( f : B^D_\varepsilon \rightarrow B^A_\delta \), such that \( T(f(d),d) = 0 \), for every \( d \in B^D_\varepsilon \). Furthermore, if T is analytic, then so is f.

The proof of this result is elementary and left to the reader.
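The standard proof runs the Newton-type fixed-point iteration \(a\mapsto a-(\nabla ^{(1)}T(0,0))^{-1}T(a,d)\), which hypotheses (a)–(c) turn into a contraction on \(B^A_\delta \) for small \(d\). A minimal scalar sketch, with a hypothetical choice of \(T\) purely for illustration (not the map \(\mathcal {J}\) used in the proof of Theorem 2.6):

```python
# Scalar toy example of the contraction behind Lemma 4.10: iterate
#   a  ->  a - (grad_a T(0,0))^{-1} T(a, d),
# whose fixed point f(d) solves T(f(d), d) = 0. We take the hypothetical
# T(a, d) = a + a^2/4 + d, so that grad_a T(0,0) = 1.
def T(a, d):
    return a + a**2 / 4 + d

def implicit_solve(d, n_iter=50):
    a = 0.0
    for _ in range(n_iter):
        a = a - T(a, d)   # (grad_a T(0,0))^{-1} = 1 here
    return a

a_star = implicit_solve(0.1)
assert abs(T(a_star, 0.1)) < 1e-12   # f(d) indeed solves T(f(d), d) = 0
```

Near the fixed point the contraction factor is \(|1-\partial _a T|=|a_*/2|\approx 0.05\), so the iteration converges geometrically, in line with the uniform bounds (a)–(c) controlling the radius \(\varepsilon \).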

Proof of Theorem 2.6

To apply Lemma 4.10 we define \(\mathcal {J}:\mathbb {C}^{N \times N}\times \mathbb {C}^{N \times N}\rightarrow \mathbb {C}^{N \times N}\) by

$$\begin{aligned} \mathcal {J}[\varvec{\mathfrak {G}},{\mathbf{D}}]\,:=\, {\mathbf{1}} + (\zeta {\mathbf{1}}-{\mathbf{A}}+\mathcal {S}[\varvec{\mathfrak {G}}])\varvec{\mathfrak {G}}+{\mathbf{D}} . \end{aligned}$$

With this definition the perturbed MDE (2.16) takes the form

$$\begin{aligned} \mathcal {J}[\varvec{\mathfrak {G}}({\mathbf{D}}),{\mathbf{D}}]\,=\, {\mathbf{0}}. \end{aligned}$$

In particular, the unperturbed MDE (2.2) is \(\mathcal {J}[{\mathbf{M}},{\mathbf{0}}]= {\mathbf{0}}\), with \({\mathbf{M}}=\varvec{\mathfrak {G}}({\mathbf{0}})\).

For the application of the implicit function theorem we control the derivatives of \(\mathcal {J}\) with respect to \(\varvec{\mathfrak {G}}\) and \({\mathbf{D}}\). With the short hand notation,

$$\begin{aligned} \mathcal {W}_{{\mathbf{R}}}[{\mathbf{T}}] \,:=\, {\mathbf{M}}(\mathcal {S}[{\mathbf{T}}]{\mathbf{R}}+\mathcal {S}[{\mathbf{R}}]{\mathbf{T}}) , \end{aligned}$$
(4.51)

we compute the directional derivative of \(\mathcal {J}\) with respect to \(\varvec{\mathfrak {G}}\) in the direction \({\mathbf{R}} \in \mathbb {C}^{N \times N}\),

$$\begin{aligned} \begin{aligned} \nabla ^{(\varvec{\mathfrak {G}})}_{\!{\mathbf{R}}}\!\mathcal {J}[\varvec{\mathfrak {G}},{\mathbf{D}}] \,&=\, (\zeta {\mathbf{1}}-{\mathbf{A}}+\mathcal {S}[\varvec{\mathfrak {G}}]){\mathbf{R}}+\mathcal {S}[{\mathbf{R}}]\varvec{\mathfrak {G}} \\&=\, -{\mathbf{M}}^{-1}(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}-\mathcal {W}_{\varvec{\mathfrak {G}}-{\mathbf{M}}})[{\mathbf{R}}] . \end{aligned} \end{aligned}$$
(4.52)

For the second identity in (4.52) we used (2.2).

The derivative with respect to \({\mathbf{D}}\) is simply the identity operator, \( \nabla ^{({\mathbf{D}})}\!\mathcal {J}[\varvec{\mathfrak {G}},{\mathbf{D}}]= \mathrm{Id} \). Therefore, estimating \(\nabla ^{({\mathbf{D}})}\mathcal {J}\) for the hypothesis (c) of Lemma 4.10 is trivial.

We consider \( \mathbb {C}^{N \times N} \cong \mathbb {C}^{N^2}\) with the entrywise maximum norm \(\Vert \cdot \Vert _\mathrm{max}\) and use the short hand notation \(\Vert \mathcal {T} \Vert _\mathrm{max}:=\Vert \mathcal {T} \Vert _{\mathrm{max}\rightarrow \mathrm{max}}\) for the induced operator norm of any linear \(\mathcal {T}: \mathbb {C}^{N \times N}\rightarrow \mathbb {C}^{N \times N}\). To apply Lemma 4.10 in this setup we need the following two estimates:

  (i)

    The operator norm of \( (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \) on \((\mathbb {C}^{N \times N}, \Vert \cdot \Vert _\mathrm{max})\) is controlled by its spectral norm,

    $$\begin{aligned} \begin{aligned} \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{max}} \,\lesssim \; 1 \,+\Vert {\mathbf{M}} \Vert ^2+\Vert {\mathbf{M}} \Vert ^4\Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{sp}}. \end{aligned} \end{aligned}$$
    (4.53)
  (ii)

    The operator norm of \(\mathcal {W}_{\varvec{\mathfrak {G}}-{\mathbf{M}}}\) is small, provided \(\varvec{\mathfrak {G}}\) is close to \({\mathbf{M}}\),

    $$\begin{aligned} \begin{aligned} \Vert \mathcal {W}_{\varvec{\mathfrak {G}}-{\mathbf{M}}} \Vert _{\mathrm{max}}\lesssim \, \Vert {\mathbf{M}} \Vert _{1\vee \infty } \Vert \varvec{\mathfrak {G}}-{\mathbf{M}} \Vert _{\mathrm{max}}. \end{aligned} \end{aligned}$$
    (4.54)

We will prove these estimates after we have used them to show that the hypotheses of the quantitative implicit function theorem, Lemma 4.10, hold.

Let us first bound the inverse of the operator \( {\mathbf{R}}\mapsto \nabla ^{(\varvec{\mathfrak {G}})}_{\!{\mathbf{R}}}\!\mathcal {J}[{\mathbf{M}},{\mathbf{0}}] \), as required for hypothesis (a) of Lemma 4.10. Using (4.52) we have

$$\begin{aligned} \begin{aligned} \Vert (\nabla ^{(\varvec{\mathfrak {G}})}\!\mathcal {J}[{\mathbf{M}},{\mathbf{0}}])^{-1}[{\mathbf{R}}] \Vert _\mathrm{max}&\leqslant \,\Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{max}}\, \Vert {\mathbf{M}} \Vert _{1\vee \infty } \Vert {\mathbf{R}} \Vert _{\mathrm{max}} , \end{aligned} \end{aligned}$$
(4.55)

for an arbitrary \( {\mathbf{R}} \). Here we used \( \Vert {\mathbf{M}}{\mathbf{R}} \Vert _{\mathrm{max}} \leqslant \Vert {\mathbf{M}} \Vert _{1\vee \infty } \Vert {\mathbf{R}} \Vert _{\mathrm{max}}\). By Theorem 2.5 there is a sequence \(\underline{\gamma } \!\,\), depending only on \(\delta \) and \(\mathscr {P}\), such that

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}} \Vert _{1\vee \infty } \leqslant \, \max _x \sum _y \frac{\gamma ({\nu })}{(1+d(x,y))^{{\nu }}} +\gamma (0) , \qquad {\nu } \in \mathbb {N}. \end{aligned} \end{aligned}$$
(4.56)

Here and in the following unrestricted summations \(\sum _x\) are understood to run over the entire index set from 1 to N. Since the sizes of the balls with respect to d grow only polynomially in their radii [cf. (2.11)], the right hand side of (4.56) is bounded by a constant that only depends on \(\delta \) and \(\mathscr {P}\) for a sufficiently large choice of \({\nu }\). Using this estimate together with the bound (i) for the inverse of \( \mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S} \) in (4.55) yields \( \Vert (\nabla ^{(\varvec{\mathfrak {G}})}\!\mathcal {J}[{\mathbf{M}},{\mathbf{0}}])^{-1} \Vert _\mathrm{max} \lesssim 1 \), verifying hypothesis (a) of Lemma 4.10.
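To spell out the boundedness of the right hand side of (4.56), assume the polynomial ball growth provided by (2.11), say \(\#\{ y : d(x,y)\leqslant r \}\leqslant P(1+r)^{p}\) for some constants \(P,p\) determined by \(\mathscr {P}\). Since \(1+d(x,y)\geqslant 2^{k}\) whenever \(d(x,y)\geqslant 2^{k}-1\), a dyadic decomposition gives

$$\begin{aligned} \sum _y \frac{1}{(1+d(x,y))^{\nu }} \,\leqslant \, \sum _{k=0}^{\infty } \frac{\#\{ y : d(x,y)< 2^{k+1}-1 \}}{2^{k\nu }} \,\leqslant \, \sum _{k=0}^{\infty } \frac{P\, 2^{(k+1)p}}{2^{k\nu }} \,\lesssim \, 1 , \end{aligned}$$

provided \(\nu >p\).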

Next, to verify the assumption (b) of Lemma 4.10 we write

$$\begin{aligned} \begin{aligned} \mathrm{Id}-(\nabla ^{(\varvec{\mathfrak {G}})}\!\mathcal {J}[{\mathbf{M}},{\mathbf{0}}])^{-1}\nabla ^{(\varvec{\mathfrak {G}})}\!\mathcal {J}[\varvec{\mathfrak {G}},{\mathbf{D}}] \,=\, (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1}\mathcal {W}_{\varvec{\mathfrak {G}}-{\mathbf{M}}} . \end{aligned} \end{aligned}$$
(4.57)

Using (4.54) and (4.53), in conjunction with (4.9) and (4.17), we see that

$$\begin{aligned} \begin{aligned} \big \Vert \mathrm{Id}-(\nabla ^{(\varvec{\mathfrak {G}})}\!\mathcal {J}[{\mathbf{M}},{\mathbf{0}}])^{-1}\nabla ^{(\varvec{\mathfrak {G}})}\!\mathcal {J}[\varvec{\mathfrak {G}},{\mathbf{D}}]\big \Vert _{\mathrm{max} } \leqslant \, \frac{1}{2}, \end{aligned} \end{aligned}$$
(4.58)

for all \( (\varvec{\mathfrak {G}},{\mathbf{D}}) \in B^{\mathrm{max}}_{c_2}({\mathbf{M}}) \times B^{\mathrm{max}}_{c_1}({\mathbf{0}}) \), provided \( c_1,c_2 \sim 1 \) are sufficiently small. The first part of Theorem 2.6, the existence and uniqueness of the analytic function \(\varvec{\mathfrak {G}}\), now follows from the implicit function theorem Lemma 4.10. In particular, (2.17) follows from the analyticity.

Proof of (i) First we remark that (2.13), for a large enough \(\nu \), together with (2.11) implies

$$\begin{aligned} \begin{aligned} \Vert \mathcal {S} \Vert _{\mathrm{max} \rightarrow 1\vee \infty }\,\lesssim \, 1. \end{aligned} \end{aligned}$$
(4.59)

We expand the geometric series corresponding to the operator \((\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1}\) to second order,

$$\begin{aligned} \begin{aligned} (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \,=\, {\mathrm{Id}+\mathcal {C}_{{\mathbf{M}}}\mathcal {S}+\frac{(\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^2}{\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}}}. \end{aligned} \end{aligned}$$
(4.60)

We consider each of the three terms on the right hand side separately and estimate their norms as operators from \(\mathbb {C}^{N \times N}\) with the entrywise maximum norm to itself.

The easiest is \(\Vert \mathrm{Id} \Vert _\mathrm{max}=1\). For the second term we use the estimate

$$\begin{aligned} \begin{aligned} \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{\mathrm{max}}\,\leqslant \,\Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{\mathrm{max} \rightarrow \Vert \cdot \Vert } \,\leqslant \, \Vert \mathcal {C}_{{\mathbf{M}}} \Vert \Vert \mathcal {S} \Vert _{\mathrm{max} \rightarrow \Vert \cdot \Vert }. \end{aligned} \end{aligned}$$
(4.61)

For the third term on the right hand side of (4.60) we apply

$$\begin{aligned} \begin{aligned} \Big \Vert \frac{(\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^2}{\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}}\Big \Vert _{\mathrm{max}} \,\leqslant \, \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{\mathrm{hs} \rightarrow \mathrm{max}} \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{sp}} \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{\mathrm{max} \rightarrow \mathrm{hs}}. \end{aligned} \end{aligned}$$
(4.62)

The last factor on the right hand side of (4.62) is bounded by

$$\begin{aligned} \begin{aligned} \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{ \mathrm{max}\rightarrow \mathrm{hs} } \,\leqslant \, \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{ \mathrm{max}\rightarrow \Vert \cdot \Vert } \,\leqslant \, \Vert \mathcal {C}_{{\mathbf{M}}} \Vert \Vert \mathcal {S} \Vert _{\mathrm{max}\rightarrow \Vert \cdot \Vert }. \end{aligned} \end{aligned}$$
(4.63)

For the first factor we use \( \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{ \mathrm{hs} \rightarrow \mathrm{max}} \leqslant \Vert \mathcal {C}_{{\mathbf{M}}}\mathcal {S} \Vert _{ \mathrm{hs} \rightarrow \Vert \cdot \Vert } \leqslant \Vert \mathcal {C}_{{\mathbf{M}}} \Vert \Vert \mathcal {S} \Vert _{\mathrm{hs} \rightarrow \Vert \cdot \Vert } \). We plug this and (4.63) into (4.62). Then we use the resulting inequality in combination with (4.61) in (4.60) and find

$$\begin{aligned} \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{max}}\,\lesssim \, 1+\Vert \mathcal {C}_{{\mathbf{M}}} \Vert +\Vert \mathcal {C}_{{\mathbf{M}}} \Vert ^2\Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1} \Vert _{\mathrm{sp}}, \end{aligned}$$

where we also used \(\Vert \mathcal {S} \Vert _{\mathrm{max} \rightarrow \Vert \cdot \Vert }\leqslant \Vert \mathcal {S} \Vert _{\mathrm{max} \rightarrow 1\vee \infty }\) and (4.59). Since \(\Vert \mathcal {C}_{{\mathbf{M}}} \Vert \leqslant \Vert {\mathbf{M}} \Vert ^2\) the claim (4.53) follows.

Proof of (ii) Recall the definition of \(\mathcal {W}_{{\mathbf{R}}}\) in (4.51). We estimate

$$\begin{aligned} \begin{aligned} \Vert \mathcal {W}_{{\mathbf{R}}}[{\mathbf{T}}] \Vert _\mathrm{max}&\leqslant \, 2\Vert {\mathbf{M}} \Vert _{1\vee \infty }\Vert \mathcal {S} \Vert _{\mathrm{max}\rightarrow 1\vee \infty }\Vert {\mathbf{R}} \Vert _\mathrm{max}\Vert {\mathbf{T}} \Vert _\mathrm{max}. \end{aligned} \end{aligned}$$
(4.64)

From the bound (4.59) we infer (4.54).

Proof of (2.18) and (2.19): Now we are left with showing the second part of Theorem 2.6, namely that the derivative of \(\varvec{\mathfrak {G}}\) at \({\mathbf{D}}={\mathbf{0}}\) can be written in the form (2.18) with the operator \(\mathcal {Z}\) satisfying (2.19).

Since we have shown the analyticity of \(\varvec{\mathfrak {G}}\), the calculation leading up to (4.20) is now justified and we see that

$$\begin{aligned} \nabla _{{\mathbf{R}}}\varvec{\mathfrak {G}}({\mathbf{0}})\,=\, (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1}[{\mathbf{M}}{\mathbf{R}}] \,=\, {\mathbf{M}}{\mathbf{R}} +\mathcal {Z}[{\mathbf{R}}], \end{aligned}$$

for all \({\mathbf{R}} \in \mathbb {C}^{N \times N}\). Here, the linear operator \(\mathcal {Z}\) is given by

$$\begin{aligned} \begin{aligned} \mathcal {Z}[{\mathbf{R}}] \,&:=\, \frac{\mathcal {C}_{{\mathbf{M}}}\mathcal {S}}{\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S}}[{\mathbf{M}}{\mathbf{R}}]\\&= \left( {\mathcal {C}_{{\mathbf{M}}}\mathcal {S}+(\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^2+(\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^2(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1}\mathcal {C}_{{\mathbf{M}}}\mathcal {S}}\right) [{\mathbf{M}}{\mathbf{R}}] . \end{aligned} \end{aligned}$$
(4.65)

We will estimate the entries of the three summands separately.

We show that \(\Vert \mathcal {Z}[{\mathbf{R}}] \Vert _{\underline{\gamma } \!\,}\leqslant \frac{1}{2}\) for any \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) with \(\Vert {\mathbf{R}} \Vert _{\mathrm{max}}\leqslant 1\), where \(\underline{\gamma } \!\,\) depends only on \(\delta \) and \(\mathscr {P}\). We begin with a few easy observations: for two matrices \({\mathbf{R}}, {\mathbf{T}} \in \mathbb {C}^{N \times N}\) that have faster than power law decay, \(\Vert {\mathbf{R}} \Vert _{\underline{\gamma } \!\,_R}\leqslant 1\) and \(\Vert {\mathbf{T}} \Vert _{\underline{\gamma } \!\,_T}\leqslant 1\), their sum and product have faster than power law decay as well, i.e., \(\Vert {\mathbf{R}}+{\mathbf{T}} \Vert _{\underline{\gamma } \!\,_{R+T}}\leqslant 1\) and \(\Vert {\mathbf{R}}{\mathbf{T}} \Vert _{\underline{\gamma } \!\,_{RT}}\leqslant 1\). Here, \(\underline{\gamma } \!\,_{R+T}\) and \(\underline{\gamma } \!\,_{RT}\) depend only on \(\underline{\gamma } \!\,_R,\underline{\gamma } \!\,_T\) and \(\mathscr {P}\) [cf. (2.11)]. Furthermore, we see that by (2.13) the matrix \(\mathcal {S}[{\mathbf{R}}]\) has faster than power law decay for any \({\mathbf{R}}\in \mathbb {C}^{N \times N}\) with \(\Vert {\mathbf{R}} \Vert _{\mathrm{max}}\leqslant 1\).

We now estimate the first summand on the right hand side of (4.65). By (2.13), the bound \(\Vert {\mathbf{M}}{\mathbf{R}} \Vert _{\mathrm{max}}\leqslant \Vert {\mathbf{M}} \Vert _{1\vee \infty }\Vert {\mathbf{R}} \Vert _{\mathrm{max}}\) and the estimate (4.56), the matrix \(\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]\) has faster than power law decay. Since \(\mathcal {C}_{{\mathbf{M}}}\) multiplies by \({\mathbf{M}}\) on both sides (cf. (4.16)) and \({\mathbf{M}}\) has faster than power law decay (cf. Theorem 2.5), we conclude that \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]\) does as well.

Now we turn to the second summand on the right hand side of (4.65). Since \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]\) has faster than power law decay, its entries are bounded. Using again (2.13) as above, we see that \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) applied to \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]\) has faster than power law decay as well.

Finally, we estimate the third summand from (4.65). Since the matrix \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]\) has faster than power law decay, its \(\Vert \cdot \Vert _{\mathrm{hs}}\)-norm is bounded. By the linear stability (4.17) and \(\zeta \in \mathbb {D}_\delta \), we conclude \( \Vert (\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1}[\mathcal {C}_{{\mathbf{M}}}\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]] \Vert _{\mathrm{hs}}\,\leqslant \, C(\delta ) \). Thus, we get

$$\begin{aligned} \Vert \,\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\,(\mathrm{Id}-\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^{-1}[\mathcal {C}_{{\mathbf{M}}}\mathcal {S}[{\mathbf{M}}{\mathbf{R}}]] \Vert _{\mathrm{max}}\leqslant & {} C(\delta )\Vert \mathcal {S} \Vert _{\mathrm{hs}\rightarrow \mathrm{max}}\Vert \mathcal {C}_{{\mathbf{M}}} \Vert _{\mathrm{max}} \\\leqslant & {} C(\delta )\Vert \mathcal {S} \Vert _{\mathrm{hs}\rightarrow 1\vee \infty } \Vert {\mathbf{M}} \Vert _{1\vee \infty }^2, \end{aligned}$$

which is bounded by (4.59) and (4.56). Therefore, the third term on the right hand side of (4.65) is an application of \(\mathcal {C}_{{\mathbf{M}}}\mathcal {S}\) to a matrix with bounded entries, which results in a matrix with faster than power law decay. Altogether we have established that (2.19) holds with only \(\Vert \mathcal {Z}[{\mathbf{R}}] \Vert _{\underline{\gamma } \!\,}\) on the left hand side.

It remains to show that also \(\mathcal {Z}^*[{\mathbf{R}}]\) satisfies this bound. Since \(\mathcal {Z}^*\) has a structure that resembles the structure (4.65) of \(\mathcal {Z}\), namely

$$\begin{aligned} \mathcal {Z}^*[{\mathbf{R}}]\,=\, {\mathbf{M}}^* \left( {\mathcal {S}\mathcal {C}_{{\mathbf{M}}^*}+(\mathcal {S}\mathcal {C}_{{\mathbf{M}}^*})^2+(\mathcal {S}\mathcal {C}_{{\mathbf{M}}^*})^2\big (\mathrm{Id}-(\mathcal {C}_{{\mathbf{M}}}\mathcal {S})^*\big )^{-1}\mathcal {S}\mathcal {C}_{{\mathbf{M}}^*}}\right) [{\mathbf{R}}] , \end{aligned}$$

we can follow the same line of reasoning as for the entries of \(\mathcal {Z}[{\mathbf{R}}]\). This finishes the proof of (2.19) and with it the proof of Theorem 2.6. \(\square \)

5 Estimating the error term

In this section we prove the key estimates, stated precisely in Lemmas 3.4 and 5.1, for the error matrix \( {\mathbf{D}} \) that appears as the perturbation in the Eq. (3.1) for the resolvent \( {\mathbf{G}}\). We start by estimating \({\mathbf{D}}(\zeta ) \) in terms of the auxiliary quantity \(\Lambda (\zeta ) \) (cf. (3.9)) when \( \zeta \) is away from the convex hull of \({{\mathrm{supp\,}}}\rho \). To this end, we recall the two endpoints of this convex hull (cf. Proposition 3.5):

$$\begin{aligned} \begin{aligned} \kappa _-\,:=\,\min {{\mathrm{supp\,}}}\rho , \qquad \kappa _+\,:=\,\max {{\mathrm{supp\,}}}\rho . \end{aligned} \end{aligned}$$
(5.1)

Lemma 5.1

Let \(\delta >0\) and \(\varepsilon >0\). Then the error matrix \({\mathbf{D}}\), defined in (3.1b), satisfies

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}}(\zeta ) \Vert _{\mathrm{max}}\,\mathbbm {1}(\Lambda (\zeta )\leqslant N^{-\varepsilon })\,\prec \,\frac{1}{\sqrt{N }}+\left( {\frac{\Lambda (\zeta )}{N\, {{\mathrm{Im}}}\, \zeta }}\right) ^{1/2}, \end{aligned} \end{aligned}$$
(5.2)

for all \(\zeta \in \mathbb {H}\) with \( \delta \leqslant {{\mathrm{dist}}}(\zeta , [\kappa _-,\kappa _+])\leqslant \delta ^{-1}\) and \({{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon }\).

Convention 5.2

Throughout this section we will use Convention 4.1 with the set of model parameters \(\mathscr {P}\) replaced by the set \(\mathscr {K}\) from (2.27). If the constant C, hidden in the comparison relation, depends on additional parameters \(\mathscr {L}\), then we write \(\alpha \lesssim _\mathscr {L} \beta \).

We rewrite the entries \(d_{xy}\) of \({\mathbf{D}}\) in a different form that makes their smallness apparent, by expanding the term \(({\mathbf{H}}-{\mathbf{A}})\,{\mathbf{G}}\) (cf. (3.1b)) in neighborhoods of x and y. For any \(B \subseteq \{1, \ldots ,N\}\) we introduce the matrix

$$\begin{aligned} {\mathbf{H}}^B\,=\,(h^B_{xy})_{x,y=1}^N,\qquad h^B_{xy}\,:=\,h_{xy} \mathbbm {1}(x,y \not \in B ), \end{aligned}$$
(5.3)

obtained from \({\mathbf{H}}\) by setting the rows and the columns labeled by the elements of B equal to zero. The corresponding resolvent is

$$\begin{aligned} {\mathbf{G}}^B(\zeta ) \,:=\, ({\mathbf{H}}^B-\zeta {\mathbf{1}})^{-1}. \end{aligned}$$
(5.4)

With this definition, we have the resolvent expansion formula \( {\mathbf{G}}= {\mathbf{G}}^B - {\mathbf{G}}^{B}({\mathbf{H}}-{\mathbf{H}}^B){\mathbf{G}} \). In particular, for any \(y \in B \) the rows of \({\mathbf{G}}\) outside B have the expansion

$$\begin{aligned} G_{u y}\,=\, - \sum _{v}^B \sum _{z\in B} G_{u {v}}^B h_{{v}z} G_{z y},\qquad u \not \in B. \end{aligned}$$
(5.5)
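As a quick numerical sanity check (not part of the proof), both the matrix identity \( {\mathbf{G}}= {\mathbf{G}}^B - {\mathbf{G}}^{B}({\mathbf{H}}-{\mathbf{H}}^B){\mathbf{G}} \) and the entrywise expansion (5.5) can be verified for a small random Hermitian matrix with the convention (5.3)–(5.4); the size, seed, spectral parameter and index choices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (X + X.conj().T) / 2                      # Hermitian test matrix
zeta = 0.3 + 0.5j                             # spectral parameter in the upper half-plane
G = np.linalg.inv(H - zeta * np.eye(N))

B = [1, 4]                                    # rows and columns to be zeroed out, cf. (5.3)
HB = H.copy()
HB[B, :] = 0.0
HB[:, B] = 0.0
GB = np.linalg.inv(HB - zeta * np.eye(N))     # G^B, cf. (5.4)

# Resolvent expansion formula G = G^B - G^B (H - H^B) G
expansion_err = np.abs(G - (GB - GB @ (H - HB) @ G)).max()

# Entrywise form (5.5) with u outside B and y inside B
u, y = 0, 4
rhs = -sum(GB[u, v] * H[v, z] * G[z, y]
           for v in range(N) if v not in B
           for z in B)
entry_err = abs(G[u, y] - rhs)
```

Both residuals are of the order of machine precision, reflecting that the expansion is an exact algebraic identity, not an approximation.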

Here we introduced, for any two index sets \(A,B \subseteq \mathbb {X}\), the short hand notation

$$\begin{aligned} \sum _{x \in A}^{B} \;:=\, \sum _{x \in A{\setminus } B}. \end{aligned}$$

In case \(A=\mathbb {X}\) we simply write \(\sum _x^B\) and \( \sum _x = \sum _x^\varnothing \), i.e., the superscript over the summation means exclusion of these indices from the sum. Recall that \({\mathbf{H}}\) is written as a sum of its expectation matrix \({\mathbf{A}}\) and its fluctuation \(\frac{1}{\sqrt{N}}{\mathbf{W}}\) (cf. (2.22)) and therefore

$$\begin{aligned} {\mathbf{D}}\,=\,- N^{-1/2}{\mathbf{W}}{\mathbf{G}}-\mathcal {S}[{\mathbf{G}}]{\mathbf{G}}. \end{aligned}$$

We use the expansion formula (5.5) on the resolvent elements in \(({\mathbf{W}}{\mathbf{G}})_{xy}=\sum _u w_{xu}G_{uy}\) and find that the entries of \({\mathbf{D}}\) can be written in the form

$$\begin{aligned} \begin{aligned} d_{xy}&= - \frac{1}{\sqrt{N}}\sum _{u \in B}w_{x u}G_{u y}\,+\,\frac{1}{\!\sqrt{N}}\sum _{u,\mathrm{v}}^{B} \sum _{z \in B}w_{x u}G_{u {v}}^{B} h_{{v}z} G_{z y}\\&\quad -\, \frac{1}{N}\sum _{u, {v}, z}G_{u {v}}(\mathbb {E}w_{x u}w_{{v} z})G_{z y}. \end{aligned} \end{aligned}$$
(5.6)

Note that the set B with \(y \in B\) here is arbitrary, e.g., it may depend on x and y. In fact, we will momentarily choose it to be a neighborhood of \(\{x,y\}\).

Let \(A \subseteq B \) be another index set. We split the sum over \(z \in B\) in the second term on the right hand side of (5.6) into a sum over \(z \in A\) and \(z \in B{\setminus }A\) and use (2.22) again,

$$\begin{aligned} \sum _{z \in B}h_{{v}z} G_{z y}\,=\, \sum _{z \in A}a_{{v}z} G_{z y}+\sum _{z \in B }^Ah_{vz} G_{z y}+\frac{1}{\sqrt{N}}\sum _{z \in A}w_{{v}z} G_{z y}. \end{aligned}$$

We end up with the following decomposition of the error matrix \( {\mathbf{D}}= \sum _{k=1}^{5}{\mathbf{D}}^{(k)} \), where the entries \(d^{(k)}_{xy}\) of the individual matrices \({\mathbf{D}}^{(k)}\) are given by

$$\begin{aligned} d_{xy}^{(1)}\,&=\, -\, \frac{1}{\sqrt{N}}\sum _{u \in B_{xy}}w_{x u}G_{u y}, \end{aligned}$$
(5.7a)
$$\begin{aligned} d_{xy}^{(2)}\,&=\, \frac{1}{\sqrt{N}}\sum _{u,v}^{B_{xy}} \sum _{z \in A_{xy}}w_{x u}G_{u {v}}^{B_{xy}} a_{{v}z} G_{z y}, \end{aligned}$$
(5.7b)
$$\begin{aligned} d_{xy}^{(3)}\,&=\, \frac{1}{\sqrt{N}}\sum _{u,v}^{B_{xy}} \sum _{z \in B_{xy}}^{A_{xy}}w_{x u}G_{u {v}}^{B_{xy}} h_{{v}z} G_{z y}, \end{aligned}$$
(5.7c)
$$\begin{aligned} d_{xy}^{(4)}\,&=\, \frac{1}{{N}}\sum _{u,{v}}^{B_{xy}} \sum _{z \in A_{xy}}G_{u v}^{B_{xy}} (w_{x u}w_{{v}z}-\mathbb {E}w_{x u}w_{{v}z}) G_{z y}, \end{aligned}$$
(5.7d)
$$\begin{aligned} d_{xy}^{(5)}\,&=\, \frac{1}{{N}}\sum _{u,{v}}^{B_{xy}} \sum _{z \in A_{xy}}G_{u {v}}^{B_{xy}} (\mathbb {E}w_{x u}w_{{v}z}) G_{z y} - \frac{1}{N}\sum _{u, {v}, z}G_{u {v}}(\mathbb {E}w_{x u}w_{{v} z})G_{z y}, \end{aligned}$$
(5.7e)

and

$$\begin{aligned} \begin{aligned} B_{xy}\,:=\, B_{2N^{\varepsilon _1}}(x) \cup B_{2N^{\varepsilon _1}}(y),\qquad A_{xy}\,:=\, B_{N^{\varepsilon _1}}(x) \cup B_{N^{\varepsilon _1}}(y), \end{aligned} \end{aligned}$$
(5.8)

for some \(\varepsilon _1>0\). Note that although \({\mathbf{D}}\) itself does not depend on the choice of \(\varepsilon _1\), its decomposition into \({\mathbf{D}}^{(k)}\) does. We will estimate each error matrix \({\mathbf{D}}^{(k)}\) separately, where the estimates may still depend on \(\varepsilon _1\). Since \(\varepsilon _1>0\) is arbitrarily small, it is eliminated from the final bounds on \({\mathbf{D}}\) using the following property of stochastic domination (Definition 3.3): if two positive random variables X and Y satisfy \( X \prec N^\varepsilon Y \) for every \( \varepsilon > 0\), then \( X \prec Y \).
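The elimination step can be spelled out in one line; the following sketch assumes the standard definition of stochastic domination (cf. Definition 3.3), where \(X \prec Y\) means \(\mathbb {P}[X > N^{\varepsilon } Y] \leqslant C_{\varepsilon ,D} N^{-D}\) for all \(\varepsilon , D > 0\).

```latex
% Assume X \prec N^{\varepsilon'} Y for every \varepsilon' > 0.
% Fix \varepsilon, D > 0 and apply the assumption with \varepsilon' = \varepsilon/2:
\mathbb{P}\bigl[ X > N^{\varepsilon} Y \bigr]
  \,=\, \mathbb{P}\bigl[ X > N^{\varepsilon/2} \bigl( N^{\varepsilon/2} Y \bigr) \bigr]
  \,\leqslant\, C_{\varepsilon/2,\,D}\, N^{-D},
% hence X \prec Y.
```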

The following lemma provides entrywise estimates on the individual error matrices.

Lemma 5.3

Let \(C>0\) be a constant and \(\zeta \in \mathbb {H}\) with \({{\mathrm{dist}}}(\zeta , {{\mathrm{Spec}}}({\mathbf{H}}^B))^{-1}\prec N^{C}\) for all \(B \subsetneq \mathbb {X}\). The entries of the error matrices \({\mathbf{D}}^{(k)}={\mathbf{D}}^{(k)}(\zeta )\), defined in (5.7), satisfy the bounds

$$\begin{aligned} |d_{xy}^{(1)} |\,&\prec \, \frac{|B_{xy} |}{\!\sqrt{N}}\,\Vert {\mathbf{G}} \Vert _{\mathrm{max}}, \end{aligned}$$
(5.9a)
$$\begin{aligned} |d_{xy}^{(2)} |\,&\prec \, N|A_{xy} |\left( {\max _{u\not \in B_{xy}}\frac{ {{\mathrm{Im}}}\, G_{uu}^{B_{xy}}}{N\, {{\mathrm{Im}}}\, \zeta }}\right) ^{\!1/2} \left( {\max _{z \in A_{xy}}\sum _{v}^{B_{xy}}|a_{vz} |^2}\right) ^{\!1/2}\Vert {\mathbf{G}} \Vert _{\mathrm{max}}, \end{aligned}$$
(5.9b)
$$\begin{aligned} |d_{xy}^{(3)} |\,&\prec \, \frac{|B_{xy} |}{\sqrt{N\, {{\mathrm{Im}}}\, \zeta }} \max _{z \in B_{xy}}\frac{\left( {{\mathrm{Im}}}\, G_{zz}^{B_{xy}{\setminus }\{z\}}\right) ^{1/2} }{ |G_{zz}^{B_{xy}{\setminus }\{z\}} |}\,\Vert {\mathbf{G}} \Vert _{\mathrm{max}}, \end{aligned}$$
(5.9c)
$$\begin{aligned} |d^{(4)}_{xy} |\,&\prec \, |A_{xy} |\, \biggl (\frac{N^{-1}\!\sum _{u}^{B_{xy}} {{\mathrm{Im}}}\, G_{uu}^{B_{xy}}}{N\, {{\mathrm{Im}}}\, \zeta }\biggr )^{1/2} \Vert {\mathbf{G}} \Vert _{\mathrm{max}}, \end{aligned}$$
(5.9d)
$$\begin{aligned} |d^{(5)}_{xy} |\,&\prec \, |A_{xy} ||B_{xy} | \max _{k=0}^{|B_{xy} |-1}\biggl (\,\frac{{{\mathrm{Im}}}\, G_{x_kx_k}^{B_k}}{|G_{x_kx_k}^{B_k} |{N\, {{\mathrm{Im}}}\, \zeta }}\,+\,\left( {\frac{{{\mathrm{Im}}}\, G_{x_kx_k}^{B_k}}{{N^2}{{\mathrm{Im}}}\, \zeta }}\right) ^{1/2}\,\biggr )\,\Vert {\mathbf{G}} \Vert _{\mathrm{max}} \nonumber \\&+\,\left( {\frac{{{\mathrm{Im}}}\, G_{yy}}{N\, {{\mathrm{Im}}}\, \zeta }}\right) ^{1/2}\Vert {\mathbf{G}} \Vert _{\mathrm{max}}, \end{aligned}$$
(5.9e)

where \((B_k)_{k=0}^{|B_{xy} |}\) in (5.9e) is an arbitrary increasing sequence of subsets of \(B_{xy}\) with \(B_{k+1}=B_{k}\cup \{x_k\}\) for some \(x_k \in B_{xy}\). In particular, \(\varnothing = B_0 \subsetneq B_1 \subsetneq \cdots \subsetneq B_{|B_{xy} |-1} \subsetneq B_{|B_{xy} |}=B_{xy}\).

Proof

We show the estimates (5.9a)–(5.9e) one by one. The bound (5.9a) is trivial since by the bounded moment assumption (2.23) the entries of \({\mathbf{W}}\) satisfy \(|w_{xy} |\prec 1\). For the proof of (5.9b) we simply use first Cauchy–Schwarz in the v-summation of (5.7b) and then the Ward-identity,

$$\begin{aligned} \begin{aligned} \sum _{u}^B\,|G^B_{xu}(\zeta ) |^2\,=\, \frac{{{\mathrm{Im}}}\, G^B_{xx}(\zeta )}{{{\mathrm{Im}}}\, \zeta },\quad B \subsetneq \mathbb {X},\;x \not \in B. \end{aligned} \end{aligned}$$
(5.10)
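The Ward identity (5.10) is an exact algebraic consequence of \({\mathbf{G}}^B\) being the resolvent of a self-adjoint matrix. A minimal numerical check in the case \(B = \varnothing \) (size, seed and spectral parameter are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (X + X.conj().T) / 2                 # Hermitian
zeta = -0.2 + 0.7j
G = np.linalg.inv(H - zeta * np.eye(N))  # resolvent; corresponds to B = empty set

x = 3
lhs = np.sum(np.abs(G[x, :]) ** 2)       # sum over u of |G_{xu}|^2
rhs = G[x, x].imag / zeta.imag           # Im G_{xx} / Im zeta
```

The identity follows from \({\mathbf{G}}-{\mathbf{G}}^* = (\zeta -\bar{\zeta }){\mathbf{G}}{\mathbf{G}}^*\), i.e. \({{\mathrm{Im}}}\, {\mathbf{G}} = ({{\mathrm{Im}}}\, \zeta )\, {\mathbf{G}}{\mathbf{G}}^*\), read on the diagonal; the same computation applies verbatim to \({\mathbf{G}}^B\).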

For (5.9c) we rewrite the entries of \({\mathbf{D}}^{(3)}\) in the form

$$\begin{aligned} \begin{aligned} d_{xy}^{(3)}\,=\, -\frac{1}{\sqrt{N}} \sum _{z \in B_{xy}}^{A_{xy}}\sum _{u}^{B_{xy}}w_{x u}\frac{G_{u z}^{B_{xy}{\setminus }\{z\}} }{G_{zz}^{B_{xy}{\setminus }\{z\}} }\, G_{z y}, \end{aligned} \end{aligned}$$
(5.11)

where we used the Schur complement formula in the form of the general resolvent expansion identity

$$\begin{aligned} G_{uz}^B\,=\, -\,G_{zz}^B\sum ^B_{v} G^{B \cup \{z\}}_{u {v}} h_{{v} z},\qquad B \subsetneq \mathbb {X},\;u,z \not \in B. \end{aligned}$$

To the u-summation in (5.11) we apply the large deviation estimate (A.34) of Lemma A.2 with the choices \(X_u:=w_{xu}\) and \(b_u:=G_{uz}^{B_{xy}{\setminus } \{z\}}\), i.e.

$$\begin{aligned} \begin{aligned} \left|\sum _{u}^{B_{xy}}w_{x u}G_{u z}^{B_{xy}{\setminus }\{z\}} \right| \,\prec \, \left( {\sum _{u}^{\,B_{xy}}\left|G_{u z}^{B_{xy}{\setminus }\{z\}} \right|^2}\right) ^{\!1/2}. \end{aligned} \end{aligned}$$
(5.12)

The assumption (A.32) of Lemma A.2 is an immediate consequence of the decay of correlation (2.25). In order to verify (A.33) we use both (2.25) and the N-dependent smoothness

$$\begin{aligned} \begin{aligned} \Vert \nabla _{{\mathbf{R}}}{\mathbf{G}}^B \Vert \,=\, N^{-1/2}\Vert {\mathbf{G}}^B{\mathbf{R}}{\mathbf{G}}^B \Vert \,\leqslant \, N^{2C}\Vert {\mathbf{R}} \Vert , \end{aligned} \end{aligned}$$
(5.13)

of the resolvent, where \(\nabla _{{\mathbf{R}}}\) denotes the directional derivative with respect to \({\mathbf{W}}^B\) in the direction \({\mathbf{R}}={\mathbf{R}}^* \in \mathbb {C}^{(N-|B |) \times (N-|B |)}\). For the inequality in (5.13) we used the assumption \({{\mathrm{dist}}}(\zeta , {{\mathrm{Spec}}}({\mathbf{H}}^B))\geqslant N^{-C}\) with high probability. By the Ward-identity (5.10) the bound (5.9c) follows from (5.12) and (5.11).

To show (5.9d) we employ the quadratic large deviation result Lemma A.3 with the choices

$$\begin{aligned} X\,:=\,(w_{xu})_{u \in \mathbb {X} {\setminus } B_{xy}},\qquad Y\,:=\,(w_{{v}y})_{{v} \in \mathbb {X} {\setminus } B_{xy}},\qquad b_{u{v}}\,:=\,G_{u {v}}^{B_{xy}}. \end{aligned}$$

The assumptions (A.46) and (A.47) are again easily verified using (2.25) and (5.13). Applying (A.48) on the (uv)-summation in (5.7d) we find

$$\begin{aligned} \left|\sum _{u,{v}}^{B_{xy}} G_{u {v}}^{B_{xy}} (w_{x u}w_{{v}z}-\mathbb {E}w_{x u}w_{{v}z})\, \right| \;\prec \; \left( {\,\sum _{u,{v}}^{B_{xy}} |G_{u {v}}^{B_{xy}} |^2}\right) ^{\!1/2} \!=\, \left( {\,\sum _{u}^{B_{xy}} \frac{{{\mathrm{Im}}}{G_{u u}^{B_{xy}}}}{{{\mathrm{Im}}}\, \zeta }}\right) ^{\!1/2} , \end{aligned}$$

where we used (5.10) again.

Finally, we turn to the proof of (5.9e). Let \(B_k\) be as in the statement of Lemma 5.3. We set

$$\begin{aligned} \alpha ^{(k)}_{xz}\,:=\, \frac{1}{{N}}\sum _{u,{v}}^{B_k} G_{u {v}}^{B_k}\,\mathbb {E}w_{x u}w_{{v}z}, \end{aligned}$$

and use a telescopic sum to write \(d^{(5)}_{xy}\) as

$$\begin{aligned} \begin{aligned} {d^{(5)}_{xy}}\,= \sum _{z \in A_{xy}}\sum _{k=0}^{|B_{xy} |-1}\left( \alpha ^{(k+1)}_{xz}\!-\alpha ^{(k)}_{xz}\right) G_{zy} - \frac{1}{N}\sum _{u, v}\sum _z^{A_{xy}}G_{u v}(\mathbb {E}w_{x u}w_{v z})G_{z y}. \end{aligned} \end{aligned}$$
(5.14)

We estimate the rightmost term in (5.14) simply by

$$\begin{aligned} \begin{aligned} \left|\frac{1}{N}\sum _{u, v}\sum _z^{A_{xy}}G_{u v}(\mathbb {E}w_{x u}w_{v z})G_{z y} \right|^2 \,&\leqslant \, \frac{\Vert {\mathbf{G}} \Vert _{\mathrm{max}}^2}{N^2}\, \sum _z^{A_{xy}} \left( {\,\sum _{u, v}\,|\mathbb {E}w_{x u}w_{v z} |}\right) ^{\!2}\, \sum _z^{A_{xy}}|G_{z y} |^2\\&\lesssim \Vert {\mathbf{G}} \Vert _{\mathrm{max}}^2\frac{{{\mathrm{Im}}}\, G_{yy}}{N\,{{\mathrm{Im}}}\, \zeta } , \end{aligned} \end{aligned}$$

where the sum over u and v on the right hand side of the first inequality is bounded by a constant because of the decay of covariances (3.12), and we used (5.10) in the second inequality. Thus, (5.9e) follows from (5.14) and the bound

$$\begin{aligned} \begin{aligned} |\alpha ^{(k+1)}_{xz}-\alpha ^{(k)}_{xz} | \;\prec \; \frac{1}{N\, {{\mathrm{Im}}}\, \zeta }\frac{{{\mathrm{Im}}}\, G_{x_k x_k}^{B_k}}{|G_{x_k x_k}^{B_k} |} +\frac{1}{\sqrt{N}}\left( {\frac{{{\mathrm{Im}}}\, G_{x_kx_k}^{B_k}}{{N}{{\mathrm{Im}}}\, \zeta }\!}\right) ^{1/2} . \end{aligned} \end{aligned}$$
(5.15)

To show (5.15) we first see that

$$\begin{aligned} \begin{aligned} \alpha ^{(k+1)}_{xz}\!-\alpha ^{(k)}_{xz}&= -\frac{1}{{N}}\sum _{u,{v}}^{B_{k+1}} \frac{G_{\!u x_k}^{B_k}G_{\!x_k v}^{B_k}\!}{G^{B_k}_{x_kx_k}} \, \mathbb {E}w_{x u}w_{{v}z} \!-\frac{1}{{N}}\!\sum _{u}^{B_{k}}G_{\!u x_k}^{B_{k}} \mathbb {E}w_{x u}w_{x_kz} \\&\quad -\, \frac{1}{{N}}\sum _{{v}}^{\;B_{k+1}} G_{\!x_k v}^{B_{k}} \mathbb {E}w_{x x_k}w_{{v}z}, \end{aligned} \end{aligned}$$
(5.16)

where we used the general resolvent identity

$$\begin{aligned} G^{B}_{xy}\,=\,G^{B\cup \{u\}}_{xy}+ \frac{G^{B}_{xu}G^{B}_{uy}}{G^{B}_{uu}},\qquad B \subsetneq \{1, \ldots ,N\},\; x,y,u \not \in B,\; x \ne u,\; y \ne u. \end{aligned}$$
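This identity is again purely algebraic and can be sanity-checked numerically for a random Hermitian matrix with the zeroing-out convention (5.3)–(5.4); all concrete sizes and indices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 9
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (X + X.conj().T) / 2
zeta = 0.1 + 0.4j
I = np.eye(N)

def resolvent_minus(B):
    """Resolvent of H with rows/columns in B zeroed out, cf. (5.3)-(5.4)."""
    HB = H.copy()
    HB[list(B), :] = 0.0
    HB[:, list(B)] = 0.0
    return np.linalg.inv(HB - zeta * I)

B = [2, 5]
u = 7                                   # index to be added to B
G_B = resolvent_minus(B)
G_Bu = resolvent_minus(B + [u])

x, y = 0, 3                             # x, y outside B and distinct from u
lhs = G_B[x, y]
rhs = G_Bu[x, y] + G_B[x, u] * G_B[u, y] / G_B[u, u]
```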

The last two terms on the right hand side of (5.16) are estimated by the second term on the right hand side of (5.15) using first Cauchy–Schwarz, the decay of covariances (3.12), and then the Ward-identity (5.10). For the first term in (5.16) we use the same argument as in (3.14) to see that (3.12) implies

$$\begin{aligned} \begin{aligned} \sum _{u,{v}} \,|r_u t_{v} |\, |\mathbb {E}w_{x u}w_{{v}z} |\,\lesssim \, \Vert {\mathbf{r}} \Vert \Vert {\mathbf{t}} \Vert , \end{aligned} \end{aligned}$$
(5.17)

for any two vectors \({\mathbf{r}},{\mathbf{t}} \in \mathbb {C}^{N}\). We obtain (5.15) by applying (5.17) with the choice \(r_u:=G_{u x_k}^{B_k}\), \(t_{v}:=G_{x_k {v}}^{B_k}\) and using the Ward-identity afterwards. In this way (5.9e) follows and Lemma 5.3 is proven. \(\square \)

The following definition is motivated by the formula that expresses the matrix elements of \({\mathbf{G}}^{B}\) in terms of the matrix elements of \({\mathbf{G}}\). For \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) and \(A,B \subsetneq \mathbb {X}\) we denote by \({\mathbf{R}}_{AB}:=(r_{xy})_{x \in A, y \in B}\) its submatrix. In case \(A=B\) we write \({\mathbf{R}}_{AB}={\mathbf{R}}_{A}\) for short. Then we have

$$\begin{aligned} \begin{aligned} {\mathbf{G}}^B_{\mathbb {X}{\setminus } B}\,=\, ({\mathbf{H}}_{\mathbb {X}{\setminus } B}-\zeta {\mathbf{1}})^{-1}\,=\, (({\mathbf{G}}^{-1})_{\mathbb {X}{\setminus } B})^{-1}. \end{aligned} \end{aligned}$$
(5.18)

In particular, \( ({\mathbf{G}}^B)_{\mathbb {X}\backslash B} = {\mathbf{G}}^B_{\mathbb {X}\backslash B}\).
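The formula (5.18) equates the zeroed-out resolvent of (5.4), restricted to \(\mathbb {X}{\setminus } B\), with the inverse of the corresponding submatrix of \({\mathbf{G}}^{-1}\). A short numerical sanity check on a random Hermitian matrix (sizes and indices arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (X + X.conj().T) / 2
zeta = 0.2 + 0.6j

B = [0, 3]
keep = [x for x in range(N) if x not in B]          # the index set X \ B

HB = H.copy()
HB[B, :] = 0.0
HB[:, B] = 0.0
GB = np.linalg.inv(HB - zeta * np.eye(N))           # G^B

lhs = GB[np.ix_(keep, keep)]                        # (G^B)_{X \ B}
Ginv = H - zeta * np.eye(N)                         # G^{-1}
rhs = np.linalg.inv(Ginv[np.ix_(keep, keep)])       # ((G^{-1})_{X \ B})^{-1}
err = np.abs(lhs - rhs).max()
```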

Definition 5.4

For \(B \subsetneq \mathbb {X}\) we define the \(\mathbb {C}^{(N-|B |)\times (N-|B |)}\)-matrix

$$\begin{aligned} \begin{aligned} {\mathbf{M}}^B\,:=\,(({\mathbf{M}}^{-1})_{\mathbb {X}{\setminus } B})^{-1}. \end{aligned} \end{aligned}$$
(5.19)

Lemma 5.5

Let \(\delta >0\) and \(\zeta \in \mathbb {H}\) be such that \( \delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])+\rho (\zeta )\leqslant \delta ^{-1} \). Then for all \( B \subsetneq \mathbb {X}\) the matrix \({\mathbf{M}}^B\), defined in (5.19), satisfies

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}^B \Vert _{\underline{\gamma } \!\,}\,\lesssim _{\delta } 1, \end{aligned} \end{aligned}$$
(5.20)

for some sequence \(\underline{\gamma } \!\,\), depending only on \(\delta \) and the model parameters. For every \(x \not \in B\) we have

$$\begin{aligned} \begin{aligned} |m^B_{xx}(\zeta ) |\,\sim _\delta 1,\qquad {{\mathrm{Im}}}\, m^B_{xx}(\zeta )\,\sim _\delta \rho (\zeta ). \end{aligned} \end{aligned}$$
(5.21)

Furthermore, there is a positive constant c, depending only on \(\mathscr {K}\) and \(\delta \), such that

$$\begin{aligned} \begin{aligned} \max _{x,y\notin B}|G^B_{xy}(\zeta )-m^B_{xy}(\zeta ) |\,\mathbbm {1}\left( {\Lambda (\zeta ) \leqslant {\textstyle \frac{ c }{ 1+|B | }}\,}\right) \;\lesssim _{\delta }\; (1+ |B |)\,\Lambda (\zeta ). \end{aligned} \end{aligned}$$
(5.22)

Proof

We begin by establishing upper and lower bounds on the singular values of \({\mathbf{M}}^B\),

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}^B \Vert \,\sim _\delta 1,\qquad \Vert ({\mathbf{M}}^B)^{-1} \Vert \,\sim _\delta 1. \end{aligned} \end{aligned}$$
(5.23)

We will make use of the following general fact: If \({\mathbf{R}}\in \mathbb {C}^{N \times N}\) satisfies \(\Vert {\mathbf{R}} \Vert \lesssim _\delta 1\), as well as

$$\begin{aligned} \begin{aligned} {{\mathrm{Im}}}\, {\mathbf{R}}\,\gtrsim _\delta {\mathbf{1}},\qquad \text {or}\qquad {{\mathrm{Re\,}}}{\mathbf{R}}\,\gtrsim _\delta {\mathbf{1}},\qquad \text {or}\qquad -{{\mathrm{Re\,}}}{\mathbf{R}}\gtrsim _\delta {\mathbf{1}}, \end{aligned} \end{aligned}$$
(5.24)

then any submatrix \({\mathbf{R}}_{A}\) of \({\mathbf{R}}\) satisfies

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{R}}_A \Vert \,\sim _\delta 1,\qquad \Vert ({\mathbf{R}}_A)^{-1} \Vert \,\sim _\delta \, 1, \qquad A \subseteq \mathbb {X}. \end{aligned} \end{aligned}$$
(5.25)
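The mechanism behind this fact is that a definite imaginary (or real) part survives restriction to principal submatrices and bounds the smallest singular value from below: for a unit vector v, \(\Vert {\mathbf{R}}_A v \Vert \geqslant |\langle v, {\mathbf{R}}_A v\rangle | \geqslant \langle v, ({{\mathrm{Im}}}\, {\mathbf{R}})_A v\rangle \geqslant c\). A small numerical illustration with a synthetic matrix built to satisfy \({{\mathrm{Im}}}\, {\mathbf{R}} \geqslant c\,{\mathbf{1}}\) (this R is not the matrix \({\mathbf{M}}\) of the model):

```python
import numpy as np

rng = np.random.default_rng(4)
N, c = 10, 0.5
S = rng.standard_normal((N, N))
S = (S + S.T) / 2                         # Hermitian real part
Q = rng.standard_normal((N, N))
Q = Q @ Q.T                               # positive semidefinite
R = S + 1j * (c * np.eye(N) + Q)          # Im R >= c * Id as a quadratic form

A = [0, 2, 3, 7]                          # arbitrary principal submatrix
RA = R[np.ix_(A, A)]

# Im R_A = (Im R)_A >= c * Id, so the smallest singular value of R_A is >= c
smin = np.linalg.svd(RA, compute_uv=False).min()
inv_norm = np.linalg.norm(np.linalg.inv(RA), 2)   # = 1 / smin <= 1 / c
```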

We verify (5.24) for \({\mathbf{R}}={\mathbf{M}}\) in two separate regimes and thus show (5.23). First let \(\zeta \) be such that \(\rho (\zeta )\geqslant \delta /2\). Then the lower bound in the imaginary part in (5.24) follows from (4.11) and (4.9).

Now let \(\zeta \) be such that \(\delta /2 \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \delta ^{-1}\). Then we may also assume that \({{\mathrm{dist}}}({{\mathrm{Re\,}}}\zeta , [\kappa _-,\kappa _+])\geqslant \delta /4 \), because otherwise \({{\mathrm{Im}}}\, \zeta \geqslant \delta /4\) and thus \(\rho (\zeta )\gtrsim _\delta 1\); in that situation the claim follows from the case \(\rho (\zeta )\geqslant \delta /2\) treated above, since \(\delta \) there was arbitrary. Since \({\mathbf{M}}\) is the Stieltjes transform of a \(\mathscr {C}_+\)-valued measure with support in \([\kappa _-,\kappa _+]\) (cf. (2.5)), its real part is positive definite to the left of \(\kappa _-\) and negative definite to the right of \(\kappa _+\). In both cases we also have the effective bound \(|{{\mathrm{Re\,}}}{\mathbf{M}} |\gtrsim _\delta {\mathbf{1}}\) because \({{\mathrm{dist}}}({{\mathrm{Re\,}}}\zeta , [\kappa _-,\kappa _+])\geqslant \frac{\delta }{4} \).

Now we apply (5.23) to see (5.20). By (2.24) and (2.13) the right hand side of (2.20) and with it \({\mathbf{M}}^{-1}\) has faster than power law decay. The same is true for its submatrix with indices in \(\mathbb {X}{{\setminus }}B\). Thus (5.20) follows directly from the definition (5.19) of \({\mathbf{M}}^B\), the upper bound on its singular values from (5.23) and the Combes–Thomas estimate in Lemma 4.3.

To prove (5.21) we use

$$\begin{aligned} {{\mathrm{Im}}}\, {\mathbf{M}}^{B} \,=\, -({\mathbf{M}}^{B})^* {{\mathrm{Im}}}({\mathbf{M}}^{-1})_{\mathbb {X}{\setminus } B}{\mathbf{M}}^{B} \;\sim _{\delta }\, - {{\mathrm{Im}}}({\mathbf{M}}^{-1})_{\mathbb {X}{\setminus } B} \;\sim _\delta \, \rho {\mathbf{1}} , \end{aligned}$$

where we applied (5.23) for the first comparison relation and used \(-{{\mathrm{Im}}}\, {\mathbf{M}}^{-1}\sim _\delta \rho {\mathbf{1}}\) (cf. (4.11) and (4.10)) for the second. The bound on \({{\mathrm{Im}}}\, m_{xx}^B\) in (5.21) follows and the bound on \(|m_{xx}^B |\) follows at least in the regime \(\rho (\zeta )\geqslant \delta /2\). We are left with showing \(|m_{xx}^B |\gtrsim _\delta 1 \) in the case \(\delta /2 \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \delta ^{-1}\). As we did above, we may assume that \({{\mathrm{dist}}}({{\mathrm{Re\,}}}\zeta , [\kappa _-,\kappa _+])\geqslant \frac{\delta }{4} \). We restrict to \({{\mathrm{Re\,}}}\zeta \leqslant \kappa _--\frac{\delta }{2}\). The case \({{\mathrm{Re\,}}}\zeta \geqslant \kappa _++\frac{\delta }{2}\) is treated analogously. In this regime

$$\begin{aligned} {{\mathrm{Re\,}}}{\mathbf{M}}^{B} \,=\, ({\mathbf{M}}^{B})^* {{\mathrm{Re\,}}}({\mathbf{M}}^{-1})_{\mathbb {X}{\setminus } B}{\mathbf{M}}^{B} \,\sim _\delta \, {{\mathrm{Re\,}}}({\mathbf{M}}^{-1})_{\mathbb {X}{\setminus } B}\,\sim _\delta {\mathbf{1}}, \end{aligned}$$

where we used \({{\mathrm{Re\,}}}{\mathbf{M}}^{-1}= ({\mathbf{M}}^{-1})^*({{\mathrm{Re\,}}}{\mathbf{M}}){\mathbf{M}}^{-1} \!\sim _\delta {{\mathrm{Re\,}}}{\mathbf{M}}\sim _\delta {\mathbf{1}}\) for the last comparison relation. Thus, (5.21) follows.

Now we show (5.22). By the Schur complement formula we have for any \({\mathbf{T}} \in \mathbb {C}^{N \times N}\) the identity

$$\begin{aligned} \bigl (({\mathbf{T}}^B)_{\{x,y\}}\bigr )^{-1} =\, \bigl ({\mathbf{T}}_{\{x,y\}}- {\mathbf{T}}_{\!\{x,y\}B}({\mathbf{T}}_{B})^{-1}{\mathbf{T}}_{\!B\{x,y\}}\bigr )^{-1} \,=\, (({\mathbf{T}}_{\!B \cup \{x,y\}})^{-1})_{\{x,y\}} ,\nonumber \\ \end{aligned}$$
(5.26)

for \(x,y \not \in B\) and \({\mathbf{T}}^B:=(({\mathbf{T}}^{-1})_{\mathbb {X}{\setminus } B})^{-1}\), provided all inverses exist. We will use this identity for \({\mathbf{T}}= {\mathbf{M}},{\mathbf{G}}\). Note that this definition of \({\mathbf{T}}^B\) with \({\mathbf{T}}={\mathbf{G}}\) is consistent with the definition (5.4) on the index set \(\mathbb {X}{\setminus } B\) because of (5.18). Recalling that \({\mathbf{G}}_{B\cup \{x,y\}}=(G_{u,v})_{u,v \in B\cup \{x,y\}}\) and \({\mathbf{M}}_{B\cup \{x,y\}}\) are matrices of dimension \(|B |+2\), we have

$$\begin{aligned} \Vert {\mathbf{G}}_{B\cup \{x,y\}}-{\mathbf{M}}_{B\cup \{x,y\}} \Vert \,\leqslant \, (|B |+2)\,\big \Vert {\mathbf{G}}_{B\cup \{x,y\}}-{\mathbf{M}}_{B\cup \{x,y\}}\big \Vert _{\mathrm{max}} \,\leqslant \, (|B |+2)\Lambda . \end{aligned}$$

Therefore, as long as \((|B |+2)\Lambda \Vert ({\mathbf{M}}_{B\cup \{x,y\}})^{-1} \Vert \leqslant \frac{1}{2}\) we get

$$\begin{aligned} \big \Vert ({\mathbf{G}}_{B\cup \{x,y\}})^{-1}-({\mathbf{M}}_{B\cup \{x,y\}})^{-1}\big \Vert \,&\leqslant \, 2\,\big \Vert ({\mathbf{M}}_{B\cup \{x,y\}})^{-1}\big \Vert ^2 \big \Vert {{\mathbf{G}}_{B\cup \{x,y\}}-{\mathbf{M}}_{B\cup \{x,y\}}}\big \Vert \\&\lesssim _{\delta }\, (1+|B |)\Lambda , \end{aligned}$$

where we used in the last step that \(\Vert ({\mathbf{M}}_{B\cup \{x,y\}})^{-1} \Vert \sim _\delta 1\), which follows from using (5.24) and (5.25) for the choice \({\mathbf{R}}={\mathbf{M}}\) in the regimes \(\rho \gtrsim _\delta 1\) and \({{\mathrm{dist}}}({{\mathrm{Re\,}}}\zeta , [\kappa _-,\kappa _+])\gtrsim _\delta 1\), respectively.
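The last display is an instance of the general perturbation bound \(\Vert {\mathbf{A}}^{-1}-{\mathbf{B}}^{-1} \Vert \leqslant 2 \Vert {\mathbf{B}}^{-1} \Vert ^2 \Vert {\mathbf{A}}-{\mathbf{B}} \Vert \), valid whenever \(\Vert {\mathbf{A}}-{\mathbf{B}} \Vert \Vert {\mathbf{B}}^{-1} \Vert \leqslant \frac{1}{2}\), which follows from \({\mathbf{A}}^{-1}-{\mathbf{B}}^{-1}={\mathbf{A}}^{-1}({\mathbf{B}}-{\mathbf{A}}){\mathbf{B}}^{-1}\) and a Neumann series bound on \(\Vert {\mathbf{A}}^{-1} \Vert \). A quick numerical check on random matrices (names and sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Minv_norm = np.linalg.norm(np.linalg.inv(M), 2)     # spectral norm of M^{-1}

# Perturbation E scaled so that ||E|| * ||M^{-1}|| = 1/2
E = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E *= 0.5 / (np.linalg.norm(E, 2) * Minv_norm)
A = M + E                                           # plays the role of G_{B ∪ {x,y}}

diff = np.linalg.norm(np.linalg.inv(A) - np.linalg.inv(M), 2)
bound = 2 * Minv_norm**2 * np.linalg.norm(E, 2)
```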

Again using the definite signs of the imaginary and real part of \({\mathbf{M}}\) as well as that of \(({\mathbf{M}}_{B \cup \{x,y\}})^{-1}\) in these two regimes, we infer that

$$\begin{aligned} \big \Vert {\bigl ((({\mathbf{M}}_{B \cup \{x,y\}})^{-1})_{\{x,y\}}\bigr )^{-1}}\big \Vert \,\sim _\delta \, 1 , \end{aligned}$$

as well. We conclude that there is a constant c, depending only on \(\delta \) and \(\mathscr {K}\), such that

$$\begin{aligned}&\Big \Vert \bigl ((({\mathbf{G}}_{B \cup \{x,y\}})^{-1})_{\{x,y\}}\bigr )^{-1} \!- \bigl ((({\mathbf{M}}_{B \cup \{x,y\}})^{-1})_{\{x,y\}}\bigr )^{-1}\Big \Vert \, \mathbbm {1}\left( {\Lambda \leqslant \frac{c}{1+|B |}}\right) \\&\quad \,\lesssim _\delta \, (1+|B |)\Lambda . \end{aligned}$$

With the identity (5.26) the claim (5.22) follows and Lemma 5.5 is proven.\(\square \)
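The Schur complement identity (5.26) underlying this proof is purely algebraic and easy to test numerically. Below, T is a synthetic well-conditioned matrix (a definite imaginary part guarantees that all the inverses in (5.26) exist), not a matrix of the model; indices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 9
X = rng.standard_normal((N, N))
T = (X + X.T) / 2 + 2j * np.eye(N)   # Im T = 2*Id keeps every inverse below well-conditioned

B = [1, 4, 6]
x, y = 0, 3                          # x, y outside B
keep = [u for u in range(N) if u not in B]            # the index set X \ B

# Left hand side: T^B := ((T^{-1})_{X\B})^{-1}, restricted to {x, y} and inverted
TB = np.linalg.inv(np.linalg.inv(T)[np.ix_(keep, keep)])
pos = [keep.index(x), keep.index(y)]
lhs = np.linalg.inv(TB[np.ix_(pos, pos)])

# Right hand side: ((T_{B ∪ {x,y}})^{-1})_{{x,y}}
BS = B + [x, y]
inv_small = np.linalg.inv(T[np.ix_(BS, BS)])
pos2 = [BS.index(x), BS.index(y)]
rhs = inv_small[np.ix_(pos2, pos2)]
err = np.abs(lhs - rhs).max()
```

Both sides equal the inverse of the Schur complement \({\mathbf{T}}_{\{x,y\}}- {\mathbf{T}}_{\{x,y\}B}({\mathbf{T}}_{B})^{-1}{\mathbf{T}}_{B\{x,y\}}\), which is why the residual is at machine precision.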

Proof of Lemmas 3.4 and 5.1

We begin with the proof of (3.5), continuing the estimates on the error matrices from Lemma 5.3. To this end, we fix \(\zeta =\tau +\mathrm {i}\eta \) with \(|\tau |\leqslant C\) and \(\eta \in [1,C]\). Since \({{\mathrm{Im}}}\, \zeta \geqslant 1\), we have the trivial resolvent bound and a lower bound on the diagonal elements,

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{G}}^B \Vert \,\leqslant \, 1 \qquad \text {and}\qquad \frac{1}{|G_{xx}^B |}\,\prec \, 1,\qquad x \not \in B \subseteq \mathbb {X} . \end{aligned} \end{aligned}$$
(5.27)

Indeed, to get the lower bound we apply the Schur complement formula to the (x, x)-element of the resolvent \({\mathbf{G}}^B=({\mathbf{H}}^B-\zeta {\mathbf{1}})^{-1}\) and obtain

$$\begin{aligned} -\frac{1}{G_{xx}^B}\,=\, \zeta -a_{xx}+\sum _{u,v}^Bh_{x u}G_{u v}^{B \cup \{x\}}h_{vx}. \end{aligned}$$

We take absolute value on both sides and estimate trivially,

$$\begin{aligned} \frac{1}{|G_{xx}^B |}\,\leqslant \, |\zeta |+|a_{xx} |+\Vert {\mathbf{G}}^{B \cup \{x\}} \Vert \,\sum _u\,|h_{xu} |^2 \,\prec \; 1. \end{aligned}$$

Here we used the first bound of (5.27) to control the norm of the resolvent and the assumptions (2.24) and (2.23) to bound \(\sum _u|h_{xu} |^2\). Combining (5.8) with the assumption (2.11) we get

$$\begin{aligned} \begin{aligned} |A_{xy} |\,\leqslant \, |B_{xy} |\,\prec \, N^{\varepsilon _1P}. \end{aligned} \end{aligned}$$
(5.28)

Using (5.28) and (5.27) in the main estimates (5.9) for the \( |d^{(k)}_{xy} | \) yields

$$\begin{aligned} \begin{aligned} |d_{xy} |\,\prec \, \frac{N^{2\varepsilon _1P}}{\sqrt{N}}+N^{\varepsilon _1P}N \frac{\kappa _2(\nu )}{N^{\varepsilon _1\nu }},\qquad \text {for all }\nu \in \mathbb {N}. \end{aligned} \end{aligned}$$
(5.29)

Here we also used that by assumption (2.24) for any \(\nu \in \mathbb {N}\) the expectation matrix satisfies

$$\begin{aligned} \begin{aligned} |a_{vz} | \,\leqslant \, \kappa _2(\nu ) N^{-\varepsilon _1 \nu } ,\qquad z \in A_{xy},\; v \not \in B_{xy}, \end{aligned} \end{aligned}$$
(5.30)

to obtain the second summand on the right hand side of (5.29) from estimating \(|d^{(2)}_{xy} |\). Since \(\varepsilon _1>0\) was arbitrary, (5.29) implies (3.5).

Now we prove (3.6) and (5.2) in tandem. Let \(\delta >0\) and \(\zeta \in \mathbb {H}\) such that \(\delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])+\rho (\zeta )\leqslant \delta ^{-1}\) and \({{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon }\). We show that

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}} \Vert _{\mathrm{max}}\,\mathbbm {1}(\Lambda \leqslant N^{-\varepsilon }) \,\prec \, \left( {\frac{\rho +\Lambda }{N\, {{\mathrm{Im}}}\, \zeta }}\right) ^{\!1/2}. \end{aligned} \end{aligned}$$
(5.31)

From (5.31) the bound (3.6) follows immediately in the regime where \(\rho \geqslant \delta \). Also (5.2) follows from (5.31). Indeed, in the regime of spectral parameters \(\zeta \in \mathbb {H}\) with \(\delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \delta ^{-1}\) we have \(\rho \sim _\delta {{\mathrm{Im}}}\, \zeta \) because \(\rho \) is the harmonic extension of a probability density supported inside \([\kappa _-,\kappa _+]\).

For the proof of (5.31) we use (5.22), (5.21), (5.30), (5.28) and \(\Vert {\mathbf{G}} \Vert _{\mathrm{max}}\lesssim 1+\Lambda \) (cf. (4.9)) to estimate the right hand side of each inequality in (5.9). In this way we get

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}} \Vert _{\mathrm{max}}\,\mathbbm {1}(\Lambda \leqslant N^{-\varepsilon }) \,\prec \, N^{2\varepsilon _1P}N^{3/2}\left( {\frac{ \rho +\Lambda }{N\, {{\mathrm{Im}}}\, \zeta }}\right) ^{\!1/2} \frac{\kappa _2(\nu )}{N^{\varepsilon _1\nu }} + N^{2\varepsilon _1P} \left( {\frac{\rho +\Lambda }{N\, {{\mathrm{Im}}}\, \zeta }}\right) ^{\!1/2}, \end{aligned} \end{aligned}$$
(5.32)

for any \(\nu \in \mathbb {N}\), provided \(\varepsilon > \varepsilon _1 P\) to ensure \(N^{-\varepsilon } \leqslant c/|B_{xy}|\), i.e. that the constraint \(\Lambda \leqslant N^{-\varepsilon }\) makes (5.22) applicable. Here, we also used \(\rho \gtrsim _\delta {{\mathrm{Im}}}\, \zeta \) to see that \(\frac{\rho }{N\, {{\mathrm{Im}}}\, \zeta }\gtrsim _\delta \frac{1}{N}\). Since (5.32) holds for arbitrarily small \(\varepsilon _1>0\), the claim (5.31) and with it Lemmas 3.4 and 5.1 are proven. \(\square \)

6 Fluctuation averaging

In this section we prove Proposition 3.5, by which an error bound \(\Psi \) for the entrywise local law can be used to improve the bound on the error matrix \({\mathbf{D}}\) to \(\Psi ^2\), once \({\mathbf{D}}\) is averaged against a non-random matrix \({\mathbf{R}}\) with faster than power law decay.

Proof of Proposition 3.5

Let \({\mathbf{R}} \in \mathbb {C}^{N \times N}\) with \(\Vert {\mathbf{R}} \Vert _\beta \leqslant 1\) for some positive sequence \(\beta \). Within this proof we use Convention 4.1 such that \(\varphi \lesssim \psi \) means \(\varphi \leqslant C \psi \) for a constant C, depending only on \(\widetilde{\mathscr {P}}:=(\mathscr {K}, \delta , \varepsilon _1,\beta ,C)\), where C and \(\delta \) are the constants from the statement of the proposition, \(\mathscr {K}\) are the model parameters (cf. (2.27)) and \(\varepsilon _1\) enters in the splitting of the error matrix \({\mathbf{D}}\) into \({\mathbf{D}}^{(k)}\) (cf. (5.8)). Note that since \(\varepsilon _1\) is arbitrary it suffices to show (3.8) up to factors of \(N^{\varepsilon _1}\). We will also use the notation \(\mathcal {O}_\prec (\Phi )\) for a random variable that is stochastically dominated by some nonnegative \(\Phi \).

We split the expression \(\langle {\mathbf{R}},{\mathbf{D}} \rangle \) from (3.8) according to the definition (5.7) of the matrices \({\mathbf{D}}^{(k)}\). Then we estimate \(\langle {\mathbf{R}},{\mathbf{D}}^{(k)} \rangle \) separately for every k. We do this in three steps. First we estimate \(\Vert {\mathbf{D}}^{(k)} \Vert _{\mathrm{max}}\) for \(k=2,3,5\) directly without using the averaging effect of \({\mathbf{R}}\). Afterwards we show the bounds on \(\langle {\mathbf{R}},{\mathbf{D}}^{(1)} \rangle \) and \(\langle {\mathbf{R}},{\mathbf{D}}^{(4)} \rangle \), respectively. In the upcoming arguments the following observation will be useful. The local law (3.7) together with (5.22) implies that for every \(B \subseteq \mathbb {X}\) with \(|B |\leqslant N^{\varepsilon /2}\) we have

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{M}}^B\!-{\mathbf{G}}^B \Vert _{\mathrm{max}}\,\prec \, (1+|B |)\Psi . \end{aligned} \end{aligned}$$
(6.1)

Here, until the end of this proof, we consider \({\mathbf{G}}^B\) as the \(\mathbb {C}^{(N-|B |) \times (N-|B |)}\)-matrix \({\mathbf{G}}^B=(G_{xy}^B)_{x,y \not \in B}\) as opposed to the general convention (5.3).

Estimating \(\Vert {\mathbf{D}}^{(k)} \Vert _{\mathrm{max}}\): Here, we show that under the assumption (3.7) the error matrices with indices \(k=2,3,5\) satisfy the improved entrywise bound

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}}^{(k)} \Vert _{\mathrm{max}}\,\prec \, N^{3\varepsilon _1 P}\Psi ^2,\qquad k=2,3,5, \end{aligned} \end{aligned}$$
(6.2)

where \(\varepsilon _1\) stems from (5.8) and P is the model parameter from (2.11).

We start by estimating the entries of \({\mathbf{D}}^{(2)}\). Directly from its definition in (5.7b) we infer

$$\begin{aligned} |d_{xy}^{(2)} | \;\prec \; N^{3/2} |A_{xy} |\, \Vert {\mathbf{G}} \Vert _{\mathrm{max}} \Vert {\mathbf{G}}^{B_{xy}} \Vert _{\mathrm{max}}\, {\textstyle \max _{z \in A_{xy}}\!\sum _{\,{v}}^{B_{xy}}} |a_{{v}z} |. \end{aligned}$$

The maximum norms of the entries of the resolvents \({\mathbf{G}}\) and \({\mathbf{G}}^{B_{xy}}\) are bounded by (6.1) and (5.20). The decay (2.12) of the entries of the bare matrix and the fact that \(d({v},z)\geqslant N^{\varepsilon _1}\) in the last sum then imply \(\Vert {\mathbf{D}}^{(2)} \Vert _{\mathrm{max}}\prec N^{-{\nu }}\) for any \({\nu } \in \mathbb {N}\).

To show (6.2) for \(k=3\) we use the representation (5.11) and the large deviation estimate (5.12) just as we did in the proof of Lemma 5.3. In this way we get

$$\begin{aligned} |d_{xy}^{(3)} | \,\prec \, \frac{|B_{xy} |}{\sqrt{N}}\,\max _{z \not \in A_{xy}}\Biggl [\,\biggl (\,\sum _{u}^{B_{xy}}|G_{u z}^{B_{xy}{\setminus }\{z\}} |^2\biggr )^{\!1/2}\frac{|G_{zy} |}{\left|G^{B_{xy}{\setminus }\{z\}}_{zz} \right|}\Biggr ]. \end{aligned}$$

Now we use (6.1), (5.20), (5.21) and (5.28) to conclude

$$\begin{aligned} \Vert {\mathbf{D}}^{(3)} \Vert _{\mathrm{max}} \,\prec \, \frac{N^{\varepsilon _1 P}\!}{\sqrt{N}}\biggl (\Psi +\max _{ z \not \in A_{xy}}|m_{zy} |\biggr ). \end{aligned}$$
(6.3)

The faster than power law decay of \({\mathbf{M}}\) from (2.15) together with the definition of \(A_{xy}\) in (5.8) implies \( \max _{z \not \in A_{xy}}|m_{zy} | \leqslant C({\nu }) N^{-\varepsilon _1{\nu }} \) for any \({\nu }\in \mathbb {N}\). Since \(\Psi \geqslant N^{-1/2}\) we infer (6.2) for \(k=3\) from (6.3).
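Explicitly, inserting this into (6.3) and using \(N^{-1/2}\leqslant \Psi \) twice, once for the prefactor and once in the form \(N^{-\varepsilon _1{\nu }}\leqslant N^{-1/2}\leqslant \Psi \) for \({\nu }\geqslant 1/(2\varepsilon _1)\), gives

$$\begin{aligned} \Vert {\mathbf{D}}^{(3)} \Vert _{\mathrm{max}} \,\prec \, \frac{N^{\varepsilon _1 P}\!}{\sqrt{N}}\bigl (\Psi + C({\nu })N^{-\varepsilon _1{\nu }}\bigr ) \,\leqslant \, N^{\varepsilon _1 P}\,\Psi \bigl (\Psi + C({\nu })\Psi \bigr ) \,\lesssim \, N^{\varepsilon _1 P}\,\Psi ^2, \end{aligned}$$

which is even stronger than (6.2) since \(N^{\varepsilon _1P}\leqslant N^{3\varepsilon _1P}\); the \({\nu }\)-dependent constant is absorbed into \(\prec \).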

Finally we consider the case \(k=5\). We follow the proof of Lemma 5.3 and use the representation (5.14). We estimate the two summands on the right hand side of (5.14), starting with the second term. We rewrite this term in the form \(\sum _z^{A_{xy}}\mathcal {S}_{xz}[{\mathbf{G}}]G_{zy}\) and use (2.13) as well as \(\Vert {\mathbf{G}} \Vert _{\mathrm{max}}\leqslant \Vert {\mathbf{M}} \Vert _{\mathrm{max}}+\Psi \) together with the upper bound on \({\mathbf{M}}\) in (2.15).

To bound the first term on the right hand side of (5.14) we use (5.16). Each of the three terms on the right hand side of (5.16) has to be bounded by \(N^{2\varepsilon _1 P}\Psi ^2\). The second and third term are bounded by \(\frac{1}{N}\) by the decay of covariances (3.12). For the first term we use (5.17), (6.1) and (5.20).

Estimating \(\langle {\mathbf{R}},{\mathbf{D}}^{(1)} \rangle \): Here we will show that

$$\begin{aligned} |\langle {\mathbf{R}},{\mathbf{D}}^{(1)} \rangle |\,\prec \,N^{CP\varepsilon _1} \Psi ^2, \end{aligned}$$
(6.4)

for some numerical constant \(C>0\).

We split the error matrix \({\mathbf{D}}^{(1)}\) into two pieces \( {\mathbf{D}}^{(1)}= {\mathbf{D}}^{\mathrm{(1a)}}+{\mathbf{D}}^{\mathrm{(1b)}} \), defined by

$$\begin{aligned} d_{xy}^{\mathrm{(1a)}}\,:=\, - \frac{1}{\sqrt{N}}\sum _{u \in B_{xy}}w_{x u}m_{u y}, \qquad \text {and}\qquad d_{xy}^{\mathrm{(1b)}}\,:=\, - \frac{1}{\sqrt{N}}\sum _{u \in B_{xy}}w_{x u}(G_{uy}-m_{u y}), \end{aligned}$$

where \( B_{xy} \) is a \( 2N^{\varepsilon _1}\)-environment of the set \( \{ {x,y} \} \) (cf. (5.8)). The second part is trivially bounded, \( \Vert {\mathbf{D}}^{\mathrm{(1b)}} \Vert _{\mathrm{max}} \prec N^{\varepsilon _1 P} \Psi ^2 \), using the local law (6.1), with \( B = \varnothing \).
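In detail, assuming as in (2.11) that the entries of \({\mathbf{W}}\) satisfy \(|w_{x u} |\prec 1\) and that the \(2N^{\varepsilon _1}\)-environment has volume \(|B_{xy} |\lesssim N^{\varepsilon _1P}\) (a sketch of the step, under these standing bounds):

$$\begin{aligned} |d^{\mathrm{(1b)}}_{xy} | \,\leqslant \, \frac{1}{\sqrt{N}}\sum _{u \in B_{xy}} |w_{x u} |\, |G_{uy}-m_{u y} | \,\prec \, \frac{|B_{xy} |}{\sqrt{N}}\,\Psi \,\lesssim \, N^{\varepsilon _1 P}N^{-1/2}\,\Psi \,\leqslant \, N^{\varepsilon _1 P}\,\Psi ^2, \end{aligned}$$

where the last step uses \(\Psi \geqslant N^{-1/2}\).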

For the bound on \(\langle {\mathbf{R}},{\mathbf{D}}^{\mathrm{(1a)}} \rangle \) we write

$$\begin{aligned} \begin{aligned} \langle {\mathbf{R}},{\mathbf{D}}^{(1a)} \rangle \,=\, X+Y+Z , \end{aligned} \end{aligned}$$
(6.5)

where the terms on the right hand side are sums of \( \sigma _{xuy}:=N^{-3/2}\,\overline{r} \!\,_{xy}w_{x u}m_{u y} \) over disjoint domains

$$\begin{aligned} \begin{aligned} X\,:=\, \sum _{x}\sum _{y}^{B_x^1}\sum _{u \in B_{x}^2}^{B_y^2}\sigma _{xuy}, \quad Y\,:=\, \sum _{x}\sum _{y}^{B_x^1}\sum _{u \in B_{y}^2}\sigma _{xuy}, \quad Z\,:=\, \sum _{x}\sum _{y\in B_x^1}\sum _{u \in B_{xy}}\sigma _{xuy}, \end{aligned} \end{aligned}$$

expressed in terms of the following metric balls:

$$\begin{aligned} B_x^k\,:=\, B_{kN^{\varepsilon _1}}(x). \end{aligned}$$

The fast decay of off-diagonal entries of \({\mathbf{R}}\) and \({\mathbf{M}}\), \(|r_{xy} |+|m_{xy} |\lesssim \frac{1}{N}\) for \(d(x,y)\geqslant N^{\varepsilon _1}\) (cf. (2.15)), immediately yields \( |X | \prec _\mu N^{2P\varepsilon _1\mu }N^{-3\mu } \). This suffices for (6.4). The off-diagonal decay also yields

$$\begin{aligned} \mathbb {E}|Y |^{2\mu }\,&\lesssim _\mu \frac{N^{2P\varepsilon _1\mu }}{N^{5\mu }}\sum _{{\mathbf{x}},{\mathbf{u}}} \left|\,\mathbb {E}\prod _{i=1}^\mu w_{x_iu_i}\overline{w} \!\,_{x_{\mu +i}u_{\mu +i}} \right|, \end{aligned}$$
(6.6a)
$$\begin{aligned} \mathbb {E}|Z |^{2\mu }\,&\lesssim _\mu \frac{N^{2P\varepsilon _1\mu }}{N^{3\mu }}\sum _{{\mathbf{x}}}\sum _{{\mathbf{u}} \in B_{{\mathbf{x}}}^3} \left|\,\mathbb {E}\prod _{i=1}^\mu w_{x_iu_i}\overline{w} \!\,_{x_{\mu +i}u_{\mu +i}} \right| , \end{aligned}$$
(6.6b)

where the sums are over index tuples \({\mathbf{x}}=(x_1,\ldots ,x_{2\mu }) \in \mathbb {X}^{2\mu }\) and \(B_{{\mathbf{x}}}^k:=B_{kN^{\varepsilon _1}}({\mathbf{x}})\) is the ball around \({\mathbf{x}}\) with respect to the product metric

$$\begin{aligned} d({\mathbf{x}},{\mathbf{y}}):=\max _{i=1}^{2\mu } d(x_i,y_i). \end{aligned}$$

In (6.6b) we have used the triangle inequality to conclude that \( d({\mathbf{u}},{\mathbf{x}}) \leqslant 3N^{\varepsilon _1} \). For Y and Z we continue the estimates in (6.6) by using the decay of correlations (2.25) and the ensuing lumping of index pairs \((x_i,u_i)\):

$$\begin{aligned} \begin{aligned} \left|\mathbb {E}\prod _{i=1}^\mu w_{x_iu_i}\overline{w} \!\,_{x_{\mu +i}u_{\mu +i}} \right| \,\lesssim _{\mu ,{\nu }} {\left\{ \begin{array}{ll} N^{-{\nu }}&{}\exists \,i \text { s.t. } d_{\mathrm{sym}} ((x_i,u_i),\{(x_j,u_j): j\ne i\}){\geqslant } N^{\varepsilon _1}; \\ 1&{} \text { otherwise, } \end{array}\right. } \end{aligned} \end{aligned}$$
(6.7)

where \(d_{\mathrm{sym}}((x_1,x_2),(y_1,y_2)):=d((x_1,x_2), \{(y_1,y_2),(y_2,y_1)\})\) is the symmetrized distance on \(\mathbb {X}^2\), induced by d. Inserting (6.7) into the moment bounds on Y and Z effectively reduces the combinatorics of the sum in (6.6a) from \(N^{4\mu }\) to \(N^{2\mu }\) and in (6.6b) from \(N^{2\mu }\) to \(N^{\mu }\). We conclude that \(|\langle {\mathbf{R}},{\mathbf{D}}^{(1\mathrm{a})} \rangle | \prec N^{CP\varepsilon _1}N^{-1} \). Moreover, together with \(\Psi \geqslant N^{-1/2}\) and our earlier estimate for \( \langle {\mathbf{R}},{\mathbf{D}}^{(1\mathrm{b})} \rangle \) this yields (6.4).
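For orientation, the resulting power counting can be summarized as follows; the \(N^{-{\nu }}\) term accounts for tuples containing a pair \((x_i,u_i)\) that is \(d_{\mathrm{sym}}\)-separated from all others, with \({\nu }\) chosen large (a sketch of the step, not a verbatim reproduction of the argument):

$$\begin{aligned} \mathbb {E}|Y |^{2\mu }\,\lesssim _{\mu ,{\nu }}\, \frac{N^{2P\varepsilon _1\mu }}{N^{5\mu }} \bigl ( N^{C\mu \varepsilon _1}N^{2\mu } + N^{4\mu }N^{-{\nu }} \bigr ) \,\lesssim _\mu \, N^{C\mu \varepsilon _1}N^{-3\mu }, \qquad \mathbb {E}|Z |^{2\mu }\,\lesssim _{\mu ,{\nu }}\, N^{C\mu \varepsilon _1}N^{-2\mu }, \end{aligned}$$

so that \( |Y | \prec N^{C\varepsilon _1}N^{-3/2} \) and \( |Z | \prec N^{C\varepsilon _1}N^{-1}\leqslant N^{C\varepsilon _1}\Psi ^2 \).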

Estimating \(\langle {\mathbf{R}},{\mathbf{D}}^{(4)} \rangle \): Similarly to the strategy for estimating \(|\langle {\mathbf{R}},{\mathbf{D}}^{(1)} \rangle |\) we write

$$\begin{aligned} {\mathbf{D}}^{(4)}= & {} {\mathbf{D}}^{\mathrm{(4a)}}+{\mathbf{D}}^{\mathrm{(4b)}},\\ d_{xy}^{\mathrm{(4a)}}:= & {} \sum _{z \in A_{xy}}Z^{B_{xy}}_{xz}m_{zy},\quad d_{xy}^{\mathrm{(4b)}}:= \sum _{z \in A_{xy}}Z^{B_{xy}}_{xz}(G_{zy}-m_{zy}), \end{aligned}$$

where \( A_{xy} \) is from (5.8), and we have introduced for any \(B \subsetneq \mathbb {X} \) the shorthand

$$\begin{aligned} \begin{aligned} Z^B_{xz}:=\frac{1}{{N}}\sum _{u,{v}}^{B}G_{u {v}}^{B}\, (w_{x u}w_{{v}z}-\mathbb {E}w_{x u}w_{{v}z}) . \end{aligned} \end{aligned}$$
(6.8)

From the decay of correlations (2.25) and \(d(\{x,z\},\mathbb {X}{\setminus } B_{xy})\geqslant N^{\varepsilon _1}\) for any \(z \in A_{xy}\), as well as the N-dependent smoothness of the resolvent as a function of the matrix entries of \({\mathbf{W}}\) for \({{\mathrm{dist}}}(\zeta , {{\mathrm{Spec}}}({\mathbf{H}}))\geqslant N^{-C}\), we see that Lemma A.3 can be applied for a large deviation estimate on the \((u,{v})\)-sum in the definition (6.8) of \(Z^B_{xz}\) for \(B=B_{xy}\), i.e.

$$\begin{aligned} \begin{aligned} |Z_{xz}^{B_{xy}} | \,\prec \, \biggl (\frac{1}{N^2}\sum ^{B_{xy}}_{u,v}|G_{uv}^{B_{xy}} |^2\biggr )^{\!1/2} \!\prec \, N^{\varepsilon _1P}\Psi . \end{aligned} \end{aligned}$$
(6.9)

Here we also used (6.1) and (5.20) for the second stochastic domination bound. Combining (6.9) with (6.1) we see that

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}}^{\mathrm{(4b)}} \Vert _{\mathrm{max}}\,\prec \, N^{2\varepsilon _1P}\Psi ^2. \end{aligned} \end{aligned}$$
(6.10)
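Spelled out, using (6.9), (6.1) with \(B=\varnothing \), and a volume bound \(|A_{xy} |\lesssim N^{\varepsilon _1P}\) for the \(N^{\varepsilon _1}\)-environment (our reading of (5.8) together with (2.11)):

$$\begin{aligned} |d^{\mathrm{(4b)}}_{xy} | \,\leqslant \, \sum _{z \in A_{xy}} |Z^{B_{xy}}_{xz} |\, |G_{zy}-m_{zy} | \,\prec \, |A_{xy} |\cdot N^{\varepsilon _1P}\Psi \cdot \Psi \,\lesssim \, N^{2\varepsilon _1P}\,\Psi ^2. \end{aligned}$$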

The rest of the proof of Proposition 3.5 is dedicated to showing the high moment bound

$$\begin{aligned} \begin{aligned} \mathbb {E}|\langle {\mathbf{R}},{\mathbf{D}}^{\mathrm{(4a)}} \rangle |^{2\mu }\,\lesssim _\mu \,N^{C(\mu )\varepsilon _1}\Psi ^{2\mu }. \end{aligned} \end{aligned}$$
(6.11)

Together with (6.2), (6.4) and (6.10) this bound implies (3.8) since \(\varepsilon _1\) can be chosen arbitrarily small. In analogy to (6.5) we write \( \langle {\mathbf{R}},{\mathbf{D}}^{\mathrm{(4a)}} \rangle = X+Y+Z \), where the three terms on the right hand side are obtained by summing \( \sigma _{xzy}:=N^{-1}\overline{r} \!\,_{xy}Z_{x z}^{B_{xy}}m_{z y} \) over disjoint sets of indices:

$$\begin{aligned} \begin{aligned} X\,:=\, \sum _{x}\sum _{y}^{B_x^1}\sum _{z \in B_{x}^1}^{B_y^1}\sigma _{xzy} , \quad Y\,:=\, \sum _{x}\sum _{y}^{B_x^1}\sum _{z \in B_{y}^1}\sigma _{xzy} , \quad Z\,:=\, \sum _{x}\sum _{y\in B_x^1}\sum _{z \in A_{xy}}\sigma _{xzy}. \end{aligned} \end{aligned}$$

Similarly to (6.6), the fast decay of off-diagonal entries of both \({\mathbf{R}}\) and \({\mathbf{M}}\) and the a priori bound (6.9) immediately yield \( |X | \prec _\mu N^{4\varepsilon _1P\mu } N^{-2\mu }\Psi ^{2\mu } \). Since this is already sufficient for (6.11), we focus on the terms Y and Z in (6.12). Using again the decay of off-diagonal entries yields:

$$\begin{aligned} \mathbb {E}|Y |^{2\mu }\,&\lesssim _\mu \frac{1}{N^{4\mu }}\sum _{{\mathbf{x}},{\mathbf{y}}}\sum _{{\mathbf{z}} \in B_{{\mathbf{y}}}^1} \left|\,\mathbb {E}\prod _{i=1}^\mu Z^{B_{x_iy_i}}_{x_iz_i}\overline{Z} \!\,^{B_{x_{\mu +i}y_{\mu +i}}}_{x_{\mu +i}z_{\mu +i}} \right|, \end{aligned}$$
(6.12a)
$$\begin{aligned} \mathbb {E}|Z |^{2\mu }\,&\lesssim _\mu \frac{1}{N^{2\mu }}\sum _{{\mathbf{x}}}\sum _{{\mathbf{y}} \in B_{{\mathbf{x}}}^1}\sum _{{\mathbf{z}} \in B_{{\mathbf{x}}}^1\cup B_{{\mathbf{y}}}^1}\left|\,\mathbb {E}\prod _{i=1}^\mu Z^{B_{x_iy_i}}_{x_iz_i}\overline{Z} \!\,^{B_{x_{\mu +i}y_{\mu +i}}}_{x_{\mu +i}z_{\mu +i}} \right|. \end{aligned}$$
(6.12b)

We call the subscripts i of the indices \( x_i \) and \( z_i \) labels. In order to further estimate the moments of Y and Z we introduce the set of lone labels of \( ({\mathbf{x}},{\mathbf{z}}) \):

$$\begin{aligned} \begin{aligned} L({\mathbf{x}},{\mathbf{z}}) \,:=\, \Bigl \{{i : d\left( {\{x_i,z_i\},{\textstyle \bigcup _{j \ne i}}\{x_j,z_j\} }\right) \geqslant 3N^{\varepsilon _1} }\Bigr \}. \end{aligned} \end{aligned}$$
(6.13)

The corresponding index pair \((x_i, z_i)\) for \( i\in L({\mathbf{x}},{\mathbf{z}}) \) is called a lone index pair. We partition the sums in (6.12a) and (6.12b) according to the number of lone labels, i.e. we insert the partition of unity \( 1 = \sum _{\ell =0}^{2\mu }\mathbbm {1}(|L({\mathbf{x}},{\mathbf{z}}) |=\ell ) \). A simple counting argument reveals that fixing the number of lone labels reduces the combinatorics of the sums in (6.12a) and (6.12b). More precisely,

$$\begin{aligned} \begin{aligned} \sum _{{\mathbf{x}},{\mathbf{y}}}\sum _{{\mathbf{z}} \in B_{{\mathbf{y}}}^1} \mathbbm {1}(|L({\mathbf{x}},{\mathbf{z}}) |=\ell \,) \,&\leqslant \, N^{C \mu \varepsilon _1}N^{2\mu +\ell }, \\ \sum _{{\mathbf{x}}}\sum _{{\mathbf{y}} \in B_{{\mathbf{x}}}^1} \sum _{\quad {\mathbf{z}} \in B_{{\mathbf{x}}}^1\cup B_{{\mathbf{y}}}^1} \mathbbm {1}(|L({\mathbf{x}},{\mathbf{z}}) |=\ell \,) \,&\leqslant \, N^{C \mu \varepsilon _1}N^{\mu +\ell /2}. \end{aligned} \end{aligned}$$
(6.14)
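A sketch of the counting behind the first bound: each of the \(4\mu \) indices \(x_i,y_i\) a priori ranges over N sites, while each \(z_i \in B^1_{y_i}\) contributes only \(N^{C\varepsilon _1}\) choices. For each of the \(2\mu -\ell \) non-lone labels i, either \(x_i\), or \(z_i\) and with it \(y_i\), is confined to a \(3N^{\varepsilon _1}\)-neighbourhood of the remaining indices, replacing a factor N by \(N^{C\varepsilon _1}\). Hence

$$\begin{aligned} \sum _{{\mathbf{x}},{\mathbf{y}}}\sum _{{\mathbf{z}} \in B_{{\mathbf{y}}}^1} \mathbbm {1}(|L({\mathbf{x}},{\mathbf{z}}) |=\ell \,) \,\leqslant \, N^{C \mu \varepsilon _1}\,N^{4\mu -(2\mu -\ell )} \,=\, N^{C \mu \varepsilon _1}\,N^{2\mu +\ell }. \end{aligned}$$

The second bound is analogous, except that there both \(y_i\) and \(z_i\) are tied to \(x_i\), and the non-lone x-indices cluster in groups of at least two, leaving at most \(\mu +\ell /2\) free indices.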

The expectation in (6.12a) and (6.12b) is bounded using the following technical result.

Lemma 6.1

(Key estimate for averaged local law) Assume the hypotheses of Proposition 3.5 hold, let \( \mu \in \mathbb {N}\) and \( {\mathbf{x}},{\mathbf{y}} \in \mathbb {X}^{2\mu }\). Suppose there are \( 2\mu \) subsets \( Q_1,\ldots , Q_{2\mu } \) of \( \mathbb {X} \), such that \( B_{N^{\varepsilon _1}}(x_i,y_i) \subseteq Q_i\subseteq B_{3N^{\varepsilon _1}}(x_i,y_i)\) for each i. Then

$$\begin{aligned} \begin{aligned} \left|\, \mathbb {E}\, \prod _{i=1}^{\mu } Z^{(Q_i)}_{x_iy_i} \overline{Z} \!\,^{(Q_{\mu +i})}_{x_{\mu +i}y_{\mu +i}} \right| \,&\lesssim _{\mu }\, N^{C(\mu )\varepsilon _1} \, \Psi ^{2\mu \,+|L({\mathbf{x}},{\mathbf{y}}) |}. \end{aligned} \end{aligned}$$
(6.15)

Using (6.14) and Lemma 6.1 on the right hand sides of (6.12a) and (6.12b) after partitioning according to the number of lone labels, yields

$$\begin{aligned} \begin{aligned} \mathbb {E}|Y |^{2\mu }\,\lesssim _\mu \frac{N^{C(\mu )\varepsilon _1}}{N^{4\mu }}\sum _{\ell =0}^{2\mu }\Psi ^{2\mu +\ell }N^{2\mu +\ell }, \qquad \mathbb {E}|Z |^{2\mu }\,\lesssim _\mu \frac{N^{C(\mu )\varepsilon _1}}{N^{2\mu }}\sum _{\ell =0}^{2\mu }\Psi ^{2\mu +\ell }N^{\mu +\ell /2}. \end{aligned} \end{aligned}$$
(6.16)
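Indeed, each summand in the first bound of (6.16) satisfies, using \(N^{-1}\leqslant \Psi ^2\) and then \(\Psi \lesssim 1\),

$$\begin{aligned} \frac{\Psi ^{2\mu +\ell }\,N^{2\mu +\ell }}{N^{4\mu }} \,=\, \Psi ^{2\mu +\ell }\,N^{-(2\mu -\ell )} \,\leqslant \, \Psi ^{2\mu +\ell }\,\Psi ^{2(2\mu -\ell )} \,=\, \Psi ^{6\mu -\ell } \,\lesssim \, \Psi ^{2\mu }, \end{aligned}$$

and similarly \(\Psi ^{2\mu +\ell }N^{\mu +\ell /2-2\mu }\leqslant \Psi ^{4\mu }\lesssim \Psi ^{2\mu }\) for the second.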

Since \(\Psi \geqslant N^{-1/2}\) the high moment bounds in (6.16) together with the simple estimate for X imply (6.11). This finishes the proof of Proposition 3.5 up to verifying Lemma 6.1 which will occupy the rest of the section. \(\square \)

Proof of Lemma 6.1

Let us consider the data \( \xi := ({\mathbf{x}},{\mathbf{y}},(Q_i)_{i=1}^{2\mu }) \) as fixed. We start by writing the product on the left hand side of (6.15) in the form

$$\begin{aligned} \begin{aligned} \prod _{i=1}^{\mu } Z^{(Q_i)}_{x_iy_i} \overline{Z} \!\,^{(Q_{\mu +i})}_{x_{\mu +i}y_{\mu +i}} \,=\, \sum _{{\mathbf{u}},{\mathbf{v}}} w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\, \Gamma _{\!\xi } ({\mathbf{u}},{\mathbf{v}}) , \end{aligned} \end{aligned}$$
(6.17)

where the two auxiliary functions \( \Gamma _{\!\xi }, w_{{\mathbf{x}},{\mathbf{y}}} : \mathbb {X}^{2\mu } \times \mathbb {X}^{2\mu } \rightarrow \mathbb {C}\), are defined by

$$\begin{aligned} \Gamma _{\!\xi } ({\mathbf{u}},{\mathbf{v}}) \,&:=\, \mathbbm {1}\bigl \{{u_i,{v}_i\notin Q_i,\forall i=1,\ldots ,2\mu }\bigr \} \prod _{i=1}^{\mu } G^{(Q_i)}_{u_i {v}_i} \overline{G} \!\,^{(Q_{\mu +i})}_{u_{\mu +i} {v}_{\mu +i}}, \end{aligned}$$
(6.18a)
$$\begin{aligned} w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}}) \,&:= \prod _{i=1}^{\mu } w_{x_iy_i}(u_i,{v}_i) \,\overline{w} \!\,_{x_{\mu +i}y_{\mu +i}}(u_{\mu +i},{v}_{\mu +i}), \end{aligned}$$
(6.18b)

and

$$\begin{aligned} \begin{aligned} w_{xy}(u,{v}) \,:=\,\frac{1}{N}(w_{xu}w_{{v}y}-\mathbb {E}\,w_{xu}w_{{v}y}) . \end{aligned} \end{aligned}$$
(6.19)


In order to estimate (6.17) we partition the sum over the indices \( u_i \) and \( {v}_i \) depending on their distance from the set of lone index pairs, \( (x_i,y_i) \) with \( i \in L \), where \( L = L({\mathbf{x}},{\mathbf{y}}) \). To this end we introduce the partition \( \{{B_i:i \in \{{0}\}\cup L}\} \) of \( \mathbb {X} \),

$$\begin{aligned} \begin{aligned} B_i \,:=\, {\left\{ \begin{array}{ll} \;B_{N^{\varepsilon _1}}(x_i)\cup B_{N^{\varepsilon _1}}(y_i) &{}\quad \text {when }\;i \in L, \\ \mathbb {X}\backslash \bigcup \nolimits _{j\in L} (B_{N^{\varepsilon _1}}(x_j)\cup B_{N^{\varepsilon _1}}(y_j)) &{}\quad \text {when }\;i = 0, \end{array}\right. } \end{aligned} \end{aligned}$$
(6.20)

and the shorthand

$$\begin{aligned} \begin{aligned} \mathbb {B}(\xi ,\sigma ) \,:=\, \Bigl \{{({\mathbf{u}},{\mathbf{v}}) \in \mathbb {X}^{4\mu }: u_i \in B_{\sigma '_i}\!\backslash Q_i,\;{v}_i \in B_{\sigma ''_i}\!\backslash Q_i,\,i=1,\ldots ,2\mu }\Bigr \}, \end{aligned} \end{aligned}$$
(6.21)

where the components \( \sigma _i = (\sigma '_i,\sigma ''_i) \in (\{{0}\} \cup L)^2 \) of \( \sigma = (\sigma _i)_{i=1}^{2\mu } \) specify whether \(u_i\) and \(v_i\) are close to a lone index pair or not; e.g. \(\sigma '_i\) determines which lone index \(u_i\) is close to, if any. For any fixed \(\xi \), as \(\sigma \) runs through all possible elements of \((\{{0}\} \cup L)^{4\mu }\), the sets \(\mathbb {B}(\xi ,\sigma ) \) form a partition of the summation set on the right hand side of (6.17) (taking into account the restriction \(u_i, {v}_i\not \in Q_i\)). Therefore it will be sufficient to estimate

$$\begin{aligned} \begin{aligned} \sum _{\quad ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma ) } w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\, \Gamma _{\!\xi } ({\mathbf{u}},{\mathbf{v}}) \end{aligned} \end{aligned}$$
(6.22)

for every fixed \(\sigma \in (\{{0}\} \cup L)^{4\mu }\). Since \( x_i \) and \( y_i \) are fixed, while \( u_i \) and \( v_i \) are free variables, with their domains depending on \( (\xi ,\sigma ) \), we say that the former are external indices and the latter are internal indices.

Let us define the set of isolated labels,

$$\begin{aligned} \begin{aligned} \widehat{L}({\mathbf{x}},{\mathbf{y}},\sigma ) \,=\, L({\mathbf{x}},{\mathbf{y}}) \backslash \{{\sigma '_1,\ldots ,\sigma '_{2\mu },\sigma ''_1,\ldots ,\sigma ''_{2\mu }}\} , \end{aligned} \end{aligned}$$
(6.23)

so that if an external index has an isolated label as subscript, then it is isolated from all the other indices in the following sense:

$$\begin{aligned}&d\left( {\{{x_i,y_i}\},\,{\textstyle \bigcup _{j=1}^{2\mu }}\{{u_j,{v}_j}\} \cup \, {\textstyle \bigcup _{j\ne i}}\{{x_j,y_j}\}}\right) \geqslant N^{\varepsilon _1}, \\&\quad ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma ), \; i \in \widehat{L}({\mathbf{x}},{\mathbf{y}},\sigma ). \end{aligned}$$

Notice that isolated labels indicate not only separation from all other external indices, as lone labels do, but also from all internal indices. Given a resolvent entry \( G^{B}_{u{v}} \) we will refer to \(u,{v}\) as lower indices and to the set B as an upper index set.

The next lemma, whose proof we postpone until the end of this section, yields an algebraic representation for (6.22) provided the internal indices are properly restricted.

Lemma 6.2

(Monomial representation) Let \( \xi \) and \( \sigma \) be fixed. Then the restriction \( \Gamma _{\!\xi }|_{\mathbb {B}(\xi ,\sigma )} \) of the function (6.18a) to the subset \( \mathbb {B}(\xi ,\sigma ) \) of \( \mathbb {X}^{4\mu }\) has a representation

$$\begin{aligned} \begin{aligned} \Gamma _{\!\xi }|_{\mathbb {B}(\xi ,\sigma )} \,= \sum _{\alpha =1}^{\; M(\xi ,\sigma )} \Gamma _{\!\xi ,\sigma ,\alpha } , \end{aligned} \end{aligned}$$
(6.24)

in terms of

$$\begin{aligned} \begin{aligned} M(\xi ,\sigma ) \,\lesssim _\mu \, N^{C(\mu )\varepsilon _1}, \end{aligned} \end{aligned}$$
(6.25)

(signed) monomials \( \Gamma _{\!\xi ,\sigma ,\alpha } :\mathbb {B}(\xi ,\sigma ) \rightarrow \mathbb {C}\), such that \( \Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}}) \) for each \(\alpha \) is of the form:

$$\begin{aligned} \begin{aligned} (-1)^\#\prod _{t=1}^{n} \left( G^{E_{t}}_{ a_{ t} b_{ t}}\right) ^\# \,\prod _{r=1}^{q} \frac{\!1}{\left( G^{F_{r}}_{w_{ r} w_{r}}\right) ^\#\!} \prod _{ r\in R^{(1)}} \left( G^{U_{r}}_{u_rv_r}\right) ^\# \prod _{ t\in R^{(2)}} \left( G^{U'_{t}}_{u_{t}u'_{ t}}G^{V'_{t}}_{v'_{t}v_t}\right) ^\#. \end{aligned} \end{aligned}$$
(6.26)

Here the notations \( (-\,1)^\# \) and \( (\,\cdot \,)^\# \) indicate possible signs and complex conjugations that may depend only on \( (\xi ,\sigma ,\alpha ) \), respectively, and that will be irrelevant for our estimates. The dependence on \( (\xi ,\sigma ,\alpha ) \) has been suppressed in the notations, e.g., \( n = n(\xi ,\sigma ,\alpha ) \), \(U_r=U_r(\xi ,\sigma ,\alpha )\), etc.

The numbers n and q of factors in (6.26) are bounded, \( n+q \lesssim _\mu 1 \). Furthermore, for any fixed \(\alpha \) the two subsets \( R^{(k)}\), \( k=1,2\), form a partition of \(\{1,\ldots ,2\mu \}\), and the monomials (6.26) have the following three properties:

  1. 1.

    The lower indices \( a_{t} \), \( b_{t} \), \( u'_{t}\), \(v'_{t}\), \( w_{t} \) are in \( \cup _{i\in \widehat{L}} B_i \), and \( d(a_{t},b_{t}) \geqslant N^{\varepsilon _1} \).

  2. 2.

    The upper index sets \( E_{r}\), \( F_{r}\), \( U_{r}\), \( U'_{r}\), \( V'_{r} \) are bounded in size by \( N^{C(\mu )\varepsilon _1} \), and \( B_r \subseteq U_{r},U'_{r},V'_{r} \). The total number of these sets appearing in the expansion (6.24) is bounded by \( N^{C(\mu )\varepsilon _1} \).

  3. 3.

    At least one of the following two statements is always true:

$$\begin{aligned}&\,(I) \quad \exists i \in \widehat{L}, \;\text { s.t. }\; B_i \;\subseteq \, {\textstyle \bigcap _{t=1}^{n}} E_{t} \,\cap \, {\textstyle \bigcap _{r=1}^{q}} F_{r} \,\cap \, {\textstyle \bigcap _{r\in R^{(1)}}}\! U_{r} \,\cap \, {\textstyle \bigcap _{ t\in R^{(2)}}} (\, U'_{t} \cap V'_{t}) \;;\\&(II) \quad n+ |R^{(1)} | + 2|R^{(2)} | \;\geqslant \; 2\mu + |\widehat{L} | . \end{aligned}$$

Since Lemma 6.1 relies heavily on this representation, we make a few remarks: (i) Monomials with different values of \( \alpha \) may be equal. The indices \( a_{t} \), \( b_{t} \), \( u'_{t}\), \(v'_{t}\), \( w_{t} \) may overlap, but they are always distinct from the internal indices since from (6.21) and (6.23) we see that

$$\begin{aligned} \{{u_r,{v}_r}\}_{r=1}^{2\mu } \subseteq \mathbb {X} \backslash \bigl (\cup _{\!i\in \widehat{L}}B_i\bigr ),\qquad ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma ). \end{aligned}$$

(ii) The reciprocals of the resolvent entries are not important for our analysis because the diagonal resolvent entries are comparable to 1 in absolute value when a local law holds (cf. (5.21)). (iii) Property 3 asserts that each monomial is either a deterministic function of \( {\mathbf{H}}^{(B_i)} \) for some isolated label i, and consequently almost independent of the rows/columns of \( {\mathbf{H}} \) labeled by \(x_i,y_i \) [Case (I)], or the monomial contains at least \( |\widehat{L} | \) additional off-diagonal resolvent factors [Case (II)]. In the second case, each of these extra factors will provide an additional factor \(\Psi \) for typical internal indices due to the faster than power law decay of \( {\mathbf{M}} \) and the local law (6.1). Atypical internal indices, e.g. when \(u_r\) and \(v_r\) are close to each other, do not give a factor \(\Psi \) since \(m_{u_r v_r}\) is not small, but there are far fewer atypical indices than typical ones and this entropy factor makes up for the lack of smallness. These arguments will be made rigorous in Lemma 6.3 below.

By using the monomial sum representation (6.24) in (6.22), and estimating each summand separately, we obtain

$$\begin{aligned} \begin{aligned}&\Biggl | \,\mathbb {E}\! \prod _{i=1}^{\mu } Z^{(Q_i)}_{x_iy_i} \overline{Z} \!\,^{(Q_{\mu +i})}_{x_{\mu +i}y_{\mu +i}}\Biggr | \,\lesssim _\mu \; N^{C(\mu )\varepsilon _1} \max _{\sigma } \max _{\alpha \,=\,1}^{M(\xi ,\sigma )}\, \Biggl |\, \mathbb {E}\sum _{\quad ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma )} w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\, \Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}}) \, \Biggr | , \end{aligned} \end{aligned}$$
(6.27)

where the factor \(N^{C(\mu )\varepsilon _1} \) originates from (6.25), and we have bounded the summation over \(\sigma \) by a \(\mu \)-dependent constant. Thus (6.15) holds if we show, uniformly in \( \alpha \), that

$$\begin{aligned} \begin{aligned} \Biggl |\, \mathbb {E}\sum _{\quad ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma )} w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\, \Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}}) \, \Biggr |&\leqslant N^{C(\mu )\varepsilon _1} N^{-\frac{1}{2}(|L({\mathbf{x}},{\mathbf{y}}) |\,-\,|\widehat{L}({\mathbf{x}},{\mathbf{y}},\sigma ) |)} \, \Psi ^{2\mu \,+|\widehat{L}({\mathbf{x}},{\mathbf{y}},\sigma ) |}. \end{aligned} \end{aligned}$$
(6.28)

In order to prove this bound, we fix \( \alpha \), and sum over the internal indices to get

$$\begin{aligned} \begin{aligned}&\Biggl |\; \mathbb {E}\sum _{\quad ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma )} w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\, \Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}}) \, \Biggr | \\&\quad \leqslant \; \mathbb {E}\, \prod _{t=1}^{n}\, |G^{E_{t}}_{ a_{t} b_{t}} | \prod _{r=1}^{q}\; |G^{F_{ r}}_{w_{ r} w_{r}} |^{-1} \prod _{\quad r\in R^{(1)}} \Theta ^{(1)}_r \prod _{\quad r\in R^{(2)}} \Theta ^{(2)}_r , \end{aligned} \end{aligned}$$
(6.29)

where we have used the formula (6.26) for the monomial \(\Gamma _{\!\xi ,\sigma ,\alpha }\). The sums over the internal indices have been absorbed into the following factors:

$$\begin{aligned} \begin{aligned} \qquad \Theta ^{(1)}_r \,&:=\, \Biggl | \sum _{\quad u\in B_{\sigma '_r}\!\backslash Q_r} \sum _{\quad v\in B_{\sigma ''_r}\!\backslash Q_r} w_{{x_r},y_r}(u,v)\; G^{U_{r}}_{uv} \,\Biggr |, r \in R^{(1)}, \\ \qquad \Theta ^{(2)}_r \,&:=\, \Biggl | \sum _{\quad u\in B_{\sigma '_r}\!\backslash Q_r} \sum _{\quad v\in B_{\sigma ''_r}\!\backslash Q_r} w_{{x_r},y_r}(u,v)\; G^{U'_{r}}_{uu'_{r}}G^{V'_{r}}_{v'_{r}v} \; \Biggr | , \quad r\in R^{(2)}. \end{aligned} \end{aligned}$$
(6.30)

The right hand side of (6.29) will be bounded using the following three estimates which follow by combining the monomial representation with our previous stochastic estimates.

Lemma 6.3

(Three sources of smallness) Consider an arbitrary monomial \( \Gamma _{\!\xi ,\sigma ,\alpha } \), of the form (6.26). Then, under the hypotheses of Proposition 3.5, the following three estimates hold:

  1. 1.

    The resolvent entries with no internal lower indices are small while the reciprocals of the resolvent entries are bounded, in the sense that

    $$\begin{aligned} \begin{aligned} |G^{E_{t}}_{ a_{t} b_{ t}} | \,&\prec \, \Psi ,\qquad | G^{F_{ r}}_{w_{r} w_{r}} |^{-1} \prec \; 1. \end{aligned} \end{aligned}$$
    (6.31)
  2. 2.

    If \( \Gamma _{\!\xi ,\sigma ,\alpha } \) satisfies (I) of Property 3 of Lemma 6.2, then its contribution is very small in the sense that

    $$\begin{aligned} \begin{aligned} |\mathbb {E}\,w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\,\Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}}) | \,&\lesssim _{\mu ,\nu }\; N^{-\nu }, \qquad ({\mathbf{u}},{\mathbf{v}})\in \mathbb {B}(\xi ,\sigma ). \end{aligned} \end{aligned}$$
    (6.32)
  3. 3.

    Sums over the internal indices around external indices with lone labels yield extra smallness:

    $$\begin{aligned} \begin{aligned} \qquad \quad {\Theta ^{(k)}_r} \;&\prec \, N^{C(\mu )\varepsilon _1} N^{-\frac{1}{2}|\sigma _r |_*} \Psi ^{k} , \qquad 1 \leqslant r \leqslant 2\mu ,\; k=1,2 , \end{aligned} \end{aligned}$$
    (6.33)

    where \( |\sigma _r |_*\,:=\, |\{{0,\sigma '_r,\sigma ''_r}\} | - 1 \) counts how many, if any, of the two indices \( u_r \) and \( v_r \) are restricted to the vicinity of distinct external indices.

We postpone the proof of Lemma 6.3 and first see how it is used to finish the proof of Lemma 6.1. The bound (6.28) follows by combining Lemmas 6.2 and 6.3 to estimate the right hand side of (6.29). If (I) of Property 3 of Lemma 6.2 holds, then applying (6.32) and (6.31) in (6.29) yields (6.28). On the other hand, if (I) of Property 3 of Lemma 6.2 is not true, then we use (6.31) and (6.33) to get

$$\begin{aligned} \begin{aligned}&\prod _{t=1}^{n}\, |G^{E_{t}}_{ a_{t} b_{ t}} | \prod _{r=1}^{q}\, |G^{F_{r}}_{w_{ r} w_{r}} |^{-1} \prod _{\quad r\in R^{(1)}} \Theta ^{(1)}_r \prod _{\quad r\in R^{(2)}} \Theta ^{(2)}_r\\&\quad \prec N^{C(\mu )\varepsilon _1} \Psi ^{n+|R^{(1)} |+2|R^{(2)} |}\,N^{-\frac{1}{2}\sum _r |\sigma _r |_*} . \end{aligned} \end{aligned}$$
(6.34)

By Part 3 of Lemma 6.2 we know that (II) holds. Thus the power of \( \Psi \) on the right hand side of (6.34) is at least \( 2\mu + |\widehat{L} | \). On the other hand, from (6.23) we see that

$$\begin{aligned} |L |-|\widehat{L} | \,\leqslant \, \left|\,{\textstyle \bigcup _{r=1}^{2\mu }} \{{\sigma '_r, \sigma ''_r}\}\backslash \{{0}\} \right| \,\leqslant \, \sum _{r=1}^{2\mu } \left|\{{\sigma '_r, \sigma ''_r}\}\backslash \{{0}\} \right| \,\leqslant \, \sum _{r=1}^{2\mu }|\sigma _r |_*. \end{aligned}$$

Hence the power of \( N^{-1/2} \) on the right hand side of (6.34) is at least \( |L |-|\widehat{L} | \). Using these bounds together with \(\Psi \geqslant N^{-1/2}\) in (6.34), and then taking expectations yields (6.28). Plugging (6.28) into (6.27) completes the proof of (6.15).\(\square \)

Proof of Lemma 6.3

Combining (6.1) and (5.20) we see that for some sequence \( \alpha \)

$$\begin{aligned} \begin{aligned} |G^{E}_{u{v}} | \,\prec \; N^{C\varepsilon _1}\Psi + \frac{\alpha ({\nu })}{(1+d(u,{v}))^{\nu }} ,\quad \text {whenever} \quad u,{v} \notin E,\;\text { and }\; |E | \leqslant N^{C\varepsilon _1} . \end{aligned} \end{aligned}$$
(6.35)

By the bound on the size of \( E_{t}\), \( F_{r} \) in Property 2 of Lemma 6.2, (6.35) is applicable for these upper index sets. Then (6.31) follows from the second bound of Property 1 of Lemma 6.2 and the decay of the entries of \({\mathbf{M}}^{E}\) from (5.20).

In order to prove Part 2, let \( i \in \widehat{L} \) be the label from (I) of Property 3 of Lemma 6.2. We have

$$\begin{aligned} \mathbb {E}\,w_{{\mathbf{x}},{\mathbf{y}}}({\mathbf{u}},{\mathbf{v}})\,\Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}})&= \mathbb {E}\Bigl [w_{x_iy_i}(u_i,{v}_i) ^\#\,\Bigr ]\cdot \mathbb {E}\biggl [ \Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}}) \prod _{r\ne i}w_{x_ry_r}(u_r,{v}_r)^\# \,\biggr ] \\&\quad +\, \mathrm{Cov}\biggl (\, w_{x_iy_i}(u_i,{v}_i)^\#,\;\Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}})\, \prod _{r\ne i}w_{x_ry_r}(u_r,{v}_r)^\#\biggr ) , \end{aligned}$$

where the first term on the right hand side vanishes because the \( w_{xy}(u,{v}) \) are centred random variables by (6.19). The covariance is smaller than any inverse power of N, since \( w_{x_iy_i}\!(u_i,{v}_i) \) depends only on the \( x_i \)th and \(y_i\)th rows/columns of \( {\mathbf{H}} \), while \( \Gamma _{\!\xi ,\sigma ,\alpha } \) is a deterministic function of \( {\mathbf{H}}^{B_i} \) by (I) of Property 3 of Lemma 6.2. Indeed, the faster than power law decay of correlations (2.25) yields (6.32), because the derivatives of \(\Gamma _{\!\xi ,\sigma ,\alpha }({\mathbf{u}},{\mathbf{v}})\) with respect to the entries of \({\mathbf{H}}\) are bounded in absolute value by \( N^{C(\mu )} \), owing to the N-dependent smoothness of the resolvents \({\mathbf{G}}^E(\zeta )\) as functions of \({\mathbf{H}}\) for spectral parameters \(\zeta \) with \({{\mathrm{dist}}}(\zeta , {{\mathrm{Spec}}}({\mathbf{H}}^{E}))\geqslant N^{-C}\). For more details we refer to the proofs of Lemmas A.2 and A.3, where a similar argument was used.

Now we will prove Part 3. To this end, fix an arbitrary label \( r =1,2,\ldots , 2\mu \). Let us denote \( B_L := \bigcup _{s\in L} B_s \) and \( B_{\widehat{L}} := \bigcup _{t \in \widehat{L}} B_t\).

Let us first consider \( \Theta ^{(1)}_r \). If \( \sigma '_r = s \) and \( \sigma ''_r = t \), then we need to estimate

$$\begin{aligned} \begin{aligned} \sum _{\;u \in B_s\!\backslash Q_r}\sum _{\;{v} \in B_t\!\backslash Q_r} w_{x_ry_r}(u,{v}) \,G^{U_r}_{u{v}} ,\quad \text {where }\, s,t \in L\backslash \widehat{L},\text { and } Q_r \subseteq U_r \subseteq Q_r\cup B_{\widehat{L}}. \end{aligned} \end{aligned}$$
(6.36)

Since \( B_s\!\backslash Q_r, B_t\!\backslash Q_r \subseteq \mathbb {X}\backslash U_r \), the indices \(u\) and \({v}\) do not overlap the upper index set \( U_r \). Hence, in the case \( k=1\) and \( s = t = 0 \), the estimate (6.33) follows from (A.48) of Lemma A.3.

If \( s,t \in L\backslash \widehat{L} \), then taking the modulus of (6.36) and using (6.35) yields (6.33):

$$\begin{aligned} \begin{aligned} \Theta ^{(1)}_r \,&\leqslant \,|B_s\!\backslash Q_r |\,|B_t\!\backslash Q_r |\, \Bigl (\max _{\;u,{v}\in \mathbb {X}}\, |w_{x_ry_r}(u,{v}) |\Bigr ) \max _{\;u \in B_s\!\backslash Q_r}\!\max _{\;{v} \in B_t\!\backslash Q_r} |G^{U_r}_{u{v}} | \\&\prec \; \frac{\,N^{C\varepsilon _1}\!}{N}\Bigl ( N^{C\varepsilon _1}\Psi + \frac{\alpha ({\nu })}{(1+d(B_s,B_t))^{\nu }}\Bigr ) \,\leqslant \, N^{C\varepsilon _1} N^{-\frac{1}{2}|\sigma _r |_*}\Psi , \end{aligned} \end{aligned}$$
(6.37)

where \( d(A,B) := \inf \{ {d(a,b):a\in A,\;b \in B} \} \) for any subsets \(A\) and \(B\) of \( \mathbb {X} \). Here we have also used the definition (6.13) of lone labels and \( \Psi \geqslant N^{-1/2} \).
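The last step of (6.37) depends on the choice of \(\nu\). A sketch, assuming \(\nu > (C+1)\varepsilon_1^{-1}\) with \(C\) the constant from (6.35) and \(\Psi \leqslant 1\):

```latex
\frac{\alpha(\nu)}{(1+d(B_s,B_t))^{\nu}}
\,\leqslant\, \alpha(\nu)\, N^{-\nu\varepsilon_1}
\,\prec\, N^{-1}
\,\leqslant\, \Psi^{2} \,\leqslant\, \Psi ,
\qquad\text{hence}\qquad
\frac{N^{C\varepsilon_1}}{N} \Bigl( N^{C\varepsilon_1}\Psi
  + \frac{\alpha(\nu)}{(1+d(B_s,B_t))^{\nu}} \Bigr)
\,\prec\, N^{2C\varepsilon_1}\, \frac{\Psi}{N}
\,\leqslant\, N^{2C\varepsilon_1}\, N^{-\frac{1}{2}|\sigma_r|_*}\, \Psi .
```

The first chain uses \( d(B_s,B_t) \geqslant N^{\varepsilon_1} \) for the lone labels \( s,t \) (cf. (6.13)) and \( \Psi \geqslant N^{-1/2} \); the final inequality uses \( |\sigma_r|_* \leqslant 2 \).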

Suppose now that exactly one component of \( \sigma _r \) equals 0 and one is in L. In this case, we split \( w_{x_ry_r}(u,v) \) in (6.36) into two parts corresponding to \( w_{x_ru}w_{vy_r} \) and its expectation, and estimate the corresponding sums separately. First, using (A.34) of Lemma A.2 yields

$$\begin{aligned} \begin{aligned}&\frac{1}{N} \left|\sum _{\;u \in B_s\!\backslash Q_r}\sum _{\;v\in B_0\!\backslash Q_r} w_{x_ru}w_{vy_r} G^{U_r}_{uv} \right|\\&\prec \frac{|B_s\!\backslash Q_r |}{N}\, \biggl (\max _{u\in \mathbb {X}}\, |w_{{x_r}u} |\biggr ) \max _{u\notin U_r} \left| \sum _{\quad v \in B_0\!\backslash Q_r} G^{U_r}_{uv}w_{vy_r} \right| \prec \frac{N^{C\varepsilon _1}\Psi }{N^{1/2}}. \end{aligned} \end{aligned}$$
(6.38)

On the other hand, using (6.35) we estimate the expectation part:

$$\begin{aligned} \begin{aligned}&\frac{1}{N}\left|\sum _{\;u \in B_s\!\backslash Q_r}\sum _{\;{v} \in B_0\!\backslash Q_r} (\mathbb {E}\,w_{x_ru}w_{{v}y_r})\, G^{U_r}_{u{v}} \right| \\&\quad \prec \frac{|B_s\!\backslash Q_r |}{N}\, \max _{u\in \mathbb {X}} \sum _{\;{v} \in B_0\!\backslash Q_r} |\mathbb {E}\,w_{x_ru}w_{{v}y_r} |\bigl (N^{C\varepsilon _1}\Psi +|m^{U_r}_{u{v}} |\bigr ). \end{aligned} \end{aligned}$$
(6.39)

Similarly to (6.38), using (3.12) we can estimate (6.39) by \( \mathcal {O}_{\!\prec }(N^{C\varepsilon _1} N^{-1}) \). As \( \Psi \geqslant N^{-1/2} \), this finishes the proof of (6.33) in the case \( k = 1 \).

Now we prove (6.33) for \(\Theta ^{(2)}_r \). In this case, we need to bound

$$\begin{aligned} \begin{aligned} \sum _{\;u \in B_s\!\backslash Q_r} \sum _{\; v \in B_t\!\backslash Q_r} w_{x_ry_r}(u,v) \,G^{U'_r}_{uu'_r}G^{V'_r}_{v'_rv} , \end{aligned} \end{aligned}$$
(6.40)

where \( s = \sigma '_r\), \( t=\sigma ''_r \) have again values in \( \{{0}\} \cup L\backslash \widehat{L} \). Here, \(u'_r \in B_{\widehat{L}} \!\backslash U'_r \), \( {v}'_r \in B_{\widehat{L}}\backslash V'_r \), and \( Q_r \subseteq \, U'_r,V'_r \subseteq Q_r \cup B_{\widehat{L}} \).

By the definitions (6.13) and (6.23) of the lone and isolated labels, respectively, we know that if \( s \in L\backslash \widehat{L}\), then \( d(u,u'_r) \geqslant N^{\varepsilon _1} \), and similarly, if \( t \in L\backslash \widehat{L}\), then \( d(v'_r,{v}) \geqslant N^{\varepsilon _1} \). Thus, if \( s,t \in L\backslash \widehat{L} \), then estimating as in (6.37) by means of (6.35) yields

$$\begin{aligned} \Theta ^{(2)}_r \,\prec \, N^{C\varepsilon _1} N^{-1}\Psi ^2, \qquad s,t \in L\backslash \widehat{L}. \end{aligned}$$

In the remaining cases, we split (6.40) into two parts corresponding to the term \( w_{x_ru}w_{vy_r} \) and its expectation in the definition (6.19) of \( w_{x_ry_r}(u,v)\), and estimate these two parts separately.

The average part is bounded similarly as in (6.39), i.e., if \( s \in L\backslash \widehat{L} \) and \( t=0\), then

$$\begin{aligned}&\frac{1}{N}\left|\sum _{\;u\in B_s\!\backslash Q_r}\sum _{\;v\in B_0\!\backslash Q_r} (\mathbb {E}\,w_{x_ru}w_{{v}y_r})\, G^{U'_r}_{uu'_r}G^{V'_r}_{v'_r{v}} \right| \nonumber \\&\quad \prec \; \frac{|B_s\!\backslash Q_r |}{N} \max _{u\in B_s} \sum _{\;v \in B_0\!\backslash Q_r} (\mathbb {E}\,w_{x_ru}w_{{v}y_r})\nonumber \\&\quad \times \biggl ( N^{C\varepsilon _1}\Psi + \frac{\alpha (\nu )}{(1+d(u,u'_r))^{\nu }}\biggr ) \biggl ( N^{C\varepsilon _1}\Psi + \frac{\alpha (\nu )}{(1+d(v'_r,{v}))^{\nu }}\biggr ) . \end{aligned}$$
(6.41)

Here \( d(u,u'_r) \geqslant N^{\varepsilon _1} \) since \( u \in B_s \), \( s \in L\backslash \widehat{L} \), while \( u'_r \in B_{\widehat{L}} \). Taking \( \nu > C\varepsilon _1^{-1}\) and using (3.12) to bound the sum of covariances by a constant, we see that the right hand side is \( \mathcal {O}_{\!\prec }(N^{C\varepsilon _1}N^{-1}\Psi ) \). Since \( \Psi \geqslant N^{-1/2} \), this matches (6.33) as \( |\sigma _r |_*= |\{{0,s,t}\} |-1 = 1 \).

Now we are left to bound the size of terms of the form (6.40), where \(w_{x_ry_r}(u,v) \) is replaced with \( \frac{1}{N}w_{x_ru}w_{vy_r} \), and either \( s = 0 \) or \( t = 0 \). In these cases the sums over u and v factorize, i.e., we have

$$\begin{aligned} \frac{1}{N} \biggl ( \sum _{\quad u \in B_s\!\backslash Q_r} w_{x_ru}G^{U'_r}_{uu'_r} \biggr ) \biggl ( \sum _{\quad {v} \in B_t\!\backslash Q_r} G^{V'_r}_{{v}'_r{v}}w_{{v}y_r} \biggr ). \end{aligned}$$

When the sum is over a small set, i.e., over \( B_{s'} \) for some \( s' \in L\backslash \widehat{L} \), then we estimate the sizes of the entries of \( {\mathbf{W}} \) and \( {\mathbf{G}}^{(\#)} \) by \( \mathcal {O}_{\!\prec }(N^{-1/2})\) and \( \mathcal {O}_{\!\prec }(\Psi ) \), respectively. On the other hand, when u or v is summed over \( B_0\!\backslash Q_r \), we use (A.34) of Lemma A.2 to obtain a bound of size \( \mathcal {O}_{\!\prec }(\Psi ) \). In each case, we obtain an estimate that matches (6.33). \(\square \)

Proof of Lemma 6.2

We consider the data \( (\xi ,\sigma ) \) fixed, and write \( \widehat{L} = \widehat{L}({\mathbf{x}},{\mathbf{y}},\sigma ) \), etc. We start by enumerating the isolated labels (see (6.23))

$$\begin{aligned} \begin{aligned} \{{s_1,\ldots ,s_{\widehat{\ell }}\,}\} \,=\,\widehat{L} ,\qquad \widehat{\ell } := |\widehat{L} | , \end{aligned} \end{aligned}$$
(6.42)

and set \( \widehat{B}(k) := \cup _{j=1}^k B_{s_j} \) for \( 1 \leqslant k \leqslant \widehat{\ell }\) [recall the definition from (6.20) and that \(B_{s_j}\)’s are disjoint].

The monomial expansion (6.24) is constructed iteratively in \( \widehat{\ell } \) steps. Indeed, we will define \( 1+ \widehat{\ell } \) representations,

$$\begin{aligned} \begin{aligned} \Gamma _{\!\xi }|_{\mathbb {B}(\xi ,\sigma )} \,=\, \sum _{\alpha =1}^{M_k} \,\Gamma ^{(k)}_\alpha ,\qquad k=0,1,\ldots ,\widehat{\ell } , \end{aligned} \end{aligned}$$
(6.43)

where the \( M_k = M_k(\xi ,\sigma ) \) monomials \( \Gamma ^{(k)}_{\!\alpha } = \Gamma ^{(k)}_{\!\xi ,\sigma ,\alpha } : \mathbb {B}(\xi ,\sigma ) \rightarrow \mathbb {C}\), evaluated at \( ({\mathbf{u}},{\mathbf{v}}) \in \mathbb {B}(\xi ,\sigma )\), are of the form

$$\begin{aligned} \begin{aligned} (-\,1)^{\#}\prod _{t=1}^m \left( G^{E_t}_{a_t b_t}\right) ^{\#} \prod _{r=1}^q \frac{1}{\left( G^{F_r}_{w_r w_r}\right) ^{\#}}\,\, , \end{aligned}\, \end{aligned}$$
(6.44)

with some indices \(a_t,b_t \notin E_t \), \( w_r \notin F_r \). The numbers m and q as well as the sets \( E_t \), \( F_r \) may vary from monomial to monomial, i.e., they are functions of k and \(\alpha \). Furthermore, for each fixed k and \(\alpha \), the lower indices and the upper index sets satisfy

  1. (a)

    \( a_t,b_t \in \{{u_r,{v}_r}\}_{r=1}^p \!\cup \widehat{B}(k) \), and \( w_s \in \{{a_t,b_t}\}_{t=1}^m \);

  2. (b)

    \( E_t \subseteq \widehat{B}(k) \cup Q_{t'} \), for some \( 1 \leqslant t' \leqslant 2\mu \), and \( F_r \subseteq \widehat{B}(k) \cup Q_{r'} \), for some \( 1 \leqslant r' \leqslant 2\mu \);

  3. (c)

    If \( a_t \in B_{s_i} \) and \( b_t \in B_{s_j} \), with \( 1\leqslant i,j\leqslant k \), then \( i \ne j \);

  4. (d)

    For each \( s=1,\ldots ,2\mu \) there are two unique labels \( 1 \leqslant t'(s), t''(s) \leqslant m \), such that \(a_{t'(s)} = u_s \), \( b_{t'(s)} \notin \{{v_r}\}_{r\ne s}\), and \( a_{t''(s)} \notin \{{u_r}\}_{r\ne s}\), \( b_{t''(s)} = v_s \) hold, respectively.

We will call the right hand side of (6.43) the level-\(k\) expansion in the following, and we will define it by a recursion on k.

The level-0 expansion is determined by the formula (6.18a):

$$\begin{aligned} \begin{aligned} \Gamma ^{(0)}_1 := \Gamma _{\!\xi }|_{\mathbb {B}(\xi ,\sigma )} ,\qquad M_0 := 1 . \end{aligned} \end{aligned}$$
(6.45)

This monomial clearly satisfies (a)–(d), with \( m = 2\mu \), \( q = 0 \), \( E_t = Q_t \), and \( t'(s) = t''(s) = s \). The final goal, the representation (6.24), is the last level-\( \widehat{\ell }\) expansion, i.e.,

$$\begin{aligned} \begin{aligned} \Gamma _{\!\xi ,\sigma ,\alpha } :=\, \Gamma ^{(\widehat{\ell })}_\alpha ,\qquad \alpha =1,2,\ldots ,M_{\widehat{\ell }} =: M(\xi ,\sigma ) . \end{aligned} \end{aligned}$$
(6.46)

Now we show how the level-k expansion is obtained given the level-\( (k-1)\) expansion. In order to do that, first we list the elements of each \(B_{s_k}\) as \( \{{x_{ka}: 1 \leqslant a \leqslant |B_{s_k} |}\} = B_{s_k} \), and we define

$$\begin{aligned} B_{k1} \!:= \!\varnothing ,\qquad B_{ka} \!:=\! \{{x_{kb}: 1 \leqslant b \leqslant a-1}\} , \qquad a =2,\ldots ,|B_{s_k} |, \quad k =1,2,\ldots ,\widehat{\ell } , \end{aligned}$$

which is a one-by-one exhaustion of \(B_{s_k}\); namely \(B_{k1}\subseteq B_{k2} \subseteq \cdots \subseteq B_{k, |B_{s_k} |}\subseteq B_{s_k}\). Note that \(B_{k,a+1} = B_{ka}\cup \{ x_{ka}\}\).

We now consider a generic level-\( (k-1) \) monomial \( \Gamma ^{(k-1)}_\alpha \), which is of the form (6.44) and satisfies (a)–(d). Each monomial \( \Gamma ^{(k-1)}_\alpha \) will give rise to several level-k monomials that are constructed independently for different \(\alpha \)’s as follows. Expanding each of the m factors in the first product of (6.44) using the standard resolvent expansion identity

$$\begin{aligned} \begin{aligned} G^{E}_{ab} \,=\; G^{E\cup B_{s_k}}_{ab} \,+\; \sum _{ a'=1 } ^{|B_{s_k} |} \mathbbm {1}(x_{ka'}\notin E)\, \frac{ G^{E\cup B_{ka'}}_{ax_{ka'}}G^{E\cup B_{ka'}}_{x_{ka'}b} }{ G^{E\cup B_{ka'}}_{x_{ka'}x_{ka'}} } , \end{aligned} \end{aligned}$$
(6.47a)

and each of the q factors in the second product of (6.44) using

$$\begin{aligned} \begin{aligned} \frac{\!1}{G^{F}_{ww}} \,&=\, \frac{1}{\,G^{F \cup B_{s_k}}_{ww}\!}\; - \sum _{a =1}^{|B_{s_k} |} \mathbbm {1}(x_{ka}\notin F)\, \frac{ G^{F\cup B_{ka}}_{wx_{ka}}G^{F\cup B_{ka}}_{x_{ka}w} }{ G^{F\cup B_{ka}}_{ww} G^{F\cup B_{k,a+1}}_{ww} G^{F\cup B_{ka}}_{x_{ka}x_{ka}} }, \end{aligned} \end{aligned}$$
(6.47b)

yields a product of sums of resolvent entries and their reciprocals.
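Both formulas in (6.47) arise by iterating the standard one-index resolvent expansion; for a single index \(c \notin E\) it reads (a standard identity, recorded here for reference, with \( a,b,w \neq c \)):

```latex
G^{E}_{ab} \,=\, G^{E\cup\{c\}}_{ab} + \frac{G^{E}_{ac}\,G^{E}_{cb}}{G^{E}_{cc}},
\qquad
\frac{1}{G^{E}_{ww}} \,=\, \frac{1}{G^{E\cup\{c\}}_{ww}}
- \frac{G^{E}_{wc}\,G^{E}_{cw}}{G^{E}_{ww}\,G^{E\cup\{c\}}_{ww}\,G^{E}_{cc}} .
```

Applying the first identity successively with \( c = x_{k1}, x_{k2}, \ldots \) to the remainder term produces (6.47a); the second identity, obtained from the first via \( \frac{1}{x} - \frac{1}{y} = \frac{y-x}{xy} \), produces (6.47b) in the same way.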

Inserting these formulas into (6.44) and expressing the resulting product as a single sum yields the representation

$$\begin{aligned} \begin{aligned} \Gamma ^{(k-1)}_\alpha = {\textstyle \sum \limits _{\beta \in \mathcal {A}_\alpha (k)}} \Gamma ^{(k)}_\beta , \end{aligned} \end{aligned}$$
(6.48)

where \( \mathcal {A}_\alpha (k) \) is some finite subset of integers and \(\beta \) simply labels the resulting monomials in an arbitrary way. From the resolvent identities (6.47) it is easy to see that the monomials \( \Gamma ^{(k)}_\beta \) inherit the properties (a)–(d) from the level-\( (k-1) \) monomials. In particular, summing over \( \alpha =1,\ldots ,M_{k-1} \) in (6.48) yields the level-k monomial expansion (6.43), with \( M_k := \sum _\alpha |\mathcal {A}_\alpha (k) | \). We will assume w.l.o.g. that the sets \( \mathcal {A}_\alpha (k) \), \( 1 \leqslant \alpha \leqslant M_{k-1}\), form a partition of the first \( M_k \) integers.

This procedure defines the monomial representation recursively. Since \( \Gamma ^{(k)}_{\!\alpha }\) is a function of the \(({\mathbf{u}},{\mathbf{v}})\) indices, strictly speaking we should record which lower indices in the generic form (6.44) are considered independent variables. Initially, at level \(k=0\), all indices are variables, see (6.18a). Later, the expansion formulas (6.47) bring in new lower indices, denoted generically by \(x_{ka} \) from the set \( \cup _{s\in \widehat{L}} B_s \) which is disjoint from the range of the components \( u_r,v_r \) of the variables \( ({\mathbf{u}},{\mathbf{v}}) \) as \( \mathbb {B}(\xi ,\sigma ) \) is a subset of \( (\mathbb {X}\backslash \!\cup _{s\in \widehat{L}} B_s)^{2\mu } \). However, the structure of (6.47) clearly shows at which location the “old” ab indices from the left hand side of these formulas appear in the “new” formulas on the right hand side. Now the simple rule is that if any of these indices ab were variables on the left hand side, they are considered variables on the right hand side as well. In this way the concept of independent variables is naturally inherited along the recursion. With this simple rule we avoid the cumbersome notation of explicitly indicating which indices are variables in the formulas.

We note that the monomials of the final expansion (6.46) can be written in the form (6.26). Indeed, the second products in (6.26) and (6.44) are the same, while the first product of (6.44) is split into the three other products in (6.26) using (d). Properties 1 and 2 in Lemma 6.2 for the monomials in (6.46) follow easily from (a)–(d). Indeed, (a) yields the first part of Property 1, while the second part of Property 1 follows from (c) and the basic property \( d(B_s,B_t) \geqslant N^{\varepsilon _1} \) for distinct isolated labels \( s,t \in \widehat{L} \).

For a given \(\xi \), we define the family of subsets of \( \mathbb {X} \):

$$\begin{aligned} \mathscr {E} := \Bigl \{{ B_{1,a(1)} \cup B_{2,a(2)} \cup \cdots \cup B_{\widehat{\ell },a(\widehat{\ell })} \cup Q_r : 1 \leqslant a(k) \leqslant |B_{s_k} |, \;1 \leqslant k\leqslant \widehat{\ell },\;1\leqslant r\leqslant 2\mu \; }\Bigr \} . \end{aligned}$$

By construction [cf. (6.47) and (b)] the upper index sets are members of this \( \xi \)-dependent family. Since \( |Q_r |,|B_{s_k} | \leqslant N^{C_0\varepsilon _1} \), for some \( C_0 \sim 1 \), we get \( |\mathscr {E} | \lesssim _\mu N^{C_0\mu } \). Property 2 follows directly from these observations.

Next we prove Property 3 of the monomials (6.46). To this end, we use the formula (6.48) to define a partial ordering \( < \) on the monomials by

$$\begin{aligned} \begin{aligned} \Gamma ^{(k-1)}_\alpha <\,\Gamma ^{(k)}_\beta \Longleftrightarrow \beta \in \mathcal {A}_\alpha (k) . \end{aligned} \end{aligned}$$
(6.49)

It follows that for every \( \alpha =1,2,\ldots , M = M_{\widehat{\ell }} \), there exists a sequence \( (\alpha _k)_{k=1}^{\widehat{\ell }-1} \), such that

$$\begin{aligned} \begin{aligned} \Gamma _{\!\xi }|_{\mathbb {B}(\xi ,\sigma )} =\; \Gamma _1^{(0)} \,<\; \Gamma ^{(1)}_{\alpha _1} \,<\;\cdots \;<\; \Gamma ^{(\widehat{\ell }-1)}_{\!\alpha _{\widehat{\ell }-1}} <\; \Gamma ^{(\widehat{\ell })}_\alpha =\;\Gamma _{\!\xi ,\sigma ,\alpha } . \end{aligned} \end{aligned}$$
(6.50)

Let us fix an arbitrary label \( \alpha =1,\ldots ,M \) of the final expansion. Suppose that the k-th monomial \( \Gamma ^{(k)}_{\alpha _k} \), in the chain (6.50), is of the form (6.44), and define

$$\begin{aligned} \begin{aligned} D_k \,:=\, \left( {\textstyle \bigcap _{t=1}^{m} E_t}\right) \,\cap \, \left( {\textstyle \bigcap _{r=1}^{q} F_r}\right) ,\qquad m_k \,:=\, m . \end{aligned} \end{aligned}$$
(6.51)

Here, \( D_k \) is the largest set \( A \subseteq \mathbb {X} \), such that \( \Gamma ^{(k)}_{\alpha _k} \) depends only on the matrix elements of \({\mathbf{H}}^{(A)} \).

Since the upper index sets and the total number of resolvent entries of the form \( G^{(A)}_{ab}\) are both at least as large on the right hand side of the identities (6.47) as on the left hand side, and the added indices on the right hand side are from \( B_{s_k}\), we have

$$\begin{aligned} D_{k-1} \subseteq D_k,\qquad D_k\backslash D_{k-1} \subseteq B_{s_k}, \qquad \text {and}\qquad m_k \;\geqslant \; m_{k-1} . \end{aligned}$$

We claim that

$$\begin{aligned} \begin{aligned} B_{s_k} \nsubseteq D_{\widehat{\ell }} \;&\implies \; B_{s_k} \nsubseteq D_k \;\implies \; m_k \,\geqslant \,m_{k-1}+1 . \end{aligned} \end{aligned}$$
(6.52)

The first implication follows from the monotonicity of the \( D_k\)’s. In order to get the second implication, suppose that \( \Gamma ^{(k-1)}_{\alpha _{k-1}} \) equals (6.44). Since \( D_k \) does not contain \( B_{s_k} \), the monomial \( \Gamma ^{(k)}_{\alpha _k} \) cannot be of the form (6.44) with the upper index sets \( E_t \) and \( F_r \) replaced by \( E_t \cup B_{s_k} \) and \( F_r \cup B_{s_k} \), respectively. The formulas (6.47) hence show that \( \Gamma ^{(k)}_{\alpha _k} \) contains at least one more resolvent entry of the form \( G^{(A)}_{ab} \) than \( \Gamma ^{(k-1)}_{\alpha _{k-1}} \), and thus \( m_k \geqslant m_{k-1}+1 \).

Property 3 follows from (6.52). Indeed, suppose that there is no isolated label s such that \( B_s \subseteq D_{\widehat{\ell }} \). Then applying (6.52) for each \( k =1,\ldots ,\widehat{\ell } \) yields \( m_{\widehat{\ell }} \,\geqslant \, m_0 + \widehat{\ell }. \) Since \( m_0 = p \), using the notations from (6.26) we have

$$\begin{aligned} m_{\widehat{\ell }} \;=\, n + |R^{(1)} | + 2|R^{(2)} | , \end{aligned}$$

by Property (d) of the monomials. This completes the proof of Property 3.

Now only the bound (6.25) on the number of monomials \( M = M_{\widehat{\ell }} \) remains to be proven, which is a simple counting argument. Let \(p_k\) be the largest number of factors among the monomials in the level-k expansion, i.e., writing a monomial \( \Gamma ^{(k)}_\alpha = \Gamma ^{(k)}_{\!\xi ,\sigma ,\alpha } \) in the form (6.26) we have

$$\begin{aligned} p_k \,:= \max _{1 \leqslant \alpha \leqslant M_k}\, \bigl (n(\alpha )+ |R^{(1)}(\alpha ) | + 2|R^{(2)}(\alpha ) | +q(\alpha )\,\bigr ), \end{aligned}$$

where \( M_k = M_k(\xi ,\sigma ) \), \( n(\alpha ) = n(\xi ,\sigma ,\alpha ) \), \( R^{(1)}(\alpha ) = R^{(1)}(\xi ,\sigma ,\alpha ) \), etc. Let us set \( b_*:= 1+ \max _{x,y}\,|B_{N^{\varepsilon _1}}(x,y) | \). Each of the factors in every monomial at the level \( k-1 \) is turned into a sum over monomials by the resolvent identities (6.47). Since each such monomial contains at most five resolvent entries (cf. the last terms in (6.47b)), we obtain the first of the following two bounds:

$$\begin{aligned} \begin{aligned} p_k \leqslant 5p_{k-1} \qquad \text {and}\qquad M_k \leqslant M_{k-1} b_*^{\,p_{k-1}} . \end{aligned} \end{aligned}$$
(6.53)

For the second bound we recall that each of the at most \( p_{k-1} \) factors in every level-\( (k-1)\) monomial is expanded by the resolvent identities (6.47) into a sum of at most \( b_*\) terms. The product of these sums yields a single sum of at most \( b_*^{p_{k-1}} \) terms. From (6.45) and (6.18a) we get \( M_0 = 1 \) and \( p_0 = 2\mu \). Since \( k \leqslant \widehat{\ell } \leqslant 2\mu \), we have \( \max _k p_k \leqslant 2\mu \,25^\mu \). Plugging this into the second bound of (6.53) yields \( M_k \leqslant ((b_*)^{2\mu \,25^\mu })^{2\mu } \). This proves (6.25) since \( b_*\leqslant N^{C\varepsilon _1} \) by (2.11). Finally, we obtain the bound on the number of factors in (6.26) using \( n+q \leqslant p_{\widehat{\ell }} \lesssim _\mu 1 \). \(\square \)
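For the record, the counting just performed can be chained together as follows (a sketch; the final exponent is crude but depends on \(\mu\) only):

```latex
p_k \,\leqslant\, 5^{k} p_0 \,=\, 2\mu\, 5^{k} \,\leqslant\, 2\mu\, 5^{2\mu} \,=\, 2\mu\, 25^{\mu},
\qquad
M_{\widehat{\ell}} \,\leqslant\, M_0 \prod_{k=1}^{\widehat{\ell}} b_*^{\,p_{k-1}}
\,\leqslant\, \bigl( b_*^{\,2\mu\, 25^{\mu}} \bigr)^{2\mu}
\,\leqslant\, N^{C(\mu)\,\varepsilon_1},
```

where the first chain iterates \( p_k \leqslant 5 p_{k-1} \) from (6.53), and the second uses \( \widehat{\ell} \leqslant 2\mu \) and \( b_* \leqslant N^{C\varepsilon_1} \).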

7 Bulk universality and rigidity

In this section we show how to use the strong local law, Theorem 2.7, to obtain the remaining results of Sect. 2.2 on random matrices with correlated entries.

7.1 Rigidity

Proposition 7.1

(Local law away from \([\kappa _-,\kappa _+]\)) Let \({\mathbf{G}}\) be the resolvent of a random matrix \({\mathbf{H}}\) of the form (2.22) that satisfies B1-B4. Let \(\kappa _-, \kappa _+\) be the endpoints of the convex hull of \({{\mathrm{supp\,}}}\rho \) as in (5.1). For all \(\delta ,\varepsilon >0\) and \(\nu \in \mathbb {N}\) there exists a positive constant C such that

$$\begin{aligned} \begin{aligned} \mathbb {P}\biggl [\, \exists \, \zeta \in \mathbb {H}\, \text { s.t. }\delta&\leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \frac{1}{\delta },\, \max _{x,y=1}^N|G_{xy}(\zeta )-m_{xy}(\zeta ) |\geqslant \frac{N^\varepsilon }{\sqrt{N }} \,\biggr ]\\&\leqslant \frac{C}{N^{\nu }}. \end{aligned} \end{aligned}$$
(7.1)

The normalized trace converges with the improved rate

$$\begin{aligned} \begin{aligned} \mathbb {P}\biggl [\, \exists \, \zeta \in \mathbb {H}\, \text { s.t. }\delta&\leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \frac{1}{\delta },\; \left|{\frac{1}{N}}{{\mathrm{Tr\,}}}{\mathbf{G}}(\zeta )-{\frac{1}{N}}{{\mathrm{Tr\,}}}{\mathbf{M}}(\zeta ) \right|\geqslant \frac{N^\varepsilon }{N } \,\biggr ]\\&\leqslant \frac{C}{N^{\nu }}. \end{aligned} \end{aligned}$$
(7.2)

The constant C depends only on the model parameters \(\mathscr {K}\) in addition to \(\delta \), \(\varepsilon \) and \(\nu \).

Remark 7.2

Theorem 2.7 and Proposition 7.1 provide a local law with optimal convergence rate \(\frac{1}{N\, {{\mathrm{Im}}}\, \zeta }\) inside the bulk of the spectrum and convergence rate \(\frac{1}{N}\) away from the convex hull of \({{\mathrm{supp\,}}}\rho \), respectively. In order to prove a local law inside spectral gaps and at the edges of the self-consistent spectrum, additional assumptions on \({\mathbf{H}}\) are needed to exclude a naturally appearing instability that may be caused by exceptional rows and columns of \({\mathbf{H}}\) and the outlying eigenvalues they create. This instability is already present in the case of independent entries as explained in Section 11.2 of [1].

Remark 7.3

The local law in Proposition 7.1 extends beyond the regime of bounded spectral parameters \(\zeta \). The upper bound \(\frac{1}{\delta }\) on the distance of \(\zeta \) from \([\kappa _-,\kappa _+]\) can be dropped in both (7.1) and (7.2). Furthermore, as was done e.g. for Wigner-type matrices in [4], by following the \(|\zeta |\)-dependence along the proof the estimates on the difference \({\mathbf{G}}-{\mathbf{M}}\) in (7.1) and (7.2) can be improved to \(\frac{N^\varepsilon }{(1+|\zeta |^2)\sqrt{N} }\) and \(\frac{N^\varepsilon }{(1+|\zeta |^2){N}}\), respectively. Since this extra complication only extends the local law to a regime far outside the spectrum of \({\mathbf{H}}\) (cf. Lemma 7.4 below) we refrain from carrying out this analysis.

Proof of Proposition 7.1

The proof has three steps. In the first step we will establish a weaker version of Proposition 7.1 where instead of the bound \(\Lambda \prec N^{-1/2}\) we will only show \(\Lambda \prec N^{-1/2} \!+ (N\,{{\mathrm{Im}}}\, \zeta )^{-1}\). Then we will use this version in the second step to prove that there are no eigenvalues outside a small neighborhood of \([\kappa _-,\kappa _+]\). Finally, in the third step we will show (7.1) and (7.2).

Step 1: The proof of this step follows the same strategy as the proof of Theorem 2.7. However, instead of using Lemma 3.4 to estimate the error matrix \({\mathbf{D}}\), we will use Lemma 5.1. In analogy to the proof of (2.28) we begin by showing the entrywise bound

$$\begin{aligned} \begin{aligned} \Lambda (\zeta )\,\prec \, \frac{1}{\sqrt{N}}+\frac{1}{N\,{{\mathrm{Im}}}\,\zeta },\qquad \zeta \in \mathbb {H},\; \delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \frac{1}{\delta },\;{{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon }. \end{aligned} \end{aligned}$$
(7.3)


In fact, following the same line of reasoning that was used to prove (3.11), but using (5.2) instead of (3.6) to estimate \(\Vert {\mathbf{D}} \Vert _{\mathrm{max}}\) we see that

$$\begin{aligned} \Lambda (\zeta )\,\mathbbm {1}(\Lambda (\zeta )\leqslant N^{-\varepsilon /2}) \,\prec \, \frac{1}{\sqrt{N }}+\left( \frac{\Lambda (\zeta )}{N\, {{\mathrm{Im}}}\, \zeta }\right) ^{1/2} \leqslant \, \frac{1}{\sqrt{N }}+ \frac{N^{\varepsilon }}{N\, {{\mathrm{Im}}}\, \zeta } + 4 N^{-\varepsilon }\Lambda (\zeta ), \end{aligned}$$
(7.4)

for any \(\varepsilon >0\). The last term on the right hand side can be absorbed into the left hand side, and since \(\varepsilon \) was arbitrary, (7.4) yields

$$\begin{aligned} \begin{aligned} \Lambda (\zeta )\mathbbm {1}(\Lambda (\zeta )\leqslant N^{-\varepsilon /2})\,\prec \,\frac{1}{\sqrt{N}}+\frac{1}{N\,{{\mathrm{Im}}}\,\zeta }. \end{aligned} \end{aligned}$$
(7.5)
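The step from (7.4) to (7.5) combines the elementary inequality \(2\sqrt{xy} \leqslant x + y\) with absorption of the \(\Lambda\)-term into the left hand side; a sketch (the factor \(\frac{1}{2}\) obtained here is finer than the coarse factor 4 recorded in (7.4)):

```latex
\Bigl( \frac{\Lambda(\zeta)}{N\,{{\mathrm{Im}}}\,\zeta} \Bigr)^{\!1/2}
= \Bigl( N^{-\varepsilon}\Lambda(\zeta) \cdot \frac{N^{\varepsilon}}{N\,{{\mathrm{Im}}}\,\zeta} \Bigr)^{\!1/2}
\leqslant \frac{1}{2}\, N^{-\varepsilon}\Lambda(\zeta)
+ \frac{1}{2}\, \frac{N^{\varepsilon}}{N\,{{\mathrm{Im}}}\,\zeta} .
```

On the event \(\Lambda(\zeta) \leqslant N^{-\varepsilon/2}\) the last term of (7.4) may be replaced by \(4 N^{-\varepsilon}\Lambda(\zeta)\,\mathbbm{1}(\Lambda(\zeta)\leqslant N^{-\varepsilon/2})\) (on the complementary event both sides of (7.5) vanish), and for \(N\) large enough \(1 - 4N^{-\varepsilon} \geqslant \frac{1}{2}\), so this term can be moved to the left hand side, yielding (7.5).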

This inequality establishes a gap in the possible values that \(\Lambda \) can take, provided \(\varepsilon < 1/2\), because \(N^{-\varepsilon } \geqslant N^{-1/2} \!+ (N\,{{\mathrm{Im}}}\, \zeta )^{-1}\). Exactly as we argued for (3.11), we can get rid of the indicator function in (7.5) by using a continuity argument together with a union bound in order to obtain (7.3).

As in the proof of Theorem 2.7 we now use the fluctuation averaging to get an improved convergence rate for the normalized trace,

$$\begin{aligned} \begin{aligned} \left|{\frac{1}{N}}{{\mathrm{Tr\,}}}({\mathbf{G}}(\zeta )-{\mathbf{M}}(\zeta )) \right| \,\prec \, \frac{1}{{N}}+\frac{1}{(N\,{{\mathrm{Im}}}\,\zeta )^2}, \end{aligned} \end{aligned}$$
(7.6)

for all \(\zeta \in \mathbb {H}\) with \( \delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \frac{1}{\delta }\) and \({{\mathrm{Im}}}\, \zeta \geqslant N^{-1+\varepsilon }\). Indeed, (7.6) is an immediate consequence of (7.3) and the fluctuation averaging Proposition 3.5.

Step 2: In this step we use (7.6) to prove the following lemma.

Lemma 7.4

(No eigenvalues away from \([\kappa _-,\kappa _+]\)) For any \(\delta ,\nu >0\) we have

$$\begin{aligned} \begin{aligned} \mathbb {P}\Bigl [{ {{\mathrm{Spec}}}({\mathbf{H}}) \cap (\mathbb {R}{{\setminus }}[\kappa _-\!-\delta ,\kappa _+\!+\delta ]) = \varnothing }\Bigr ] \,\geqslant \, 1- C N^{-\nu }, \end{aligned} \end{aligned}$$
(7.7)

for a positive constant C, depending only on the model parameters \(\mathscr {K}\) in addition to \(\delta \) and \(\nu \).

In order to show (7.7) fix \(\tau \in [-\delta ^{-1}\!,\kappa _-\!-\delta ] \cup [\kappa _+\!+\delta , \delta ^{-1}]\) and \( \eta \in [N^{-1+\varepsilon }\!,1]\), and let \( \{ {\lambda _i} \} _{i=1}^N \) be the eigenvalues of \( {\mathbf{H}}\). Employing (7.6) we get

$$\begin{aligned} \begin{aligned} \frac{\eta }{(\lambda _i-\tau )^2+\eta ^2}&\leqslant {{\mathrm{Im}}}\, {{\mathrm{Tr\,}}}{\mathbf{G}}(\tau +\mathrm {i} \eta ) \,\prec \, {{\mathrm{Im}}}\, {{\mathrm{Tr\,}}}{\mathbf{M}}(\tau +\mathrm {i} \eta ) + 1 + \frac{1}{N\eta ^2}\\&\lesssim _{\delta }\, N \eta + 1 + \frac{1}{N\eta ^{2}\!} . \end{aligned} \end{aligned}$$
(7.8)

Here, we used in the last inequality that \(\frac{1}{N} {{\mathrm{Tr\,}}}{\mathbf{M}}\) is the Stieltjes transform of the self-consistent density of states \(\rho \) with \({{\mathrm{supp\,}}}\rho \subseteq [\kappa _-, \kappa _+]\). Since the left hand side of (7.8) is a Lipschitz continuous function in \(\tau \) with Lipschitz constant bounded by N we can use a union bound to establish (7.8) first on a fine grid of \(\tau \)-values and then uniformly for all \(\tau \) and for the choice \(\eta =N^{-2/3}\),

$$\begin{aligned} \sup _{\tau }\frac{1}{N^{4/3}(\lambda _i-\tau )^2+1}\,\prec \, \frac{1}{N^{1/3}}. \end{aligned}$$
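The last display follows from (7.8) with the choice \(\eta = N^{-2/3}\) by multiplying through by \(\eta\); a sketch of the arithmetic:

```latex
\frac{1}{N^{4/3}(\lambda_i-\tau)^2+1}
\,=\, \frac{\eta^2}{(\lambda_i-\tau)^2+\eta^2}
\,\prec\, \eta\Bigl( N\eta + 1 + \frac{1}{N\eta^2} \Bigr)
\,=\, N\eta^2 + \eta + \frac{1}{N\eta}
\,=\, 2N^{-1/3} + N^{-2/3}
\,\lesssim\, N^{-1/3} .
```

Here \( N\eta^2 = (N\eta)^{-1} = N^{-1/3} \) and \( \eta = N^{-2/3} \) for this particular choice of \(\eta\).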

In particular, the eigenvalue \(\lambda _i\) cannot be at position \(\tau \) with very high probability, i.e.

$$\begin{aligned} \begin{aligned} \mathbb {P}\bigl [{\,\exists \,i \text { s.t. } \delta \leqslant {{\mathrm{dist}}}(\lambda _i, [\kappa _-, \kappa _+])\leqslant \delta ^{-1} }\bigr ] \,\leqslant \, C(\delta ,\nu ) N^{-\nu }. \end{aligned} \end{aligned}$$
(7.9)

Now we exclude that there are eigenvalues far away from the self-consistent spectrum by using a continuity argument. Let \(\widetilde{{\mathbf{W}}}\) be a standard GUE matrix with \(\mathbb {E}|\widetilde{w}_{xy} |^2=\frac{1}{N}\), \((\lambda _i^{(\alpha )})_i\) the eigenvalues of \({\mathbf{H}}^{(\alpha )}:=\alpha {\mathbf{H}}+(1-\alpha )\widetilde{{\mathbf{W}}}\) for \(\alpha \in [0,1]\) and \(\kappa :=\sup _{\alpha }\max \{|\kappa ^{(\alpha )}_+ |, |\kappa ^{(\alpha )}_- |\}\), where \(\kappa _\pm ^{(\alpha )}\) are defined as in (5.1) for the matrix \({\mathbf{H}}^{(\alpha )}\). In particular, \(\kappa _\pm ^{(0)}=\pm \,2\). Since the constant \(C(\delta ,\nu )\) in (7.9) is uniform for all random matrices with the same model parameters \(\mathscr {K}\), we see that

$$\begin{aligned} \sup _{\alpha \in [0,1]}\, \mathbb {P}\bigl [{\,\exists \, i \text { s.t. }|\lambda _i^{(\alpha )} | \in [\kappa +\delta , \delta ^{-1}]\, }\bigr ]\,\leqslant \, C(\delta ,\nu ) N^{-\nu }. \end{aligned}$$

The eigenvalues \(\lambda _i^{(\alpha )}\) are Lipschitz continuous in \(\alpha \). In fact, \(|\partial _\alpha \lambda _i^{(\alpha )} |\leqslant \Vert {\mathbf{H}}-\widetilde{{\mathbf{W}}} \Vert \prec \sqrt{N}\). Here, the simple bound on \(\Vert {\mathbf{H}}-\widetilde{{\mathbf{W}}} \Vert \) follows from \( \mathbb {E}\Vert {\mathbf{H}}-\widetilde{{\mathbf{W}}} \Vert ^{2\mu } \leqslant \mathbb {E}[{{{\mathrm{Tr\,}}}({\mathbf{H}}-\widetilde{{\mathbf{W}}})^2}]^\mu \,\leqslant \,C(\mu )N^\mu \), for some positive constant \(C(\mu )\), depending on \(\mu \), the upper bound \(\underline{\kappa } \!\,_1\) from (2.23) on the moments, the sequence \(\underline{\kappa } \!\,_2\) from (2.24) and P from (2.11). Thus we can use a union bound to establish

$$\begin{aligned} \begin{aligned} \mathbb {P}\bigl [{\,\exists \alpha , i\, \text { s.t. } |\lambda _i^{(\alpha )} |\in [\kappa +2\delta , \delta ^{-1}-\delta ]\, }\bigr ]\,\leqslant \, C(\delta ,\nu ) N^{-\nu }. \end{aligned} \end{aligned}$$
(7.10)

Since for \(\alpha =0\) all eigenvalues lie in \([-\kappa -2\delta , \kappa +2\delta ]\) with very high probability, and by (7.10) no eigenvalue can leave this interval along the interpolation with very high probability, we conclude that

$$\begin{aligned} \mathbb {P}\bigl [{\,\exists \, i \text { s.t. }|\lambda _i |\geqslant \kappa +2\delta \, }\bigr ]\,\leqslant \, C(\delta ,\nu ) N^{-\nu }. \end{aligned}$$

Together with (7.9) this finishes the proof of Lemma 7.4.
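The Lipschitz bound \(|\partial _\alpha \lambda _i^{(\alpha )} |\leqslant \Vert {\mathbf{H}}-\widetilde{{\mathbf{W}}} \Vert \) used above is an instance of Weyl's eigenvalue perturbation inequality. A small numerical sanity check (not part of the proof; the matrix size and GUE normalization \(\mathbb {E}|w_{xy}|^2=1/N\) are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100

def gue(n, rng):
    # Hermitian matrix with entry variance E|w_xy|^2 = 1/n
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / (2 * np.sqrt(n))

H, W = gue(N, rng), gue(N, rng)
lip = np.linalg.norm(H - W, 2)  # operator norm ||H - W||

# Spectra of the interpolation H^(alpha) = alpha*H + (1 - alpha)*W
alphas = np.linspace(0.0, 1.0, 21)
spectra = [np.linalg.eigvalsh(a * H + (1 - a) * W) for a in alphas]

# Weyl: |lambda_i(alpha) - lambda_i(alpha')| <= |alpha - alpha'| * ||H - W||
max_ratio = max(
    np.max(np.abs(s1 - s0)) / (a1 - a0)
    for (a0, s0), (a1, s1) in zip(zip(alphas, spectra), zip(alphas[1:], spectra[1:]))
)
```

Here \(\Vert {\mathbf{H}}-\widetilde{{\mathbf{W}}}\Vert \) is in fact of order one for normalized matrices; the cruder trace-moment bound in the text only gives \(\sqrt{N}\), which is still sufficient for the grid argument.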

Step 3 In this step we use (7.7) to improve the bound on the error matrix \({\mathbf{D}}\) away from \([\kappa _-, \kappa _+]\) and thus show (7.1) and (7.2) by following the same strategy that was used in Step 1 and in the proof of Theorem 2.7.

By Lemma 7.4 there are with very high probability no eigenvalues in \( \mathbb {R}{\setminus } [\kappa _{-}-\delta /2,\kappa _{+}+\delta /2]\, \). Therefore, by eigenvalue interlacing, for any \(B \subseteq \mathbb {X}\) the submatrix \({\mathbf{H}}^B\) of \({\mathbf{H}}\) also has no eigenvalues in this set. In particular, for any \(x \in \mathbb {X}{\setminus } B\) we have

$$\begin{aligned} \begin{aligned} {{\mathrm{Im}}}\, G_{xx}^B(\zeta )\,\sim _\delta \, {{\mathrm{Im}}}\, \zeta , \qquad \zeta \in \mathbb {H},\; \delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \delta ^{-1}, \end{aligned} \end{aligned}$$
(7.11)

in a high probability event. As in the proof of Lemma 3.4 we bound the entries of the error matrix \({\mathbf{D}}\) by estimating the right hand sides of (5.9a)–(5.9e) further. But now we use (7.11), so that the factors of \({{\mathrm{Im}}}\, \zeta \) in the denominators cancel and we end up with

$$\begin{aligned} \begin{aligned} \Vert {\mathbf{D}}(\zeta ) \Vert _{\mathrm{max}}\mathbbm {1}(\Lambda (\zeta )\leqslant N^{-\varepsilon })\,\prec \, N^{-1/2} , \quad \text {whenever}\quad \delta \leqslant {{\mathrm{dist}}}(\zeta ,[\kappa _-,\kappa _+])\leqslant \delta ^{-1}. \end{aligned} \end{aligned}$$
(7.12)

Following the strategy of proof from Step 1 we see that (7.12) implies (7.1) and (7.2). This finishes the proof of Proposition 7.1. \(\square \)
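The submatrix spectral inclusion used in Step 3 rests on Cauchy interlacing for principal minors: removing rows and columns cannot push eigenvalues outside the spectral interval of the full matrix. A quick numerical check with a GOE-type test matrix (sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 80
a = rng.standard_normal((N, N))
H = (a + a.T) / (2 * np.sqrt(N))  # real symmetric test matrix

evals = np.linalg.eigvalsh(H)
lo, hi = evals[0], evals[-1]

# Delete the rows and columns indexed by B; by Cauchy interlacing the
# spectrum of the minor H^B stays inside [lo, hi], hence avoids any set
# that contains no eigenvalue of H
B = rng.choice(N, size=10, replace=False)
keep = np.setdiff1d(np.arange(N), B)
sub_evals = np.linalg.eigvalsh(H[np.ix_(keep, keep)])

inside = bool(np.all((sub_evals >= lo - 1e-12) & (sub_evals <= hi + 1e-12)))
```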

Proof of Corollary 2.9

The proof follows a standard argument that establishes rigidity from the local law, which we present here for the convenience of the reader. The argument uses a Cauchy-integral formula that was also applied in the construction of the Helffer–Sjöstrand functional calculus (cf. [15]) and it already appeared in different variants in [22, 27, 28].

Let \(\tau \in \mathbb {R}\) be such that \(\rho (\tau )\geqslant \delta \) for some \(\delta >0\). We now apply Lemma 5.1 of [4], which shows how to estimate the difference between two measures in terms of the difference of their Stieltjes transforms. With the same notation that was used in the statement of that lemma we make the choices

$$\begin{aligned} \nu _1(\mathrm {d}\sigma )\,:=\, \rho (\sigma )\mathrm {d}\sigma ,\qquad \nu _2(\mathrm {d}\sigma )\,:=\, \frac{1}{N}\sum _i \delta _{\lambda _i}(\mathrm {d}\sigma ), \end{aligned}$$

and \( \tau _1 := \kappa _- -\widetilde{\delta }\), \( \tau _2 := \tau \), \( \eta _1 := N^{-1/2} \), \( \eta _2 := N^{-1+\widetilde{\varepsilon }}\), \( \varepsilon := 1\), for some fixed \(\widetilde{\delta }, \widetilde{\varepsilon }>0\). We estimate the error terms \(J_1\), \(J_2\) and \(J_3\) from Lemma 5.1 of [4] by using (7.2) and (2.29). In this way we find

$$\begin{aligned} \left|\,N {\textstyle \,\int _{[\kappa _--\widetilde{\delta },\tau ]}} \rho (\sigma ) \mathrm {d}\sigma \,-\, |{{\mathrm{Spec}}}({\mathbf{H}})\!\cap \! [\kappa _-\!-\widetilde{\delta },\tau ] |\, \right| \,\prec \, N^{\widetilde{\varepsilon }}. \end{aligned}$$

Since \(\widetilde{\varepsilon }\) was arbitrary and there are no eigenvalues of \({\mathbf{H}}\) to the left of \(\kappa _-\!-\widetilde{\delta }\) (cf. Lemma 7.4), we infer

$$\begin{aligned} \begin{aligned} \left|\,N {\textstyle \,\int _{(-\infty ,\tau ]}} \rho (\sigma ) \mathrm {d}\sigma \,-\, |{{\mathrm{Spec}}}({\mathbf{H}})\!\cap \! (-\infty ,\tau ] |\, \right| \,\prec \, 1, \end{aligned} \end{aligned}$$
(7.13)

for any \(\tau \in \mathbb {R}\) with \(\rho (\tau )\geqslant \delta \). Combining (7.13) with the definition (2.30) of \(i(\tau )\) yields the bound \( \left| \int _\tau ^{\lambda _{i(\tau )}} \!\rho (\sigma )\mathrm {d}\sigma \, \right| \prec N^{-1} \). This in turn implies (2.31) and Corollary 2.9 is proven. \(\square \)
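For a plain GUE matrix, where \(\rho \) is the semicircle density, the counting estimate (7.13) can be observed directly. The following sketch (with an illustrative matrix size) compares the eigenvalue counting function with \(N\int _{(-\infty ,\tau ]}\rho \) at several bulk energies:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 400
a = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (a + a.conj().T) / (2 * np.sqrt(N))  # GUE, E|w_xy|^2 = 1/N
evals = np.linalg.eigvalsh(H)

def sc_cdf(tau):
    # integral of the semicircle density sqrt(4 - s^2)/(2*pi) over (-inf, tau]
    t = np.clip(tau, -2.0, 2.0)
    return 0.5 + (t * np.sqrt(4.0 - t * t) + 4.0 * np.arcsin(t / 2.0)) / (4.0 * np.pi)

# In the bulk (rho(tau) >= delta) the counting function tracks N * sc_cdf(tau)
# up to a small error, as in (7.13)
taus = np.linspace(-1.5, 1.5, 7)
err = max(abs(N * sc_cdf(t) - np.count_nonzero(evals <= t)) for t in taus)
```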

7.2 Bulk universality

Given the local law (Theorem 2.7), the proof of bulk universality (Corollaries 2.10 and 2.11) follows standard arguments based upon the three-step strategy explained in the introduction. We will only sketch the main differences due to the correlations. We start by introducing an Ornstein–Uhlenbeck (OU) process on random matrices \( {\mathbf{H}}_t \) that conserves the first two mixed moments of the matrix entries:

$$\begin{aligned} \begin{aligned} \mathrm {d}{\mathbf{H}}_{t} \,=\, -\frac{1}{2}({\mathbf{H}}_{t}-{\mathbf{A}})\mathrm {d}t + \Sigma ^{1/2}[\mathrm {d}{\mathbf{B}}_{t}], \qquad {\mathbf{H}}_0\,=\, {\mathbf{H}}, \end{aligned} \end{aligned}$$
(7.14)

where the covariance operator \(\Sigma : \mathbb {C}^{N \times N} \rightarrow \mathbb {C}^{N \times N}\) is given as

$$\begin{aligned} \Sigma [{\mathbf{R}}]\,:=\, \mathbb {E}\,\langle {{\mathbf{W}}} , {{\mathbf{R}}}\rangle {\mathbf{W}}, \end{aligned}$$

and \({\mathbf{B}}_t\) is a matrix of standard real (complex) independent Brownian motions with the appropriate symmetry \({\mathbf{B}}_t^*={\mathbf{B}}_t\) for \(\beta =1\) (\(\beta =2\)) whose distribution is invariant under the orthogonal (unitary) symmetry group. We remark that a large Gaussian component, as created by the flow (7.14), was first used in [39] to prove universality for the Hermitian symmetry class.
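The moment-conservation property of the OU flow is already visible in a scalar caricature: for \(\mathrm {d}w_t = -\tfrac{1}{2}w_t\,\mathrm {d}t + \sigma \,\mathrm {d}b_t\) started in its stationary law, the first two moments do not move. A short simulation using the exact solution (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, sigma = 200_000, 0.7, 1.3

# Scalar OU flow dw_t = -(1/2) w_t dt + sigma db_t started in the stationary
# law N(0, sigma^2); exact solution in law:
#   w_t = e^{-t/2} w_0 + sigma * sqrt(1 - e^{-t}) * g,  g ~ N(0,1) independent
w0 = sigma * rng.standard_normal(n)
g = rng.standard_normal(n)
wt = np.exp(-t / 2) * w0 + sigma * np.sqrt(1 - np.exp(-t)) * g

mean_drift = abs(wt.mean() - w0.mean())
var_drift = abs(wt.var() - w0.var())
```

In the matrix flow the role of \(\sigma ^2\) is played by the covariance operator \(\Sigma \), so all mixed second moments are preserved simultaneously.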

Along the flow the matrix \( {\mathbf{H}}_t = {\mathbf{A}} +\frac{1}{\sqrt{N}} {\mathbf{W}}_t \) satisfies the condition B3 on the dependence of the matrix entries uniformly in t. In particular, since \( \Sigma \) determines the operator \( \mathcal {S} \) we see that \( {\mathbf{H}}_t \) is associated to the same MDE as the original matrix \( {\mathbf{H}} \). Also the conditions B4 and B5 can be stated in terms of \( \Sigma \), and are hence both conserved along the flow.

For the following arguments we write \({\mathbf{W}}_t\) as a vector containing all degrees of freedom originating from the real and imaginary parts of the entries of \({\mathbf{W}}_t\). This vector has \(N(N+1)/2\) real entries for \(\beta =1\) and \(N^2\) real entries for \(\beta =2\). We partition \(\mathbb {X}^2=\mathbb {I}_{\leqslant }\dot{\cup }\mathbb {I}_>\) into its upper, \(\mathbb {I}_{\leqslant }:=\{(x,y): x \leqslant y\}\), and lower, \(\mathbb {I}_{>}=\{(x,y): x > y\}\), triangular part. Then we identify

$$\begin{aligned} \frac{1}{\sqrt{N}}{\mathbf{W}}_t\,=\, {\left\{ \begin{array}{ll} (w_t(\alpha ))_{\alpha \in \mathbb {I}_{\leqslant }}&{}\quad \text { if }\; \beta =1, \\ (w_t(\alpha ))_{\alpha \in \mathbb {X}^2}&{}\quad \text { if }\; \beta =2, \end{array}\right. } \end{aligned}$$

where \(w_t((x,y)):=\frac{1}{\sqrt{N}}w_{xy}\) for \(\beta =1\) and

$$\begin{aligned} w_t((x,y))\,:=\, {\left\{ \begin{array}{ll} \frac{1}{\sqrt{N}}{{\mathrm{Re\,}}}w_{xy}&{}\quad \text { for }\; (x,y) \in \mathbb {I}_{\leqslant }, \\ \frac{1}{\sqrt{N}}{{\mathrm{Im}}}\, w_{xy}&{}\quad \text { for }\; (x,y) \in \mathbb {I}_>, \end{array}\right. } \end{aligned}$$

for \(\beta =2\). In terms of the vector \(w_t\) the flow (7.14) takes the form

$$\begin{aligned} \begin{aligned} \mathrm {d}w_t \,=\, -\frac{w_t}{2}\,\mathrm {d}t + \Sigma ^{1/2}\mathrm {d}b_t, \end{aligned} \end{aligned}$$
(7.15)

where \( b_t=(b_t(\alpha ))_\alpha \) is a vector of independent standard Brownian motions, and \( \Sigma ^{1/2} \) is the square-root of the covariance matrix corresponding to \( {\mathbf{H}}={\mathbf{H}}_0\):

$$\begin{aligned} \Sigma (\alpha ,\beta ) \,:=\, \mathbb {E}\,w_0(\alpha )w_0(\beta ). \end{aligned}$$

Recall the notation \( B_\tau (x) = \{{y \in \mathbb {X} : d(x,y) \leqslant \tau }\} \) for any \(x\in \mathbb {X}\), and set

$$\begin{aligned} \begin{aligned} \mathcal {B}_k((x,y)) \,&:=\, (B_{kN^{\varepsilon }}(x)\times B_{kN^{\varepsilon }}(y))\cup (B_{kN^{\varepsilon }}(y)\times B_{kN^{\varepsilon }}(x)), \qquad k=1,2. \end{aligned} \end{aligned}$$

Using (2.11) and B3 we see that for any \( \alpha \)

$$\begin{aligned} \begin{aligned} |\mathcal {B}_2(\alpha ) | \,\leqslant \, N^{C\varepsilon } \qquad \text {and}\qquad |\Sigma (\alpha ,\gamma ) |&\leqslant C(\varepsilon ,\nu )N^{-\nu },\qquad \gamma \notin \mathcal {B}_1(\alpha ) , \end{aligned} \end{aligned}$$
(7.16)

respectively. For any fixed \(\alpha \), we denote by \( w^\alpha \) the vector obtained by removing all the entries of w which may become strongly dependent on the component \( w(\alpha ) \) along the flow (7.15), i.e., we define

$$\begin{aligned} \begin{aligned} w^\alpha (\gamma ) := w(\gamma ) \mathbbm {1}(\gamma \notin \mathcal {B}_2(\alpha )) . \end{aligned} \end{aligned}$$
(7.17)

In the case that \( {\mathbf{H}}\) has independent entries it was proven in [12] that the process (7.15) conserves the local eigenvalue statistics of \( {\mathbf{H}} \) up to times \( t \ll N^{-1/2} \), provided the bulk local law holds uniformly in t along the flow as well. We will now show that this insight extends to dependent random matrices as well. The following result is a straightforward generalization of Lemma A.1 from [12] to matrices with dependent entries. A similar result was independently given in [14].

Lemma 7.5

(Continuity of the OU flow) For every \( \varepsilon > 0 \), \(\nu \in \mathbb {N}\) and smooth function f there is \( C(\varepsilon ,\nu ) < \infty \), such that

$$\begin{aligned} \begin{aligned} |\mathbb {E}\,f(w_t)-\mathbb {E}\,f(w_0) | \,\leqslant \, C(\varepsilon ,\nu )\,\bigl (\,N^{1/2+\varepsilon }\,\Xi \, + N^{-\nu }\widetilde{\Xi }\,\bigr )\,t, \end{aligned} \end{aligned}$$
(7.18)

where

$$\begin{aligned} \Xi \,:= & {} \sup _{s\leqslant t}\max _{\alpha ,\delta ,\gamma } \sup _{\theta \in [0,1]} \mathbb {E}\biggl [\,\Bigl (N^{1/2}|w_s(\alpha ) |+N^{3/2}|w_s(\alpha )w_s(\delta )w_s(\gamma ) |\Bigr )\,\left|\partial ^3_{\alpha \delta \gamma }f\bigl (w_s^{\alpha ,\theta }\bigr ) \right|\, \biggr ]\nonumber \\ \widetilde{\Xi }:= & {} \sup _{\widetilde{w}} \max _{\alpha ,\delta ,\gamma }\, \Bigl ( \left|\partial ^2_{\alpha \delta }f(\widetilde{w}) \right| \,+\, (1+|\widetilde{w}(\alpha ) |)\, \left|\partial ^3_{\alpha \delta \gamma }f(\widetilde{w}) \right|\Bigr ) , \end{aligned}$$
(7.19)

where \( w^{\alpha ,\theta }_s \!:= w^{\alpha }_s + \theta \,(w_s-w^{\alpha }_s) \) for \( \theta \in [0,1] \), and \( \partial ^k_{\alpha _1\cdots \alpha _k} =\frac{\partial }{\partial w(\alpha _1)}\cdots \frac{\partial }{\partial w(\alpha _k)} \).

Proof

We will suppress the t-dependence, i.e. we write \( w = w_t \), etc. Itô's formula yields

$$\begin{aligned} \begin{aligned} \mathrm {d}f (w) \,=\, \sum _\alpha \biggl ( -\frac{w(\alpha )}{2}\partial _\alpha f(w) + \frac{1}{2}\sum _{\delta } \Sigma (\alpha ,\delta )\partial ^2_{\alpha \delta } f(w)\biggr ) \mathrm {d}t\,+\,\mathrm {d}M, \end{aligned} \end{aligned}$$
(7.20)

where \( \mathrm {d}M = \mathrm {d}M_t \) is a martingale term. Taylor expansion around \( w = w^\alpha \) yields

$$\begin{aligned} \partial _\alpha f(w)&= \partial _\alpha f(w^\alpha ) \,+ \sum _{\delta \in \mathcal {B}_2(\alpha )}w(\delta )\partial ^2_{\alpha \delta }f(w^\alpha )\\&\quad +\, \sum _{\delta ,\gamma \in \mathcal {B}_2(\alpha )} w(\delta )w(\gamma )\!\int _0^1\!(1-\theta )\partial ^3_{\alpha \delta \gamma }f(w^{\alpha ,\theta }) \mathrm {d}\theta \\ \partial ^2_{\alpha \delta } f(w)&= \partial ^2_{\alpha \delta } f(w^\alpha ) \,+ \sum _{\gamma \in \mathcal {B}_2(\alpha )} w(\gamma )\! \int _0^1\!(1-\theta )\, \partial ^3_{\alpha \delta \gamma } f(w^{\alpha ,\theta })\mathrm {d}\theta . \end{aligned}$$

By plugging these into (7.20) and taking expectation, we obtain

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\mathbb {E}\,f(w) \,&=\, -\,\frac{1}{2}\sum _\alpha \mathbb {E}\,w(\alpha )\partial _\alpha f(w^\alpha ) \end{aligned}$$
(7.21a)
$$\begin{aligned}&\quad -\,\frac{1}{2} \sum _{\alpha }\sum _{\delta \in \mathcal {B}_2(\alpha )} \mathbb {E}\Bigl [ \,\bigl (w(\alpha )w(\delta )-\Sigma (\alpha ,\delta )\bigr )\partial ^2_{\alpha \delta }f(w^\alpha )\,\Bigr ] \end{aligned}$$
(7.21b)
$$\begin{aligned}&\quad +\,\frac{1}{2}\sum _\alpha \sum _{\delta \notin \mathcal {B}_2(\alpha )} \Sigma (\alpha ,\delta )\;\mathbb {E}\,\partial ^{2}_{\alpha \delta }f(w^\alpha ) \end{aligned}$$
(7.21c)
$$\begin{aligned}&\quad -\, \sum _\alpha \sum _{\delta ,\gamma \in \mathcal {B}_2(\alpha )} \int _0^1\! (1-\theta )\,\mathbb {E}\Bigl [w(\alpha )w(\delta )w(\gamma )\partial ^{3}_{\alpha \delta \gamma }f(w^{\alpha ,\theta })\Bigr ]\,\mathrm {d}\theta \end{aligned}$$
(7.21d)
$$\begin{aligned}&\quad +\,\frac{1}{2}\sum _{\alpha ,\delta } \,\Sigma (\alpha ,\delta ) \sum _{\gamma \in \mathcal {B}_2(\alpha )} \int _0^1\! (1-\theta )\, \mathbb {E}\Bigl [w(\gamma )\partial ^{3}_{\alpha \delta \gamma } f(w^{\alpha ,\theta })\Bigr ]\,\mathrm {d}\theta . \end{aligned}$$
(7.21e)

Now, we estimate the five terms on the right hand side of (7.21) separately.

First, (7.21a) is small since \( w(\alpha ) \) is almost independent of \( w^\alpha \) by B3 and (7.17):

$$\begin{aligned} \mathbb {E}\,w(\alpha )\partial _\alpha f(w^\alpha ) \,=\, \mathbb {E}\,w(\alpha )\;\mathbb {E}\,\partial _\alpha f(w^\alpha ) \,+\,\mathrm{Cov}\bigl (w(\alpha ),\partial _\alpha f(w^\alpha )\bigr ) \,=\, \mathcal {O}_{\varepsilon ,\nu }\bigl (\,\widetilde{\Xi }N^{-\nu }\,\bigr ) . \end{aligned}$$

In the term (7.21b), if \( \delta \in \mathcal {B}_1(\alpha ) \), then \( w(\alpha )w(\delta ) \) is almost independent of \( w^\alpha \):

$$\begin{aligned}&\mathbb {E}\Bigl [ \bigl (w(\alpha )w(\delta )-\Sigma (\alpha ,\delta )\bigr )\partial ^2_{\alpha \delta }f(w^\alpha )\Bigr ] \\&=\,\mathrm{Cov}\bigl (w(\alpha )w(\delta ),\,\partial ^2_{\alpha \delta }f(w^\alpha )\bigr ) \,=\, \mathcal {O}_{\varepsilon ,\nu }\bigl (\,\widetilde{\Xi }N^{-\nu }\bigr ) . \end{aligned}$$

If \( \delta \in \mathcal {B}_2(\alpha )\backslash \mathcal {B}_1(\alpha ) \), then \( w(\alpha ) \) is almost independent of \( (w(\delta ),w^\alpha ) \) and

$$\begin{aligned}&\left|\,\mathbb {E}\Bigl [ \bigl (w(\alpha )w(\delta )-\Sigma (\alpha ,\delta )\bigr )\partial ^2_{\alpha \delta }f(w^\alpha )\Bigr ] \right|\\&\quad \leqslant \, \left|\mathrm{Cov}\bigl (w(\alpha ),\,w(\delta )\partial ^2_{\alpha \delta }f(w^\alpha )\bigr ) \right| +\,|\Sigma (\alpha ,\delta ) |\, \left|\mathbb {E}\,\partial ^2_{\alpha \delta }f(w^\alpha ) \right| \\&\quad \leqslant \, C(\varepsilon ,\nu )\, {\textstyle \sup _{w} \max _{\alpha ,\delta ,\gamma }}\,\bigl (|\partial ^2_{\alpha \delta }f(w) | +|w(\alpha ) ||\partial ^3_{\alpha \delta \gamma }f(w) |\bigr )N^{-\nu } , \end{aligned}$$

where we have used (7.16). The last term containing derivatives is bounded by \( \widetilde{\Xi } \).

The term (7.21c) is negligible by \( |\Sigma (\alpha ,\delta ) | \lesssim _{\varepsilon ,\nu } N^{-\nu } \) and \( |\mathbb {E}\,\partial ^2_{\alpha \delta }f(w^\alpha ) | \leqslant \widetilde{\Xi } \). For (7.21d) we use (7.16) and the definition of \(\Xi \) to obtain

$$\begin{aligned} \sum _\alpha \sum _{\delta ,\gamma \in \mathcal {B}_2(\alpha )} \int _0^1\! (1-\theta )\,\left|\,\mathbb {E}\Bigl [w(\alpha )w(\delta )w(\gamma )\partial ^{3}_{\alpha \delta \gamma }f(w^{\alpha ,\theta })\Bigr ]\, \right|\mathrm {d}\theta \;\leqslant \; N^{1/2+C\varepsilon }\,\Xi . \end{aligned}$$

The last term (7.21e) is estimated similarly

$$\begin{aligned} \sum _{\gamma \in \mathcal {B}_2(\alpha )} \int _0^1\! (1-\theta )\, \left|\,\mathbb {E}\Bigl [w(\gamma )\partial ^{3}_{\alpha \delta \gamma } f(w^{\alpha ,\theta })\Bigr ]\, \right|\mathrm {d}\theta \;\leqslant \; N^{-1/2+C\varepsilon }\,\Xi , \end{aligned}$$

and the double sum over \( \alpha ,\delta \) produces a factor of size CN due to the exponential decay of \(\Sigma \). Combining the estimates for the five terms on the right hand side of (7.21) we obtain (7.18).\(\square \)
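The expansions in the proof use Taylor's theorem with integral remainder, \(g(x) = g(a) + g'(a)(x-a) + (x-a)^2\int _0^1 (1-\theta )\,g''(a+\theta (x-a))\,\mathrm {d}\theta \). A scalar numerical check of this identity (test function and evaluation points chosen arbitrarily):

```python
import numpy as np

# Verify g(x) = g(a) + g'(a)(x-a) + (x-a)^2 * int_0^1 (1-theta) g''(a+theta(x-a)) dtheta
# for g = sin, approximating the theta-integral by a midpoint rule
g, dg = np.sin, np.cos

def d2g(x):
    return -np.sin(x)

a, x = 0.3, 1.1
m = 100_000
theta = (np.arange(m) + 0.5) / m  # midpoint grid on [0, 1]
remainder = (x - a) ** 2 * np.mean((1 - theta) * d2g(a + theta * (x - a)))
err = abs(g(x) - (g(a) + dg(a) * (x - a) + remainder))
```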

Proof of Corollaries 2.10 and 2.11

We will only sketch the argument here as the procedure is standard. First we show that the matrix \( {\mathbf{H}}_t \) defined through (7.14) satisfies bulk universality if \( t \geqslant t_N := N^{-1+\xi _1} \), for any \( \xi _1 > 0\). For simplicity, we will focus on \(t\leqslant N^{-1/2}\) only. Indeed, from the fullness assumption B5 it follows that \( {\mathbf{H}}_t \) is of the form

$$\begin{aligned} \begin{aligned} {\mathbf{H}}_t = \widetilde{{\mathbf{H}}}_t + c(t)t^{1/2}{\mathbf{U}} , \end{aligned} \end{aligned}$$
(7.22)

where \( c(t) \sim 1\) and \( {\mathbf{U}} \) is a GUE/GOE-matrix independent of \( \widetilde{{\mathbf{H}}}_t \). For \( t \leqslant N^{-1/2} \) the matrix \( \widetilde{{\mathbf{H}}}_t \) has essentially the same correlation structure as \({\mathbf{H}}\), controlled by essentially the same model parameters. In particular the corresponding \( \widetilde{\mathcal {S}}_t \) operator is almost the same as \(\mathcal {S}\). Let \( \widetilde{{\mathbf{M}}}_t \) solve the corresponding MDE with \( \mathcal {S} \) replaced by \( \widetilde{\mathcal {S}}_t\) and let \( \widetilde{\rho }_t \) denote the function related to \( \widetilde{{\mathbf{M}}}_t \) similarly as \( \rho \) is related to \( {\mathbf{M}} \) (see Definition 2.3). Using the general stability for MDEs, Theorem 2.6, with

$$\begin{aligned} \varvec{\mathfrak {G}}({\mathbf{0}}) \,:=\, {\mathbf{M}},\qquad {\mathbf{D}} \,:=\, (\widetilde{\mathcal {S}}_t-\mathcal {S})[\widetilde{{\mathbf{M}}}_t]\,\widetilde{{\mathbf{M}}}_t,\qquad \varvec{\mathfrak {G}}({\mathbf{D}}) \,=\, \widetilde{{\mathbf{M}}}_t, \end{aligned}$$

(cf. (2.17)) it is easy to check that \( \widetilde{{\mathbf{M}}}_t \) is close to \({\mathbf{M}}\), in particular \( \widetilde{\rho }_t(\omega ) \geqslant \delta /2 \) when \( \rho (\omega ) \geqslant \delta \). Moreover, the local law applies to \( \widetilde{{\mathbf{H}}}_t \) as well, i.e. the resolvent \( \widetilde{{\mathbf{G}}}_t(\zeta ) \) of \( \widetilde{{\mathbf{H}}}_t\) approaches \( \widetilde{{\mathbf{M}}}_t(\zeta ) \) for spectral parameters \(\zeta \) with \(\rho ({{\mathrm{Re\,}}}\zeta )\geqslant \delta \). The bulk spectrum of \( \widetilde{{\mathbf{H}}}_t \) is therefore the same as that of \( {\mathbf{H}} \) in the limit. Combining these facts with the decomposition (7.22) we can apply Theorem 2.2 from the recent work [41] to conclude bulk universality for \( {\mathbf{H}}_t \), with \( t=t_N = N^{-1+\xi _1} \) in the sense of correlation functions as in Corollary 2.10. In order to prove the gap universality, Corollary 2.11, we use Theorems 2.4 and 2.5 from [42] or Theorem 2.1 and Remark 2.2 from [26].
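The size of the Gaussian component in (7.22) can be read off from the scalar solution of the OU flow: the freshly generated Gaussian part has variance \(1-e^{-t} = c(t)^2\,t\) with \(c(t)\sim 1\) for small t. A one-line numerical confirmation of the elementary bounds \(1-t/2 \leqslant c(t)^2 \leqslant 1\):

```python
import numpy as np

# c(t)^2 = (1 - e^{-t})/t is the squared prefactor of the sqrt(t)-sized
# Gaussian component added by the OU flow up to time t; expm1 avoids
# cancellation for small t
ts = np.array([1e-4, 1e-3, 1e-2, 1e-1])
c_sq = -np.expm1(-ts) / ts
ok = bool(np.all((c_sq <= 1.0) & (c_sq >= 1.0 - ts / 2)))
```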

Second, we use Lemma 7.5 to show that \({\mathbf{H}}\) and \({\mathbf{H}}_t\) have the same local correlation functions in the bulk. Suppose \( \rho (\omega ) \geqslant \delta \) for some \( \omega \in \mathbb {R}\). We show that the difference

$$\begin{aligned} (\tau _1,\ldots ,\tau _k) \mapsto (\rho _{k;t_N}-\rho _k)\bigl (\omega +{\textstyle \frac{\,\tau _{1}\!}{N},\ldots ,\omega +\frac{\,\tau _{k}\!}{N}}\bigr ) \end{aligned}$$

of the local k-point correlation functions \( \rho _k \) and \( \rho _{k;t_N} \) of \( {\mathbf{H}} \) and \( {\mathbf{H}}_{t_N} \), respectively, converges weakly to zero as \( N \rightarrow \infty \). This convergence follows from standard arguments provided that

$$\begin{aligned} |\mathbb {E}\,F({\mathbf{H}}_{t})-\mathbb {E}\,F({\mathbf{H}}) | \,\rightarrow \, 0 , \end{aligned}$$

where \( F = F_N \) is a function of \( {\mathbf{H}} \) expressed as a smooth function \( \Phi \) of the following observables

$$\begin{aligned} \qquad \frac{1}{N^p}{{\mathrm{Tr\,}}}\,\prod _{j=1}^p({\mathbf{H}}-\zeta _j^\pm )^{-1} ,\qquad \zeta ^\pm _j := \omega +\frac{\tau _{i_j}}{N}\pm \mathrm {i}N^{-1-\xi _2},\quad j =1,\ldots ,p, \end{aligned}$$

with \( p \leqslant k \) and \( \xi _2 \in (0,1) \) sufficiently small. Here the derivatives of \( \Phi \) are allowed to grow at most as a negligible power of N (for details see the proof of Theorem 6.4 in [27]). In particular, basic resolvent formulas yield

$$\begin{aligned} \text {RHS of }(7.18) \,\leqslant \, CN^{\varepsilon '}N^{C'\xi _2}\mathbb {E}\Bigl [(1+\Lambda _t)^{C''}\Bigr ]\,N^{1/2+\varepsilon }t , \end{aligned}$$

where \( \Lambda _t \) is defined like \( \Lambda \) in (3.9) but for the entries of \( {\mathbf{G}}_t(\zeta ) := ({\mathbf{H}}_t-\zeta )^{-1} \) with \( {{\mathrm{Im}}}\,\zeta \geqslant N^{-1+\xi _2} \). In particular, we have used \( |G_{xy}(t) | \leqslant |m_{xy}(t) | + \Lambda _t \lesssim 1+\Lambda _t \) here. The quantity \( \Xi \) from (7.19) is easily bounded by \( N^{\varepsilon '+C\xi _2} \), where the arbitrarily small constant \( \varepsilon ' > 0 \) originates from stochastic domination estimates for \( \Lambda _t \) and the \( |w_s(\alpha ) |\)'s. The quantity \( \widetilde{\Xi } \) from (7.19), on the other hand, is trivially bounded by \( N^{C} \), since the resolvents satisfy trivial bounds in the regime \( |{{\mathrm{Im}}}\,\zeta | \geqslant N^{-2} \), and the weight \( |w(\alpha ) | \) multiplying the third derivatives of f is canceled for large values of \( |w(\alpha ) | \) by the inverse in the definition \( {\mathbf{G}} = ({\mathbf{A}}+N^{-1/2}{\mathbf{W}}-\zeta {\mathbf{1}})^{-1} \). Since the local law holds for \( {\mathbf{H}}_t \), uniformly in \( t \in [0,t_N]\), we see that \( \Lambda _t \leqslant N^{\varepsilon '} (N\eta )^{-1/2} \leqslant N^{\varepsilon '-\xi _2/2} \) with very high probability and hence

$$\begin{aligned} \begin{aligned} |\mathbb {E}\,F({\mathbf{H}}_{t_N})-\mathbb {E}\,F({\mathbf{H}}) | \leqslant C(\varepsilon )N^{1/2+\varepsilon }N^{-1+\xi _1}N^{C(\varepsilon '+\xi _2)}. \end{aligned} \end{aligned}$$
(7.23)

Choosing the exponents \( \varepsilon ,\varepsilon ',\xi _1,\xi _2 \) sufficiently small we see that the right hand side goes to zero as \(N \rightarrow \infty \). This completes the proof of Corollary 2.10. Finally, the comparison estimate (7.23) and the rigidity bound (2.31) allow us to compare the gap distributions of \({\mathbf{H}}_{t_N}\) and \({\mathbf{H}}\), see Theorem 1.10 of [40]. This proves Corollary 2.11. \(\square \)
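The observables \(N^{-p}{{\mathrm{Tr\,}}}\prod _j ({\mathbf{H}}-\zeta _j^\pm )^{-1}\) compared above are normalized resolvent traces; for a plain GUE matrix the \(p=1\) observable is close to the semicircle Stieltjes transform \(m(\zeta ) = \tfrac{1}{2}(-\zeta +\sqrt{\zeta ^2-4})\). A sketch at a bulk energy with a mesoscopic \(\eta \) (here \(\eta = N^{-0.6}\), so that \(N\eta \gg 1\); the proof works down to the much smaller scale \(N^{-1-\xi _2}\)):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 300
a = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (a + a.conj().T) / (2 * np.sqrt(N))  # GUE, E|w_xy|^2 = 1/N

omega, eta = 0.5, N ** -0.6  # bulk energy, mesoscopic scale
zeta = omega + 1j * eta

# p = 1 observable: (1/N) Tr (H - zeta)^{-1}, computed from the spectrum
evals = np.linalg.eigvalsh(H)
tr_resolvent = np.mean(1.0 / (evals - zeta))

# Semicircle Stieltjes transform; for omega > 0 the principal branch of the
# complex square root selects Im m > 0
m = 0.5 * (-zeta + np.sqrt(zeta ** 2 - 4))
```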