1 Introduction and main results

1.1 Motivation

Many objects which arise in Diophantine Geometry exhibit random-like behavior. For instance, the classical Khinchin theorem in Diophantine approximation can be interpreted as the Borel–Cantelli Property for quasi-independent events, while Schmidt’s quantitative generalization of Khinchin’s Theorem is analogous to the Law of Large Numbers. One might ask whether much deeper probabilistic phenomena also take place.

In this paper, we develop a general framework which allows us to capture certain independence properties which govern the asymptotic behavior of arithmetic counting functions. We expect that the new methods will have a wide range of applications in Diophantine Geometry; here we apply the techniques to study the distribution of a class of counting functions which we now describe.

A basic problem in Diophantine approximation is to find “good” rational approximants of vectors \(\overline{u} = (u_1,\ldots ,u_m) \in \mathbb {R}^m\). More precisely, given positive numbers \(w_1,\ldots ,w_m\), which we shall assume sum to one, and positive constants \(\vartheta _1,\ldots ,\vartheta _m\), we consider the system of inequalities

$$\begin{aligned} \left| u_j - \frac{p_j}{q}\right| \leqslant \frac{\vartheta _j}{q^{1+w_j}}, \quad \hbox {for}\ j = 1,\ldots ,m, \end{aligned}$$
(1.1)

with \((\overline{p},q)\in \mathbb {Z}^m\times \mathbb {N}\). It is well-known that for Lebesgue-almost all \(\overline{u} \in \mathbb {R}^m\), the system (1.1) has infinitely many solutions \((\overline{p},q)\in \mathbb {Z}^m\times \mathbb {N}\), so it is natural to try to count solutions in bounded regions, which leads us to the counting function

$$\begin{aligned} \Delta _T(\overline{u}):=|\{(\overline{p},q)\in \mathbb {Z}^m\times \mathbb {N}:\, 1\leqslant q< T \ \text {and}\ (1.1)\hbox { holds} \}|. \end{aligned}$$
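For readers who wish to experiment, the count can be evaluated directly: (1.1) is equivalent to \(|qu_j-p_j|\leqslant \vartheta _j q^{-w_j}\), so for each fixed q the admissible \(p_j\) fill an integer interval. The following script is our own illustration (all names are ours, not from the paper) and computes \(\Delta _T(\overline{u})\) this way:

```python
import math

def count_approximants(u, theta, w, T):
    """Delta_T(u): number of (p, q) with 1 <= q < T such that
    |u_j - p_j/q| <= theta_j / q^(1+w_j) for all j, or equivalently
    |q*u_j - p_j| <= theta_j * q^(-w_j)."""
    total = 0
    for q in range(1, T):
        combos = 1
        for u_j, th_j, w_j in zip(u, theta, w):
            r = th_j * q ** (-w_j)        # half-length of the interval for p_j
            lo = math.ceil(q * u_j - r)   # smallest admissible p_j
            hi = math.floor(q * u_j + r)  # largest admissible p_j
            combos *= max(0, hi - lo + 1)
            if combos == 0:
                break
        total += combos
    return total
```

For generic \(\overline{u}\) the output grows like \(C_m\log T\) with \(C_m=2^m\vartheta _1\cdots \vartheta _m\), in line with Schmidt's theorem quoted below.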

Schmidt [15] proved that for Lebesgue-almost all \(\overline{u}\in [0,1]^m\),

$$\begin{aligned} \Delta _T(\overline{u})=C_m\, \log T+O_{\overline{u},\varepsilon }((\log T)^{1/2+\varepsilon }),\quad \hbox {for all }\varepsilon >0, \end{aligned}$$
(1.2)

where \(C_m:=2^m\vartheta _1\cdots \vartheta _m\). One may view this as an analogue of the Law of Large Numbers; the heuristic for this analogy runs along the following lines. First, note that

$$\begin{aligned} \Delta _T(\overline{u})\approx \sum _{s=0}^{\lfloor \log T\rfloor } \Delta ^{(s)}(\overline{u}), \end{aligned}$$

where

$$\begin{aligned} \Delta ^{(s)}(\overline{u}):=|\{(\overline{p},q)\in \mathbb {Z}^m\times \mathbb {N}:\, e^s\leqslant q< e^{s+1} \ \text {and}\ (1.1) \hbox { holds} \}|. \end{aligned}$$

If one could prove that the functions \(\Delta ^{(s_1)}(\cdot )\) and \(\Delta ^{(s_2)}(\cdot )\) were “quasi-independent” random variables on \([0,1]^m\), at least when \(s_1\), \(s_2\), and \(|s_1-s_2|\) are sufficiently large, then (1.2) would follow by some version of the Law of Large Numbers. Moreover, the same heuristic further suggests that, in addition to the Law of Large Numbers, a central limit theorem and perhaps other probabilistic limit laws also hold for \(\Delta _T(\cdot )\).

In this paper, we put the above heuristic on firm ground. We do so by representing \(\Delta ^{(s)}(\cdot )\) as a function on the space of unimodular lattices. It turns out that the “quasi-independence” of the family \((\Delta ^{(s)})\) that we are trying to capture can be translated into the dynamical language of higher-order mixing for a subgroup of linear transformations acting on the space of lattices.

1.2 Main results

We are not the first to explore central limit theorems for Diophantine approximants. The one-dimensional case \((m=1)\) has been thoroughly investigated by LeVeque [11, 12], Philipp [13], and Fuchs [6], culminating in the following result of Fuchs [6]: there exists an explicit \(\sigma > 0\) such that the counting function

$$\begin{aligned} \Delta _T(u):=\left| \left\{ (p,q)\in \mathbb {Z}\times \mathbb {N}:\, 1\leqslant q<T,\quad \left| u-p/q\right| < \frac{\vartheta }{q^2 \log (1+q)} \right\} \right| \end{aligned}$$

satisfies

$$\begin{aligned} \left| \left\{ u\in [0,1]:\, \frac{\Delta _T(u)-2\vartheta \,\log \log T}{(\log \log T\cdot \log \log \log T)^{1/2}}<\xi \right\} \right| \longrightarrow \hbox {Norm}_\sigma (\xi ) \end{aligned}$$
(1.3)

as \(T\rightarrow \infty \), where

$$\begin{aligned} \hbox {Norm}_\sigma (\xi ):=(2\pi \sigma )^{-1/2} \int _{-\infty }^\xi e^{-s^2/(2\sigma )}\,ds \end{aligned}$$

denotes the normal distribution with variance \(\sigma \).

Central limit theorems in higher dimensions when \(w_1=\cdots =w_m=1/m\) have recently been studied by Dolgopyat et al. [3]. In this paper, using very different techniques, we establish the following CLT for general exponents \(w_1,\ldots ,w_m\).

Theorem 1.1

Let \(m\geqslant 2\). Then for every \(\xi \in \mathbb {R},\)

$$\begin{aligned} \left| \left\{ \overline{u}\in [0,1]^m:\, \frac{\Delta _T(\overline{u})-C_m\,\log T}{(\log T)^{1/2}}<\xi \right\} \right| \longrightarrow {\mathrm{Norm}}_{\sigma _m}(\xi ) \end{aligned}$$
(1.4)

as \(T\rightarrow \infty ,\) where

$$\begin{aligned} \sigma _{m}:=2 C_m(2\zeta (m)\zeta (m+1)^{-1}-1), \end{aligned}$$

and \(\zeta \) denotes Riemann’s \(\zeta \)-function.
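As a numerical aside (ours, not the paper's): for \(m=2\) and \(\vartheta _1=\vartheta _2=1\) we get \(C_2=4\) and \(\sigma _2=8(2\zeta (2)\zeta (3)^{-1}-1)\approx 13.89\). A truncated Dirichlet series suffices to evaluate the variance formula:

```python
def zeta(s, terms=200_000):
    """Truncated Dirichlet series for the Riemann zeta function; adequate for s >= 2."""
    return sum(k ** (-s) for k in range(1, terms + 1))

def sigma(m, C_m):
    """Variance in Theorem 1.1: sigma_m = 2 * C_m * (2 * zeta(m)/zeta(m+1) - 1)."""
    return 2 * C_m * (2 * zeta(m) / zeta(m + 1) - 1)
```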

Our proof of Theorem 1.1, as well as the proof in [3], proceeds by interpreting \(\Delta _T(\cdot )\) as a function on a certain subset \(\mathcal {Y}\) of the space of all unimodular lattices in \(\mathbb {R}^{m+1}\), and then studying how the sequence \(a^s\mathcal {Y}\), where a is a fixed linear transformation of \(\mathbb {R}^{m+1}\), distributes inside this space. However, the arguments in the two papers follow very different routes. The proof in [3] contains a novel refinement of the martingale method (this approach was initiated in this setting by Le Borgne [10]). Here, one crucially uses the fact that when \(w_1=\cdots =w_m=1/m\), the set \(\mathcal {Y}\) is an unstable manifold for the action of a on the space of lattices. For general weights, \(\mathcal {Y}\) has strictly smaller dimension than the unstable leaves, and it seems challenging to apply martingale approximation techniques. Instead, our method involves a quantitative analysis of higher-order correlations for functions on the space of lattices. We establish an asymptotic formula for correlations of arbitrary orders and use this formula to compute limits of all the moments of \(\Delta _T(\cdot )\) directly. One of the key innovations of our approach is an efficient way of estimating sums of cumulants (alternating sums of moments) developed in our recent work [2].

We also investigate the more general problem of Diophantine approximation for systems of linear forms. Systems of m linear forms in n real variables are parametrized by the space \(\hbox {M}_{m,n}(\mathbb {R})\) of real \(m\times n\) matrices. Given \(u\in \hbox {M}_{m,n}(\mathbb {R})\), we consider the family \((L_u^{(i)})\) of linear forms defined by

$$\begin{aligned} L_u^{(i)}(x_1,\ldots ,x_n)=\sum _{j=1}^n u_{ij} x_j,\quad i=1,\ldots ,m. \end{aligned}$$

Let \(\Vert \cdot \Vert \) be a norm on \(\mathbb {R}^n\). Fix \(\vartheta _1,\ldots ,\vartheta _m > 0\) and \(w_1,\ldots ,w_m>0\) which satisfy

$$\begin{aligned} w_1+\cdots +w_m=n, \end{aligned}$$

and consider the system of Diophantine inequalities

$$\begin{aligned} \left| p_i+L_u^{(i)}(q_1,\ldots ,q_n)\right| < \vartheta _i\, \Vert \overline{q}\Vert ^{-w_i},\quad i=1,\ldots ,m, \end{aligned}$$
(1.5)

with \((\overline{p},\overline{q})=(p_1,\ldots ,p_m,q_1,\ldots ,q_n)\in \mathbb {Z}^m\times (\mathbb {Z}^n\backslash \{0\})\). The number of solutions of this system with the norm of the “denominator” \(\overline{q}\) bounded by T is given by

$$\begin{aligned} \Delta _T({u}):=\left| \left\{ (\overline{p},\overline{q})\in \mathbb {Z}^m\times \mathbb {Z}^n:\, 0< \Vert \overline{q}\Vert < T \ \text {and}\ (1.5)\hbox { holds} \right\} \right| . \end{aligned}$$
(1.6)

Our main result in this paper is the following generalization of Theorem 1.1.

Theorem 1.2

If \(m\geqslant 2,\) then for every \(\xi \in \mathbb {R},\)

$$\begin{aligned} \left| \left\{ {u}\in \hbox { M}_{m,n}([0,1]):\, \frac{\Delta _T({u})-C_{m,n}\,\log T}{(\log T)^{1/2}}<\xi \right\} \right| \longrightarrow {\mathrm{Norm}}_{\sigma _{m,n}}(\xi ) \end{aligned}$$
(1.7)

as \(T\rightarrow \infty ,\) where

$$\begin{aligned} C_{m,n} :=C_m\omega _n\quad \hbox {with }\omega _n:=\int _{S^{n-1}} \Vert \overline{z}\Vert ^{-n}\,d\overline{z} \end{aligned}$$

and

$$\begin{aligned} \sigma _{m,n}:=2C_{m,n}(2\zeta (m+n-1)\zeta (m+n)^{-1}-1). \end{aligned}$$

The special case \(w_1 =\cdots =w_m=n/m\) was proved earlier in [3].

1.3 An outline of the proof of Theorem 1.2

We begin by observing that \(\Delta _T(\cdot )\) can be interpreted as a function on the space of lattices in \(\mathbb {R}^{m+n}\). Given \(u\in \hbox {M}_{m,n}([0,1])\), we define the unimodular lattice \(\Lambda _u\) in \(\mathbb {R}^{m+n}\) by

$$\begin{aligned} \Lambda _{u}:=\left\{ \left( p_1+\sum _{j=1}^n u_{1j} q_j,\ldots , p_m+\sum _{j=1}^n u_{mj} q_j, \overline{q}\right) :\, (\overline{p},\overline{q})\in \mathbb {Z}^m\times \mathbb {Z}^n\right\} ,\qquad \end{aligned}$$
(1.8)

and we see that

$$\begin{aligned} \Delta _T(u)=|\Lambda _u\cap \Omega _T|+O(1), \quad \hbox { for}\ T > 0, \end{aligned}$$

where \(\Omega _T\) denotes the domain

$$\begin{aligned} \Omega _T:=\left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, 1\leqslant \Vert \overline{y}\Vert<T,\, |x_i|<\vartheta _i\,\Vert \overline{y}\Vert ^{-w_i}, \, i=1,\ldots , m\right\} .\qquad \end{aligned}$$
(1.9)
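As a toy sanity check of this correspondence (our own script with hypothetical names, taking \(m=n=1\), \(w_1=n=1\), \(\vartheta =1\)), one can verify that counting solutions of (1.5) directly agrees with counting points of \(\Lambda _u\) in \(\Omega _T\):

```python
import math

def delta_direct(u, theta, T):
    """Solutions of |p + u*q| < theta/|q| with 0 < |q| < T (case m = n = 1, w_1 = 1)."""
    count = 0
    for q in range(-T + 1, T):
        if q == 0:
            continue
        r = theta / abs(q)
        # p ranges over the open interval (-u*q - r, -u*q + r); for irrational u
        # the endpoints are never integers, so floor/ceil give exact counts
        lo = math.floor(-u * q - r) + 1
        hi = math.ceil(-u * q + r) - 1
        count += max(0, hi - lo + 1)
    return count

def delta_lattice(u, theta, T):
    """|Lambda_u ∩ Omega_T| with Lambda_u = {(p + u*q, q)} as in (1.8)-(1.9).
    Assumes theta <= 2, so any solution p lies within 2 of -u*q."""
    count = 0
    for q in range(-T + 1, T):   # the condition 1 <= |y| < T becomes 0 < |q| < T
        if q == 0:
            continue
        for p in range(math.floor(-u * q) - 2, math.ceil(-u * q) + 3):
            x = p + u * q
            if abs(x) < theta / abs(q):   # (x, q) lies in Omega_T
                count += 1
    return count
```

In this toy case the two counts agree exactly; the O(1) discrepancy in the formula above accounts for boundary effects in general.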

The space \({\mathcal {X}}\) of unimodular lattices in \(\mathbb {R}^{m+n}\) is naturally a homogeneous space of the group \(\hbox {SL}_{m+n}(\mathbb {R})\) equipped with the invariant probability measure \(\mu _{\mathcal {X}}\). The set

$$\begin{aligned} \mathcal {Y}:=\{\Lambda _u:\, u\in \hbox {M}_{m,n}([0,1])\} \end{aligned}$$

is an mn-dimensional torus embedded in \({\mathcal {X}}\), and we equip \(\mathcal {Y}\) with the Haar probability measure \(\mu _\mathcal {Y}\), interpreted as a Borel measure on \({\mathcal {X}}\).

We further observe (see Sect. 6 for more details) that each domain \(\Omega _T\) can be tessellated using a fixed diagonal matrix a in \(\hbox {SL}_{m+n}(\mathbb {R})\), so that for a suitable function \({\hat{\chi }}:{\mathcal {X}}\rightarrow \mathbb {R}\), we have

$$\begin{aligned} |\Lambda \cap \Omega _T|\approx \sum _{s=0}^{N-1} {\hat{\chi }}(a^s\Lambda )\quad \hbox {for }\Lambda \in {\mathcal {X}}. \end{aligned}$$

Hence we are left with analyzing the distribution of values for the sums \(\sum _{s=0}^N {\hat{\chi }}(a^sy)\) with \(y\in \mathcal {Y}\). This will allow us to apply techniques developed in our previous work [2], as well as in [1] (joint with M. Einsiedler). Intuitively, our arguments will be guided by the hope that the observables \({\hat{\chi }}\circ a^s\) are “quasi-independent” with respect to \(\mu _{\mathcal {Y}}\). Due to the discontinuity and unboundedness of the function \({\hat{\chi }}\) on \({\mathcal {X}}\), it gets quite technical to formulate this quasi-independence directly. Instead, we shall argue in steps.

We begin in Sect. 2 by establishing quasi-independence for observables of the form \(\phi \circ a^s\), where \(\phi \) is a smooth and compactly supported function on \({\mathcal {X}}\). This amounts to an asymptotic formula (Corollary 2.4) for the higher-order correlations

$$\begin{aligned} \int _{\mathcal {Y}} \phi _1(a^{s_1}y)\cdots \phi _r(a^{s_r}y)\, d\mu _\mathcal {Y}(y) \quad \hbox {with }\phi _1,\ldots ,\phi _r\in C_c^\infty ({\mathcal {X}}). \end{aligned}$$
(1.10)

It will be crucial for our arguments later that the error term in this formula is explicit in terms of the exponents \(s_1,\ldots ,s_r\) and in (certain norms of) the functions \(\phi _1,\ldots ,\phi _r\). In Sect. 3, we use these estimates to prove the central limit theorems for sums of the form

$$\begin{aligned} F_N(y) :=\sum _{s=0}^{N-1} \left( \phi (a^sy)-\mu _\mathcal {Y}(\phi )\right) \quad \hbox {with }\phi \in C_c^\infty ({\mathcal {X}})\hbox { and }y\in \mathcal {Y}. \end{aligned}$$

To do this, we use an adaptation of the classical Cumulant Method (see Proposition 3.4), which provides bounds on cumulants (alternating sums of moments) given estimates on expressions of the form (1.10), at least in certain ranges of the parameters \((s_1,\ldots ,s_r)\). Here we shall exploit the decomposition (3.7) into “separated”/“clustered” tuples. We stress that the cumulant \({{\,\mathrm{Cum}\,}}^{(r)}(F_N)\) of order r can be expressed as a sum of \(O(N^r)\) terms, normalized by \(N^{r/2}\), so that in order to prove that it vanishes asymptotically, we require more than just square-root cancellation; however, the error term in the asymptotic formula for (1.10) is rather weak. Nonetheless, by using intricate combinatorial cancellations of cumulants, we can establish the required bounds.
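To illustrate the mechanics in the simplest possible setting (a one-dimensional sketch of ours; the paper works with the joint cumulants \({{\,\mathrm{Cum}\,}}^{(r)}(F_N)\)): cumulants are obtained from moments by a standard recursion, and a limit law is Gaussian precisely when all cumulants of order \(\geqslant 3\) vanish, which is why bounding cumulants proves a CLT:

```python
from math import comb

def cumulants_from_moments(m):
    """m[j] = E[X^j] for j = 0..r (with m[0] = 1); returns [kappa_1, ..., kappa_r]
    via the recursion kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}."""
    kappa = [0] * len(m)
    for n in range(1, len(m)):
        kappa[n] = m[n] - sum(comb(n - 1, k - 1) * kappa[k] * m[n - k]
                              for k in range(1, n))
    return kappa[1:]
```

Feeding in the moments \(1,0,\sigma ,0,3\sigma ^2\) of a centered Gaussian returns \((0,\sigma ,0,0)\): everything beyond the variance cancels.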

In order to extend the method in Sect. 3 to the kind of unbounded functions which arise in our subsequent approximation arguments, we have to investigate possible escape of mass for the sequence of tori \(a^s\mathcal {Y}\) inside the space \({\mathcal {X}}\). In Sect. 4, we prove several results in this direction (see e.g. Proposition 4.5), as well as \(L^p\)-bounds (see Propositions 4.6 and 4.8). We stress that the general non-divergence estimates for unipotent flows developed by Kleinbock–Margulis [8] are not sufficient for our purposes; in particular, the exact value of the exponent in Proposition 4.5 will be crucial for our argument. The proof of the \(L^2\)-norm bound in Proposition 4.8 is especially interesting in this regard, since it reveals that the escape of mass is related to delicate arithmetic questions; our arguments require careful estimates on the number of solutions of certain Diophantine equations.

To make the technical passages in the final steps of the proof of Theorem 1.2 a bit more readable, we shall devote Sect. 5 to central limit theorems for sums of the form

$$\begin{aligned} \sum _{s=0}^{N-1} {{\hat{f}}}(a^sy)\quad \hbox {for }y\in \mathcal {Y}, \end{aligned}$$

where f is a smooth and compactly supported function on \(\mathbb {R}^{m+n}\), and \(\hat{f}\) denotes the Siegel transform of f (see Sect. 4.3 for definitions). We stress that even though f is assumed to be bounded, \(\hat{f}\) is unbounded on \({\mathcal {X}}\). To prove the central limit theorems in this setting, we approximate \({\hat{f}}\) by compactly supported functions on \({\mathcal {X}}\) and then use the estimates from Sect. 3. However, the bounds in these estimates crucially depend on the order of approximation, so this step requires a delicate analysis of the error terms. The non-divergence results established in Sect. 4 play an important role here.

Finally, to prove the central limit theorem for the function \(\hat{\chi }\) (which is the Siegel transform of an indicator function on a nice bounded domain in \(\mathbb {R}^{m+n}\)), and thus establish Theorem 1.2, we need to approximate \(\chi \) with smooth functions, and show that the arguments in Sect. 5 can be adapted to certain sequences of Siegel transforms of smooth and compactly supported functions. This will be done in Sect. 6.

2 Estimates on higher-order correlations

Let \({\mathcal {X}}\) denote the space of unimodular lattices in \(\mathbb {R}^{m+n}\). Setting

$$\begin{aligned} G:=\hbox {SL}_{m+n}(\mathbb {R})\quad \hbox { and }\quad \Gamma :=\hbox {SL}_{m+n}(\mathbb {Z}), \end{aligned}$$

we may consider the space \({\mathcal {X}}\) as a homogeneous space under the linear action of the group G, so that

$$\begin{aligned} {\mathcal {X}}\simeq G/\Gamma . \end{aligned}$$

Let \(\mu _{{\mathcal {X}}}\) denote the G-invariant probability measure on \({\mathcal {X}}\).

We fix \(m,n\geqslant 1\) and denote by U the subgroup

$$\begin{aligned} U:=\left\{ \left( \begin{array}{ll} I_m &{} u\\ 0 &{} I_n \end{array} \right) :\, u\in \hbox {M}_{m,n}(\mathbb {R}) \right\} < G, \end{aligned}$$
(2.1)

and set \(\mathcal {Y}:=U\mathbb {Z}^{m+n}\subset {\mathcal {X}}\). Geometrically, \(\mathcal {Y}\) can be visualized as an mn-dimensional torus embedded in the space of lattices \({\mathcal {X}}\). We denote by \(\mu _\mathcal {Y}\) the probability measure on \(\mathcal {Y}\) induced by the Lebesgue probability measure on \(\hbox {M}_{m,n}([0,1])\), and we note that \(\mathcal {Y}\) corresponds to the collection of unimodular lattices \(\Lambda _u\), for \(u\in \hbox {M}_{m,n}([0,1))\), introduced earlier in (1.8).

Let us further fix positive numbers \(w_1,\ldots ,w_{m+n}\) satisfying

$$\begin{aligned} \sum _{i=1}^{m}w_i=\sum _{i=m+1}^{m+n}w_i, \end{aligned}$$

and denote by \((a_t)\) the one-parameter semi-group

$$\begin{aligned} a_t:=\hbox {diag}\left( e^{w_1 t},\ldots , e^{w_{m}t},e^{-w_{m+1} t},\ldots , e^{-w_{m+n}t} \right) ,\quad t>0. \end{aligned}$$
(2.2)

The aim of this section is to analyze the asymptotic behavior of \(a_t\mathcal {Y} \subset {\mathcal {X}}\) as \(t\rightarrow \infty \), and investigate “decoupling” of correlations of the form

$$\begin{aligned} \int _{\mathcal {Y}}\phi _1(a_{t_1}y)\ldots \phi _r(a_{t_r}y)\, d\mu _\mathcal {Y}(y)\quad \hbox {for }\phi _1,\ldots ,\phi _r\in C_c^\infty ({\mathcal {X}}), \end{aligned}$$
(2.3)

for “large” \(t_1,\ldots ,t_r > 0\). It will be essential for our subsequent argument that the error terms in this “decoupling” are explicit in terms of the parameters \(t_1,\ldots ,t_r>0\) and suitable norms of the functions \(\phi _1,\ldots , \phi _r\), which we now introduce.

Every \(Y \in \hbox {Lie}(G)\) defines a first order differential operator \(\mathcal {D}_Y\) on \(C_c^\infty ({\mathcal {X}})\) by

$$\begin{aligned} \mathcal {D}_Y(\phi )(x):=\frac{d}{dt}\phi (\exp (tY)x)|_{t=0}. \end{aligned}$$

If we fix an (ordered) basis \(\{Y_1,\ldots ,Y_r\}\) of \(\hbox {Lie}(G)\), then every monomial \(Z=Y_1^{\ell _1}\cdots Y_r^{\ell _r}\) defines a differential operator by

$$\begin{aligned} \mathcal {D}_Z:=\mathcal {D}_{Y_1}^{\ell _1}\cdots \mathcal {D}_{Y_r}^{\ell _r}, \end{aligned}$$
(2.4)

of degree\(\deg (Z) = \ell _1+\cdots +\ell _r\). For \(k\geqslant 1\) and \(\phi \in C_c^\infty ({\mathcal {X}})\), we define the norms

$$\begin{aligned} \Vert \phi \Vert _{L^2_k({\mathcal {X}})}:=\left( \sum _{\deg (Z)\leqslant k} \int _{{\mathcal {X}}} | (\mathcal {D}_Z\phi )(x)|^2\, d\mu _{\mathcal {X}}(x)\right) ^{1/2}, \end{aligned}$$
(2.5)

and

$$\begin{aligned} \Vert \phi \Vert _{C^k} := \sum _{\deg (Z)\leqslant k} \Vert \mathcal {D}_Z\phi \Vert _\infty . \end{aligned}$$
(2.6)

Note that for every \(g \in G\), \(\phi \in C^\infty ({\mathcal {X}})\) and \(Y \in {{\,\mathrm{Lie}\,}}(G)\), we have \(\mathcal {D}_Y(\phi \circ g) = \mathcal {D}_{{{\,\mathrm{Ad}\,}}(g)Y}(\phi ) \circ g\). This identity readily extends to the universal enveloping algebra \(\mathcal {U}({{\,\mathrm{Lie}\,}}(G))\), and thus we also have \(\mathcal {D}_Z(\phi \circ g) = \mathcal {D}_{{{\,\mathrm{Ad}\,}}(g)Z}(\phi ) \circ g\) for every monomial Z in \(\{Y_1,\ldots ,Y_r\}\), where \({{\,\mathrm{Ad}\,}}(g)\) also denotes the extension of \({{\,\mathrm{Ad}\,}}(g)\) from \({{\,\mathrm{Lie}\,}}(G)\) to \(\mathcal {U}({{\,\mathrm{Lie}\,}}(G))\). Since \({{\,\mathrm{Ad}\,}}(g)Z\) can be written as a finite sum of monomials of degrees not exceeding the degree of Z, we conclude that for every \(k \geqslant 1\), there exists a sub-multiplicative function \(g \mapsto C_k(g)\) such that

$$\begin{aligned} \Vert \phi \circ g\Vert _{C^k} \leqslant C_k(g) \Vert \phi \Vert _{C^k}, \quad \hbox {for all}\ \phi \in C_c^\infty ({\mathcal {X}}). \end{aligned}$$

In particular, there exists a constant \(\xi = \xi (m,n,k)\) (which also depends on our fixed choice of weights \(w_1,\ldots ,w_{m+n}\)) such that

$$\begin{aligned} \Vert \phi \circ a_t\Vert _{C^k} \ll e^{\xi |t|} \Vert \phi \Vert _{C^k}, \quad \hbox {for all } t \in \mathbb {R}\hbox { and } \phi \in C_c^\infty ({\mathcal {X}}), \end{aligned}$$
(2.7)

where the suppressed constants are independent of t and \(\phi \).

The starting point of our discussion is a well-known quantitative estimate on correlations of smooth functions on \({\mathcal {X}}\):

Theorem 2.1

There exist \(\gamma >0\) and \(k\geqslant 1\) such that for all \(\phi _1,\phi _2\in C_c^\infty ({\mathcal {X}})\) and \(g\in G,\)

$$\begin{aligned} \int _{{\mathcal {X}}} \phi _1(g x)\phi _2(x)\, d\mu _{\mathcal {X}}(x)= & {} \left( \int _{{\mathcal {X}}}\phi _1 \, d\mu _{\mathcal {X}}\right) \left( \int _{{\mathcal {X}}}\phi _2 \, d\mu _{\mathcal {X}}\right) \\&+\,O\left( \Vert g\Vert ^{-\gamma }\, \Vert \phi _1\Vert _{L^2_k({\mathcal {X}})}\Vert \phi _2\Vert _{L^2_k({\mathcal {X}})}\right) . \end{aligned}$$

This theorem has a very long history that we will not attempt to survey here, but only mention that a result of this form can be found, for instance, in [7, Corollary 2.4.4].

From now on, we fix \(k\geqslant 1\) so that Theorem 2.1 holds.

Our goal is to decouple the higher-order correlations in (2.3), but in order to state our results we first need to introduce a family of norms on \(C^\infty _c({\mathcal {X}})\) finer than \((\Vert \cdot \Vert _{L^2_k({\mathcal {X}})})\). Let us denote by \(\Vert \cdot \Vert _{C^0}\) the uniform norm on \(C_c({\mathcal {X}})\). If we fix a right-invariant Riemannian metric on G, then it induces a metric d on \({\mathcal {X}}\simeq G/\Gamma \), which allows us to define the norms

$$\begin{aligned} \Vert \phi \Vert _{Lip}:=\sup \left\{ \frac{|\phi (x_1)-\phi (x_2)|}{d(x_1,x_2)}:\, x_1,x_2\in {\mathcal {X}}, x_1\ne x_2 \right\} , \end{aligned}$$

and

$$\begin{aligned} \mathcal {N}_k(\phi ):=\max \left\{ \Vert \phi \Vert _{C^0}, \Vert \phi \Vert _{Lip}, \Vert \phi \Vert _{L^2_k({\mathcal {X}})}\right\} , \end{aligned}$$
(2.8)

for \(\phi \in C_c^\infty ({\mathcal {X}})\). We shall prove:

Theorem 2.2

There exists \(\delta >0\) such that for every compact \(\Omega \subset U\), \(f\in C_c^\infty (U)\) with \(\mathrm{supp}(f)\subset \Omega \), \(\phi _1,\ldots ,\phi _r\in C_c^\infty ({\mathcal {X}})\), \(x_0\in {\mathcal {X}}\), and \(t_1,\ldots ,t_r>0\), we have

$$\begin{aligned} \int _{U} f(u)\left( \prod _{i=1}^r \phi _i(a_{t_i}ux_0)\right) \,du= & {} \left( \int _{U}f(u)\,du\right) \prod _{i=1}^r\left( \int _{{\mathcal {X}}}\phi _i \, d\mu _{\mathcal {X}}\right) \\&+\,O_{x_0,\Omega ,r}\left( e^{-\delta D(t_1,\ldots ,t_r)}\, \Vert f\Vert _{C^k}\prod _{i=1}^r \mathcal {N}_k(\phi _i)\right) , \end{aligned}$$

where

$$\begin{aligned} D(t_1,\ldots ,t_r):=\min \{t_i, |t_i-t_j|:\, 1\leqslant i\ne j\leqslant r\}. \end{aligned}$$

Remark 2.3

The case \(r=1\) was proved by Kleinbock and Margulis in [9], and our arguments are inspired by theirs. We stress that the constant \(\delta \) in Theorem 2.2 is independent of r.

We also record the following corollary of Theorem 2.2.

Corollary 2.4

There exists \(\delta ' >0\) such that for every \(\phi _0\in C^\infty (\mathcal {Y})\), \(\phi _1,\ldots ,\phi _r\in C_c^\infty ({\mathcal {X}})\), and \(t_1,\ldots ,t_r>0\), we have

$$\begin{aligned} \int _{\mathcal {Y}} \phi _0(y) \left( \prod _{i=1}^r \phi _i(a_{t_i}y) \right) d\mu _\mathcal {Y}(y)= & {} \left( \int _{\mathcal {Y}}\phi _0 \, d\mu _\mathcal {Y}\right) \prod _{i=1}^r \left( \int _{{\mathcal {X}}}\phi _i \, d\mu _{\mathcal {X}}\right) \\&+\,O_{r}\left( e^{-\delta ' D(t_1,\ldots ,t_r)}\, \Vert \phi _0\Vert _{C^k}\prod _{i=1}^r \mathcal {N}_k(\phi _i)\right) . \end{aligned}$$

Proof of Corollary 2.4 (assuming Theorem 2.2)

Let \(x_0\) denote the identity coset in \({\mathcal {X}}\cong G/\Gamma \), which corresponds to the standard lattice \(\mathbb {Z}^{m+n}\), and recall that

$$\begin{aligned} \mathcal {Y}=Ux_0\simeq U/(U\cap \Gamma ). \end{aligned}$$

Let \({\tilde{\phi }}_0\in C^\infty (U)\) denote the lift of the function \(\phi _0\) to U, and \(\chi \) the characteristic function of the subset

$$\begin{aligned} U_0:=\left\{ \left( \begin{array}{ll} I_m &{} u\\ 0 &{} I_n \end{array} \right) :\, u\in \hbox {M}_{m,n}([0,1]) \right\} . \end{aligned}$$

Given \(\varepsilon > 0\), let \(\chi _\varepsilon \in C_c^\infty (U)\) be a smooth approximation of \(\chi \) with uniformly bounded support which satisfies

$$\begin{aligned} \chi \leqslant \chi _\varepsilon \leqslant 1,\quad \Vert \chi -\chi _\varepsilon \Vert _{L^1(U)}\ll \varepsilon ,\quad \Vert \chi _\varepsilon \Vert _{C^k}\ll \varepsilon ^{-k}. \end{aligned}$$

We observe that if \(f_\varepsilon :={\tilde{\phi }}_0\chi _\varepsilon \) and \(f_0:={\tilde{\phi }}_0\chi \), then

$$\begin{aligned} \Vert f_0-f_\varepsilon \Vert _{L^1(U)}\ll \varepsilon \, \Vert \phi _0\Vert _{C^0} \end{aligned}$$

and

$$\begin{aligned} \Vert f_\varepsilon \Vert _{C^k}\ll \Vert {\tilde{\phi }}_0\Vert _{C^k}\Vert \chi _\varepsilon \Vert _{C^k}\ll \varepsilon ^{-k} \Vert {\phi }_0\Vert _{C^k}, \end{aligned}$$

which implies that

$$\begin{aligned} \int _{\mathcal {Y}} \phi _0(y) \left( \prod _{i=1}^r \phi _i(a_{t_i}y) \right) d\mu _\mathcal {Y}(y)= & {} \int _{U} f_0(u) \left( \prod _{i=1}^r \phi _i(a_{t_i}ux_0) \right) \, du\\= & {} \int _{U} f_\varepsilon (u) \left( \prod _{i=1}^r \phi _i(a_{t_i}u x_0) \right) du \\&+\,O\left( \varepsilon \prod _{i=0}^r \Vert \phi _i\Vert _{C^0}\right) , \end{aligned}$$

and

$$\begin{aligned} \int _{\mathcal {Y}} \phi _0\, d\mu _\mathcal {Y}=\int _U f_0(u)\, du= \int _{U} f_\varepsilon (u)\, du +O\left( \varepsilon \Vert \phi _0\Vert _{C^0}\right) . \end{aligned}$$

Therefore, Theorem 2.2 implies that

$$\begin{aligned}&\int _{\mathcal {Y}} \phi _0(y) \left( \prod _{i=1}^r \phi _i(a_{t_i}y) \right) d\mu _\mathcal {Y}(y)= \left( \int _{U}f_\varepsilon (u) \, du \right) \prod _{i=1}^r \left( \int _{{\mathcal {X}}}\phi _i \, d\mu _{\mathcal {X}}\right) \\&\qquad +\,O_{r}\left( \varepsilon \prod _{i=0}^r \Vert \phi _i\Vert _{C^0}+ e^{-\delta D(t_1,\ldots ,t_r)}\, \Vert f_\varepsilon \Vert _{C^k}\prod _{i=1}^r \mathcal {N}_k(\phi _i)\right) \\&\quad = \left( \int _{\mathcal {Y}}\phi _0 \, d\mu _\mathcal {Y}\right) \prod _{i=1}^r \left( \int _{{\mathcal {X}}}\phi _i \, d\mu _{\mathcal {X}}\right) \\&\qquad +\,O_{r}\left( \left( \varepsilon + \varepsilon ^{-k} e^{-\delta D(t_1,\ldots ,t_r)}\right) \, \Vert \phi _0\Vert _{C^k}\prod _{i=1}^r \mathcal {N}_k(\phi _i)\right) . \end{aligned}$$

The corollary (with \(\delta ' = \delta /(k+1)\)) follows by choosing \(\varepsilon =e^{-\delta D(t_1,\ldots ,t_r)/(k+1)}\). \(\square \)

2.1 Preliminary results

We recall that d is a distance on \({\mathcal {X}}\cong G/\Gamma \) induced from a right-invariant Riemannian metric on G. We denote by \(B_G(\rho )\) the ball of radius \(\rho \) centered at the identity in G. For a point \(x\in {\mathcal {X}}\), we let \(\iota (x)\) denote the injectivity radius at x, that is to say, the supremum over \(\rho >0\) such that the map \(B_G(\rho )\rightarrow B_G(\rho )x:g\mapsto gx\) is injective.

Given \(\varepsilon > 0\), let

$$\begin{aligned} \mathcal {K}_\varepsilon = \{ \Lambda \in {\mathcal {X}}\, : \, \Vert v\Vert \geqslant \varepsilon , \hbox { for all }\ v \in \Lambda {\setminus } \{0\}\}. \end{aligned}$$
(2.9)

By Mahler’s Compactness Criterion, \(\mathcal {K}_\varepsilon \) is a compact subset of \({\mathcal {X}}\). Furthermore, using reduction theory, one can show:

Proposition 2.5

[9, Prop. 3.5] \(\iota (x)\gg \varepsilon ^{m+n}\) for any \(x\in \mathcal {K}_\varepsilon \).

An important role in our argument will be played by the one-parameter semi-group

$$\begin{aligned} b_t:=\hbox {diag}\left( e^{t/m},\ldots ,e^{t/m},e^{-t/n},\ldots , e^{-t/n}\right) ,\quad t>0, \end{aligned}$$
(2.10)

which coincides with the semi-group \((a_t)\) as defined in (2.2) with the special choice of exponents

$$\begin{aligned} w_1 = \cdots = w_m = \frac{1}{m} \quad \text {and} \quad w_{m+1} = \cdots = w_{m+n} = \frac{1}{n}. \end{aligned}$$

The submanifold \(\mathcal {Y}\subset {\mathcal {X}}\) is an unstable manifold for the flow \((b_t)\), which makes the analysis of the asymptotic behavior of \(b_t\mathcal {Y}\) significantly easier than that of \(a_t \mathcal {Y}\) for general parameters. Using Theorem 2.1, Kleinbock and Margulis proved in [7] a quantitative equidistribution result for the family \(b_t\mathcal {Y}\) as \(t\rightarrow \infty \); we shall use a version of this result from their later work [9].

Theorem 2.6

[9, Th. 2.3] There exist \(\rho _0>0\) and \(c,\gamma >0\) such that for every \(\rho \in (0,\rho _0 )\), \(f\in C_c^\infty (U)\) satisfying \(\hbox {supp}(f)\subset B_G(\rho )\), \(x\in {\mathcal {X}}\) with \(\iota (x)>2\rho \), \(\phi \in C_c^\infty ({\mathcal {X}})\), and \(t\geqslant 0,\)

$$\begin{aligned} \int _{U} f(u) \phi (b_tux)\, du= & {} \left( \int _{U} f(u)\, du \right) \left( \int _{{\mathcal {X}}} \phi \, d\mu _{\mathcal {X}}\right) \\&+\,O\left( \rho \Vert f\Vert _{L^1(U)} \Vert \phi \Vert _{{{\,\mathrm{Lip}\,}}} + \rho ^{-c} e^{-\gamma t}\, \Vert f\Vert _{C^k}\Vert \phi \Vert _{L^2_k({\mathcal {X}})}\right) . \end{aligned}$$

Remark 2.7

Although the dependence on \(\phi \) is not stated in [9, Theorem 2.3], the estimate is explicit in the proof. Indeed, in [9, Section 2, p. 390], the authors show under the assumptions above that

$$\begin{aligned} \left| \int _{U} f(u) \phi (b_tux)\, du - \langle {\tilde{f}},b_{-t} \phi \rangle _{L^2({\mathcal {X}})} \right| \leqslant \rho \Vert \phi \Vert _{{{\,\mathrm{Lip}\,}}} \Vert f\Vert _{L^1(U)}, \end{aligned}$$

where \({\tilde{f}}\) is an (explicit) smooth function on \({\mathcal {X}}\) with compact support constructed from f. Theorem 2.6 now follows from the decay of matrix coefficients in Theorem 2.1.

We will prove Theorem 2.2 through successive uses of Theorem 2.6. In order to make things more transparent, it will be convenient to embed the flow \((a_t)\) as defined in (2.2) in a multi-parameter flow as follows. For \(\overline{s}=(s_1,\ldots ,s_{m+n})\in \mathbb {R}^{m+n}\), we set

$$\begin{aligned} a(\overline{s}):=\hbox {diag}\left( e^{s_1},\ldots ,e^{s_m},e^{-s_{m+1}},\ldots , e^{-s_{m+n}}\right) . \end{aligned}$$
(2.11)

We denote by \(S^+\) the cone in \(\mathbb {R}^{m+n}\) consisting of those \(\overline{s}=(s_1,\ldots ,s_{m+n})\) which satisfy

$$\begin{aligned} s_1,\ldots ,s_{m+n}>0\quad \hbox {and}\quad \sum _{i=1}^{m}s_i=\sum _{i=m+1}^{m+n}s_i. \end{aligned}$$

For \(\overline{s}=(s_1,\ldots ,s_{m+n})\in S^+\), we set

$$\begin{aligned} \lfloor \overline{s}\rfloor :=\min (s_1,\ldots ,s_{m+n}), \end{aligned}$$

and, with \(\overline{s}_t:=(w_1t,\ldots ,w_{m+n}t)\), we see that \(a_t=a(\overline{s}_t)\).

In addition to Theorem 2.6, we shall also need the following quantitative non-divergence estimate for unipotent flows established by Kleinbock and Margulis in [9].

Theorem 2.8

[9, Cor. 3.4] There exists \(\theta =\theta (m,n)>0\) such that for every compact \(L\subset {\mathcal {X}}\) and every Euclidean ball \(B\subset U\) centered at the identity, there exists \(T_0>0\) such that for every \(\varepsilon \in (0,1)\), \(x\in L\), and \(\overline{s}\in S^+\) satisfying \(\lfloor \overline{s}\rfloor \geqslant T_0\), one has

$$\begin{aligned} |\{u\in B: a(\overline{s})ux\notin \mathcal {K}_\varepsilon \}|\ll \varepsilon ^\theta \, |B|. \end{aligned}$$

2.2 Proof of Theorem 2.2

Let us fix \(r \geqslant 1\) and an r-tuple \((t_1,\ldots ,t_r)\). Upon relabeling, we may assume that \(t_1\leqslant \cdots \leqslant t_{r}\), so that

$$\begin{aligned} D := D(t_1,\ldots ,t_r)=\min \{t_1,t_2-t_1,\ldots , t_r-t_{r-1}\}. \end{aligned}$$
(2.12)

The next lemma provides an additional parameter \(\overline{s} \in S^{+}\), which depends on the r-tuple \((t_1,\ldots ,t_r)\). This parameter will be used throughout the proof of Theorem 2.2, and we stress that the accompanying constants \(c_1,c_2\) and \(c_3\) are independent of r and the r-tuple \((t_1,\ldots ,t_r)\).

Lemma 2.9

There exist \(c_1,c_2,c_3>0\) such that given any \(t_r>t_{r-1}>0,\) there exists \(\overline{s}\in S^+\) satisfying:

  (i) \(\lfloor \overline{s} \rfloor \geqslant c_1 (t_r -t_{r-1}),\)

  (ii) \(\lfloor \overline{s}-\overline{s}_{{t}_{r-1}} \rfloor \geqslant c_2 (t_r-t_{r-1}),\)

  (iii) \(\overline{s}_{t_r}-\overline{s}=(\frac{z}{m},\ldots ,\frac{z}{m},\frac{z}{n},\ldots ,\frac{z}{n})\) for some \(z \geqslant c_3 \min (t_{r-1},t_r-t_{r-1})\).

Proof

We start the proof by defining \(\overline{s}\) by the formula in (iii), where the parameter z will be chosen later, that is to say, we set

$$\begin{aligned} \overline{s}=\left( w_1t_r-\frac{z}{m},\ldots ,w_mt_r-\frac{z}{m}, w_{m+1}t_r-\frac{z}{n},\ldots ,w_{m+n}t_r-\frac{z}{n}\right) . \end{aligned}$$

Then (i) holds provided that

$$\begin{aligned} A_1 t_r- \frac{z}{m}\geqslant c_1(t_r-t_{r-1})\quad \hbox {and}\quad A_2 t_r- \frac{z}{n}\geqslant c_1(t_r-t_{r-1}), \end{aligned}$$

where \(A_1:=\min (w_i:1\leqslant i\leqslant m)\) and \(A_2:=\min (w_i:m+1\leqslant i\leqslant m+n)\), so if we set \(c_1= \min (A_1,A_2)\), then (i) holds when

$$\begin{aligned} z\leqslant c_1\min (m,n) t_{r-1}. \end{aligned}$$
(2.13)

To arrange (ii), we observe that

$$\begin{aligned} \overline{s}-\overline{s}_{{t}_{r-1}}= & {} \left( w_1(t_r-t_{r-1})-\frac{z}{m},\ldots ,w_m(t_r-t_{r-1})-\frac{z}{m},\right. \\&\left. w_{m+1}(t_r-t_{r-1})-\frac{z}{n},\ldots ,w_{m+n}(t_r-t_{r-1})-\frac{z}{n} \right) , \end{aligned}$$

and thus (ii) holds provided that

$$\begin{aligned} A_1 (t_r-t_{r-1})- \frac{z}{m}\geqslant c_2(t_r-t_{r-1})\quad \hbox {and}\quad A_2 (t_r-t_{r-1})- \frac{z}{n}\geqslant c_2(t_r-t_{r-1}). \end{aligned}$$

If we let \(c_2= \min (A_1,A_2)/2\), then (ii) holds when

$$\begin{aligned} z\leqslant c_2\min (m,n) (t_r-t_{r-1}). \end{aligned}$$
(2.14)

So far we have arranged that (i) and (ii) hold provided that z satisfies (2.13) and (2.14). Let \(c_3=\min (c_1,c_2)\min (m,n)\), and note that if we pick \(z = c_3 \min (t_{r-1},t_{r}-t_{r-1})\), then (i), (ii) and (iii) are all satisfied. \(\square \)
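The constants in the proof can be checked on sample data. The sketch below (the helper name and all numerical values are ours, chosen for illustration) builds \(\overline{s}\) as in the proof and verifies properties (i)–(iii) together with \(\overline{s}\in S^+\).

```python
import numpy as np

def lemma_2_9_s(w, m, t_prev, t_r):
    """Hypothetical helper: build the parameter s of Lemma 2.9 for sample data."""
    w = np.asarray(w, dtype=float)
    n = len(w) - m
    A1, A2 = w[:m].min(), w[m:].min()
    c1 = min(A1, A2)
    c2 = c1 / 2.0
    c3 = min(c1, c2) * min(m, n)          # = c2 * min(m, n)
    z = c3 * min(t_prev, t_r - t_prev)
    s = np.concatenate([w[:m] * t_r - z / m, w[m:] * t_r - z / n])
    return s, z, c1, c2, c3

w = [0.3, 0.2, 0.5]                        # m = 2 expanding and n = 1 contracting weights
m, t_prev, t_r = 2, 4.0, 7.0
n = len(w) - m
s, z, c1, c2, c3 = lemma_2_9_s(w, m, t_prev, t_r)
s_prev = np.asarray(w) * t_prev            # s_{t_{r-1}}

assert abs(s[:m].sum() - s[m:].sum()) < 1e-9               # s lies in S^+
assert s.min() >= c1 * (t_r - t_prev) - 1e-12              # property (i)
assert (s - s_prev).min() >= c2 * (t_r - t_prev) - 1e-12   # property (ii)
assert z >= c3 * min(t_prev, t_r - t_prev) - 1e-12         # property (iii), size of z
# (iii) also prescribes the shape (z/m,...,z/m, z/n,...,z/n) of s_{t_r} - s
gap = np.asarray(w) * t_r - s
assert np.allclose(gap, [z/m, z/m, z/n])
```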

Let us now continue with the proof of Theorem 2.2. With the parameter \(\overline{s}\) provided by Lemma 2.9 above, we have

$$\begin{aligned} \lfloor \overline{s} \rfloor\geqslant & {} c_1\, D, \end{aligned}$$
(2.15)
$$\begin{aligned} \lfloor \overline{s}- \overline{s}_{t_{i}} \rfloor\geqslant & {} c_2\,D\quad \hbox {for all }i=1,\ldots ,r-1, \end{aligned}$$
(2.16)
$$\begin{aligned} a(\overline{s}_{t_{r}} - \overline{s})= & {} b_z\quad \hbox {for some } z\geqslant c_3\, D, \end{aligned}$$
(2.17)

where \(b_z\) is defined as in (2.10) and D as in (2.12). Throughout the rest of the proof we fix a compact set \(\Omega \subset U\). Our aim now is to estimate integrals of the form

$$\begin{aligned} I_r:=\int _{U} f(u)\left( \prod _{i=1}^r \phi _i(a_{t_i}ux_0)\right) \, du, \end{aligned}$$

where f ranges over \(C^\infty _c(U)\) with \({{\,\mathrm{supp}\,}}(f) \subset \Omega \). Our proof will proceed by induction over r, the case \(r=0\) being trivial.

Before we can start the induction, we need some notation. Let \(\rho _0\) and k be as in Theorem 2.6, and pick, for each \(0< \rho < \rho _0\), a non-negative \(\omega _\rho \in C^\infty _c({\mathcal {X}})\) such that

$$\begin{aligned} \hbox {supp}(\omega _\rho )\subset B_{G}(\rho ),\quad \Vert \omega _\rho \Vert _{C^k}\ll \rho ^{-\sigma },\quad \int _{U} \omega _\rho (v)\, dv=1, \end{aligned}$$
(2.18)

for some fixed \(\sigma =\sigma (m,n,k)>0\). The integral \(I_r\) can now be rewritten as follows:

$$\begin{aligned} I_r= & {} I_r\cdot \left( \int _{U} \omega _\rho (v)\,dv\right) = \int _{U\times U} f(u)\omega _\rho (v)\left( \prod _{i=1}^r\phi _i(a_{t_i}ux_0)\right) \,dudv\\= & {} \int _{U\times U} f(a(-\overline{s})va(\overline{s})u)\omega _\rho (v)\left( \prod _{i=1}^r\phi _i(a_{t_i}a(-\overline{s})va(\overline{s})ux_0)\right) \,dudv. \end{aligned}$$

If we set

$$\begin{aligned} f_{\overline{s},u}(v):=f(a(-\overline{s})va(\overline{s})u)\omega _\rho (v) \quad \hbox {and} \quad x_{\overline{s},u}:= a(\overline{s})ux_0, \end{aligned}$$

then

$$\begin{aligned} I_r= \int _{U} \left( \int _{U} f_{\overline{s},u}(v) \prod _{i=1}^r \phi _i(a(\overline{s}_{{t}_i}-\overline{s})v x_{\overline{s},u})\, dv \right) \, du. \end{aligned}$$

We observe that if \(f_{\overline{s},u}(v)\ne 0\), then

$$\begin{aligned} v\in \hbox {supp}(\omega _\rho )\subset B_{G}(\rho ) \quad \text {and} \quad a(-\overline{s})va(\overline{s})u\in \hbox {supp}(f), \end{aligned}$$

so that

$$\begin{aligned} u\in a(-\overline{s})v^{-1}a(\overline{s})\hbox {supp}(f)\subset a(-\overline{s})\hbox {supp}(\omega _\rho )^{-1}a(\overline{s})\Omega . \end{aligned}$$

Since \(\overline{s}\in S^+\), the linear map \(v\mapsto a(-\overline{s})v a(\overline{s})\), for \(v\in U \cong \mathbb {R}^{mn}\), is non-expanding. Thus there exists a fixed Euclidean ball B in U, centered at the identity and depending only on \(\Omega \) and \(\rho _0\), such that \(f_{\overline{s},u}(v)\ne 0\) implies \(u\in B\). Hence the integral \(I_r\) can be written as

$$\begin{aligned} I_r=\int _{B} \left( \int _{U} f_{\overline{s},u}(v)\prod _{i=1}^r\phi _i(a(\overline{s}_{{t}_i}-\overline{s})v x_{\overline{s},u})\, dv \right) \, du, \end{aligned}$$
(2.19)

and

$$\begin{aligned} \Vert f_{\overline{s},u}\Vert _{C^k}\ll \Vert f\Vert _{C^k}\, \Vert \omega _\rho \Vert _{C^k}\ll \rho ^{-\sigma }\, \Vert f\Vert _{C^k}. \end{aligned}$$
(2.20)

Furthermore,

$$\begin{aligned} \int _{U} \Vert f_{\overline{s},u}\Vert _{L^1(U)}\, du= & {} \int _{U\times U} |f(a(-\overline{s})va(\overline{s})u)\omega _\rho (v)|\,dvdu\nonumber \\= & {} \left( \int _{U} |f(u)|\,du\right) \left( \int _{U} \omega _\rho (v)\,dv\right) = \Vert f\Vert _{L^1(U)}. \end{aligned}$$
(2.21)

We decompose the integral \(I_r\) in (2.19) as

$$\begin{aligned} I_r=I_r'(\varepsilon )+I_r''(\varepsilon ), \end{aligned}$$

where \(I_r'(\varepsilon )\) is the integral over

$$\begin{aligned} B_\varepsilon :=\{u\in B:\, x_{\overline{s},u}\notin \mathcal {K}_\varepsilon \}, \end{aligned}$$

and \(I_r''(\varepsilon )\) is the integral over \(B\backslash B_\varepsilon \).

To estimate \(I_r'(\varepsilon )\), we first recall that \(\lfloor \overline{s} \rfloor \geqslant c_1 D\) by (2.15), so if \(D \geqslant T_0/c_1\), where \(T_0\) is as in Theorem 2.8 applied to \(L= \mathcal {K}_\varepsilon \) and B, then the same theorem implies that there exists \(\theta > 0\) such that

$$\begin{aligned} |B_\varepsilon |\ll \varepsilon ^\theta |B| \end{aligned}$$
(2.22)

for every \(\varepsilon \in (0,1)\), and thus

$$\begin{aligned} I_r'(\varepsilon )\ll _{\Omega } \varepsilon ^\theta |B| \Vert f\Vert _{C^0}\left( \int _{U} \omega _\rho (v)\,dv\right) \prod _{i=1}^r\Vert \phi _i\Vert _{C^0} \ll _\Omega \varepsilon ^\theta \Vert f\Vert _{C^0} \prod _{i=1}^r \Vert \phi _i\Vert _{C^0}.\nonumber \\ \end{aligned}$$
(2.23)

Let us now turn to the problem of estimating \(I_{r}''(\varepsilon )\). Since the Riemannian distance d on G, restricted to U, and the Euclidean distance on U are equivalent on a small open neighborhood of the identity, we see that

$$\begin{aligned} d\left( a(-\overline{t})va(\overline{t}),e\right) \ll e^{-\lfloor \overline{t}\rfloor }\, d(v,e), \end{aligned}$$

for \(v\in B_{G}(\rho _0)\) and any \(\overline{t}\in S^+\). Hence, using (2.16), we obtain that for all \(i=1,\ldots ,r-1\),

$$\begin{aligned} \phi _i(a(\overline{s}_{{t}_i}-\overline{s})v x_{\overline{s},u}) =\phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u})+O\left( e^{-c_2 D}\, \Vert \phi _i\Vert _{Lip}\right) , \end{aligned}$$

and thus, for all \(v\in B_G(\rho _0)\),

$$\begin{aligned} \prod _{i=1}^{r-1}\phi _i(a(\overline{s}_{{t}_i}-\overline{s})v x_{\overline{s},u}) =\prod _{i=1}^{r-1}\phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u})+O_r\left( e^{-c_2 D}\, \prod _{i=1}^{r-1}\Vert \phi _i\Vert ^*_{Lip}\right) ,\nonumber \\ \end{aligned}$$
(2.24)

where \(\Vert \phi \Vert ^*_{Lip}:=\max (\Vert \phi \Vert _{C^0},\Vert \phi \Vert _{Lip})\). This leads to the estimate

$$\begin{aligned} I_r''(\varepsilon )= & {} \int _{B\backslash B_\varepsilon } \prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u}) \left( \int _{U} f_{\overline{s},u}(v) \phi _r(a(\overline{s}_{{t}_r}-\overline{s})v x_{\overline{s},u})\, dv \right) \, du \\&+\,O_r\left( e^{-c_2 D} \prod _{i=1}^{r-1}\Vert \phi _i\Vert ^*_{Lip} \left( \int _{B\backslash B_\varepsilon } \Vert f_{\overline{s},u}\Vert _{L^1(U)}\, du\right) \Vert \phi _r\Vert _{C^0}\right) . \end{aligned}$$

Hence, using (2.21), we obtain that

$$\begin{aligned} I_r''(\varepsilon )= & {} \int _{B\backslash B_\varepsilon } \prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u}) \left( \int _{U} f_{\overline{s},u}(v) \phi _r\left( a(\overline{s}_{{t}_r}-\overline{s})v x_{\overline{s},u}\right) \, dv \right) \, du \\&+\,O_r\left( e^{-c_2 D} \Vert f\Vert _{L^1(U)}\prod _{i=1}^{r}\Vert \phi _i\Vert ^*_{Lip}\right) . \end{aligned}$$

Since by (2.17),

$$\begin{aligned} \int _{U} f_{\overline{s},u}(v) \phi _r(a(\overline{s}_{{t}_r}-\overline{s})v x_{\overline{s},u})\, dv= \int _{U} f_{\overline{s},u}(v) \phi _r(b_z v x_{\overline{s},u})\, dv, \end{aligned}$$

we may apply Theorem 2.6 to estimate this integral. We recall that \(\hbox {supp}(f_{\overline{s},u})\subset B_{G}(\rho )\). For \(u\in B\backslash B_\varepsilon \), we have \(x_{\overline{s},u}\in \mathcal {K}_\varepsilon \), so that \(\iota (x_{\overline{s},u})\gg \varepsilon ^{m+n}\) by Proposition 2.5. In particular, we may take

$$\begin{aligned} \varepsilon \gg \rho ^{1/(m+n)} \end{aligned}$$

to arrange that \(\iota (x_{\overline{s},u})>2\rho \). Hence, by applying Theorem 2.6, we deduce that there exist \(c, \gamma > 0\) such that

$$\begin{aligned}&\int _{U} f_{\overline{s},u}(v) \phi _r(b_z v x_{\overline{s},u})\, dv =\left( \int _{U} f_{\overline{s},u}(v)\, dv \right) \left( \int _{{\mathcal {X}}} \phi _r\, d\mu _{\mathcal {X}}\right) \\&\quad +\,O\left( \rho \Vert f_{\overline{s},u}\Vert _{L^1(U)} \Vert \phi _r\Vert _{Lip} + \rho ^{-c} e^{-\gamma z}\, \Vert f_{\overline{s},u}\Vert _{C^k} \Vert \phi _r\Vert _{L^2_k({\mathcal {X}})}\right) , \end{aligned}$$

for all \(u\in B\backslash B_\varepsilon \). Using (2.20) and (2.21) and \(z\geqslant c_3\,D\), we deduce that

$$\begin{aligned}&I_r''(\varepsilon )= \left( \int _{B\backslash B_\varepsilon } \prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u}) \left( \int _{U} f_{\overline{s},u}(v)\, dv \right) \, du \right) \left( \int _{{\mathcal {X}}} \phi _r\, d\mu _{\mathcal {X}}\right) \\&\quad +\,O_{r,B}\left( \prod _{i=1}^{r-1}\Vert \phi _i\Vert _{C^0} \left( \rho \Vert f\Vert _{L^1(U)} \Vert \phi _r\Vert _{Lip} + \rho ^{-(c+\sigma )} e^{-\gamma c_3\,D}\, \Vert f\Vert _{C^k}\Vert \phi _r\Vert _{L^2_k({\mathcal {X}})}\right) \right) \\&\quad +\,O_r\left( e^{-c_2 D} \Vert f\Vert _{L^1(U)}\prod _{i=1}^{r}\Vert \phi _i\Vert ^*_{Lip}\right) . \end{aligned}$$

Since \(\Vert f\Vert _{L^1(U)}\ll _\Omega \Vert f\Vert _{C^k}\),

$$\begin{aligned} I_r''(\varepsilon )= & {} \left( \int _{B\backslash B_\varepsilon } \prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u}) \left( \int _{U} f_{\overline{s},u}(v)\, dv \right) \, du \right) \left( \int _{{\mathcal {X}}} \phi _r\, d\mu _{\mathcal {X}}\right) \nonumber \\&+\,O_{r,\Omega }\left( \left( \rho + \rho ^{-(c+\sigma )} e^{-\gamma c_3\,D} +e^{-c_2 D}\right) \Vert f\Vert _{C^k}\prod _{i=1}^{r} \mathcal {N}_k(\phi _i) \right) . \end{aligned}$$
(2.25)

Applying (2.24) one more time (in the backward direction), we get

$$\begin{aligned}&\int _{B\backslash B_\varepsilon } \prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u}) \left( \int _{U} f_{\overline{s},u}(v)\, dv\right) \, du\\&\quad = \int _{B\backslash B_\varepsilon } \left( \int _{U} f_{\overline{s},u}(v)\prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) vx_{\overline{s},u})\, dv\right) \, du\\&\qquad +\,O_r\left( e^{-c_2 D}\, \left( \int _B \Vert f_{\overline{s},u}\Vert _{L^1(U)}\,du\right) \prod _{i=1}^{r-1}\Vert \phi _i\Vert ^*_{Lip}\right) . \end{aligned}$$

It follows from (2.22) that

$$\begin{aligned}&\int _{B\backslash B_\varepsilon } \left( \int _{U} f_{\overline{s},u}(v)\prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) vx_{\overline{s},u})\, dv\right) \, du\\&\quad = \int _{B} \left( \int _{U} f_{\overline{s},u}(v)\prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) vx_{\overline{s},u})\, dv\right) \, du\\&\qquad +\,O_r \left( \varepsilon ^\theta \left( \int _B \Vert f_{\overline{s},u}\Vert _{L^1(U)}\,du\right) \prod _{i=1}^{r-1}\Vert \phi _i\Vert _{C^0}\right) , \end{aligned}$$

where we recognize the first term as \(I_{r-1}\). Using (2.19) and (2.21), we now conclude that

$$\begin{aligned}&\int _{B\backslash B_\varepsilon } \prod _{i=1}^{r-1} \phi _i(a(\overline{s}_{{t}_i}-\overline{s}) x_{\overline{s},u}) \left( \int _{U} f_{\overline{s},u}(v)\, dv\right) \, du\\&\quad =I_{r-1} +O_{r} \left( \left( \varepsilon ^\theta + e^{-c_2 D}\right) \Vert f\Vert _{L^1(U)} \prod _{i=1}^{r-1}\Vert \phi _i\Vert ^*_{Lip}\right) . \end{aligned}$$

Hence, combining this estimate with (2.25), we deduce that

$$\begin{aligned} I_r''(\varepsilon )= & {} I_{r-1}\left( \int _{{\mathcal {X}}} \phi _r\, d\mu _{\mathcal {X}}\right) \\&+\,O_{r,\Omega }\left( \left( \varepsilon ^\theta +\rho + \rho ^{-(c+\sigma )} e^{-\gamma c_3\,D} +e^{-c_2 D}\right) \Vert f\Vert _{C^k}\prod _{i=1}^{r} \mathcal {N}_k(\phi _i) \right) , \end{aligned}$$

and thus, in view of (2.23),

$$\begin{aligned} I_r= & {} I_{r}'(\varepsilon )+I_{r}''(\varepsilon )=I_{r-1}\left( \int _{{\mathcal {X}}} \phi _r\, d\mu _{\mathcal {X}}\right) \\&+\,O_{r,\Omega }\left( \left( \varepsilon ^\theta +\rho + \rho ^{-(c+\sigma )} e^{-\gamma c_3\,D} +e^{-c_2 D}\right) \Vert f\Vert _{C^k}\prod _{i=1}^{r} \mathcal {N}_k(\phi _i) \right) . \end{aligned}$$

This estimate holds whenever \(\rho <\rho _0\) and \(\varepsilon \gg \rho ^{1/(m+n)}\). Taking \(\rho =e^{-c_4 D}\) for sufficiently small \(c_4>0\) and \(\varepsilon \asymp \rho ^{1/(m+n)}\), we conclude that there exists \(\delta >0\) such that for all sufficiently large D,

$$\begin{aligned} I_r= I_{r-1}\left( \int _{{\mathcal {X}}} \phi _r\, d\mu _{\mathcal {X}}\right) + O_{r,\Omega }\left( e^{-\delta D}\, \Vert f\Vert _{C^k}\prod _{i=1}^{r} \mathcal {N}_k(\phi _i) \right) . \end{aligned}$$
(2.26)

The exponent \(\delta \) depends on the constants \(c_2\) and \(c_3\) given by Lemma 2.9 and on the parameters \(\theta ,c,\sigma ,\gamma \); in particular, \(\delta \) is independent of r. By possibly enlarging the implicit constants, we can ensure that the estimate (2.26) holds for all r-tuples \((t_1,\ldots ,t_r)\), and not just those with sufficiently large \(D(t_1,\ldots ,t_r)\). Iterating the estimate (2.26), and using that \(I_0\) is a constant, we complete the proof of Theorem 2.2.
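The bookkeeping behind the final choice of \(\rho \) and \(\varepsilon \) can be illustrated numerically: with \(\rho =e^{-c_4D}\) and \(\varepsilon \) comparable to \(\rho ^{1/(m+n)}\), each of the four error terms decays exponentially in D as soon as \((c+\sigma )c_4<\gamma c_3\). The constants below are placeholders chosen for illustration, not values derived in the text.

```python
import math

# Sample values for the constants in the error term (placeholders, ours).
theta, c, sigma, gamma = 0.4, 2.0, 3.0, 0.5
c2, c3, m, n = 0.3, 0.2, 2, 1

c4 = 0.5 * gamma * c3 / (c + sigma)        # small enough: (c + sigma) * c4 < gamma * c3
delta = min(theta * c4 / (m + n), c4, gamma * c3 - (c + sigma) * c4, c2)
assert delta > 0

def error_term(D):
    """The four error contributions with rho = e^{-c4 D} and eps = rho^{1/(m+n)}."""
    rho = math.exp(-c4 * D)
    eps = rho ** (1.0 / (m + n))
    return eps**theta + rho + rho**(-(c + sigma)) * math.exp(-gamma * c3 * D) \
        + math.exp(-c2 * D)

# Each of the four terms is bounded by e^{-delta D}, so the sum is at most 4 e^{-delta D}.
for D in [50.0, 100.0, 200.0]:
    assert error_term(D) <= 4.0 * math.exp(-delta * D)
```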

3 CLT for functions with compact support

Let \(a=\hbox {diag}(a_1,\ldots ,a_{m+n})\) be a diagonal linear map of \(\mathbb {R}^{m+n}\) with

$$\begin{aligned} a_1,\ldots ,a_m>1,\quad 0<a_{m+1},\ldots ,a_{m+n} <1,\quad \hbox {and}\quad a_1\cdots a_{m+n}=1. \end{aligned}$$

The map a defines a continuous self-map of the space \({\mathcal {X}}\), which preserves \(\mu _{{\mathcal {X}}}\). We recall that the torus \(\mathcal {Y}=U\mathbb {Z}^{m+n}\subset {\mathcal {X}}\) is equipped with the probability measure \(\mu _\mathcal {Y}\). In this section, we shall prove a central limit theorem for the averages

$$\begin{aligned} F_N:=\frac{1}{\sqrt{N}} \sum _{s=0}^{N-1} \left( \phi \circ a^s-\mu _\mathcal {Y}(\phi \circ a^s)\right) , \end{aligned}$$
(3.1)

restricted to \(\mathcal {Y}\), for a fixed function \(\phi \in C^\infty _c({\mathcal {X}})\). We stress that this result is not needed in the proof of Theorem 1.2, but we include it here because its proof may serve as an instructive warm-up for the similar, but far more technical, Theorem 6.1.

Theorem 3.1

For every \(\xi \in \mathbb {R},\)

$$\begin{aligned} \mu _\mathcal {Y}\left( \{y\in \mathcal {Y}:\, F_N(y)<\xi \}\right) \rightarrow \hbox {Norm}_{\sigma _\phi }(\xi ) \end{aligned}$$

as \(N\rightarrow \infty ,\) where

$$\begin{aligned} \sigma _\phi ^2:=\sum _{s=-\infty }^\infty \left( \int _{{\mathcal {X}}} (\phi \circ a^s)\phi \, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}(\phi )^2\right) . \end{aligned}$$

Remark 3.2

It follows from Theorem 2.1 that the variance \(\sigma _\phi ^2\) is finite.

Our main tool in the proof of Theorem 3.1 will be the estimates on higher-order correlations established in Sect. 2. To make the notation lighter, we shall use a simplified version of Corollary 2.4 stated in terms of \(C^k\)-norms (note that \(\mathcal {N}_k \ll \Vert \cdot \Vert _{C^k}\)):

Corollary 3.3

There exists \(\delta >0\) such that for every \(\phi _0,\ldots ,\phi _r\in C_c^\infty ({\mathcal {X}})\) and \(t_1,\ldots ,t_r>0,\) we have

$$\begin{aligned} \int _{\mathcal {Y}} \phi _0(y) \left( \prod _{i=1}^r \phi _i(a_{t_i}y) \right) d\mu _\mathcal {Y}(y)= & {} \left( \int _{\mathcal {Y}}\phi _0 \, d\mu _\mathcal {Y}\right) \prod _{i=1}^r \left( \int _{{\mathcal {X}}}\phi _i \, d\mu _{\mathcal {X}}\right) \\&+\,O_{r}\left( e^{-\delta D(t_1,\ldots ,t_r)}\, \prod _{i=0}^r \Vert \phi _i\Vert _{C^k}\right) . \end{aligned}$$

3.1 The method of cumulants

Let \(({\mathcal {X}},\mu )\) be a probability space. Given bounded measurable functions \(\phi _1,\ldots ,\phi _r\) on \({\mathcal {X}}\), we define their joint cumulant as

$$\begin{aligned} {{\,\mathrm{Cum}\,}}^{(r)}_\mu (\phi _1,\ldots ,\phi _r) = \sum _{\mathcal {P}} (-1)^{|\mathcal {P}|-1} (|\mathcal {P}|-1)!\prod _{I \in \mathcal {P}} \int _{{\mathcal {X}}}\left( \prod _{i\in I} \phi _i\right) \, d\mu , \end{aligned}$$

where the sum is taken over all partitions \(\mathcal {P}\) of the set \(\{1,\ldots ,r\}\). When the measure is clear from the context, we drop the subscript \(\mu \). For a bounded measurable function \(\phi \) on \({\mathcal {X}}\), we also set

$$\begin{aligned} {{\,\mathrm{Cum}\,}}^{(r)}(\phi ) = {{\,\mathrm{Cum}\,}}^{(r)}(\phi ,\ldots ,\phi ). \end{aligned}$$

We shall use the following classical CLT-criterion (see, for instance, [5]).

Proposition 3.4

Let \((F_T)\) be a sequence of real-valued bounded measurable functions such that

$$\begin{aligned} \int _{\mathcal {X}}F_T\, d\mu =0 \quad \text {and} \quad \sigma ^2 := \lim _{T\rightarrow \infty } \int _{{\mathcal {X}}}F_T^2\, d\mu < \infty \end{aligned}$$
(3.2)

and

$$\begin{aligned} \lim _{T\rightarrow \infty } {{\,\mathrm{Cum}\,}}^{(r)}(F_T) = 0, \quad \hbox {for all}\ r \geqslant 3. \end{aligned}$$
(3.3)

Then for every \(\xi \in \mathbb {R},\)

$$\begin{aligned} \mu (\{F_T<\xi \}) \rightarrow \hbox {Norm}_\sigma (\xi )\quad \hbox {as } T\rightarrow \infty . \end{aligned}$$

Since all the moments of a random variable can be expressed in terms of its cumulants, this criterion is equivalent to the more widely known “Method of Moments”. However, the cumulants have curious, and very convenient, cancellation properties that will play an important role in our proof of Theorem 3.1.

For a partition \(\mathcal {Q}\) of \(\{1,\ldots ,r\}\), we define the conditional joint cumulant with respect to \(\mathcal {Q}\) as

$$\begin{aligned} {{\,\mathrm{Cum}\,}}^{(r)}_\mu (\phi _1,\ldots ,\phi _r|\mathcal {Q}) = \sum _{\mathcal {P}} (-1)^{|\mathcal {P}|-1} (|\mathcal {P}|-1)!\prod _{I \in \mathcal {P}} \prod _{J \in \mathcal {Q}} \int _{{\mathcal {X}}}\left( \prod _{i\in I\cap J} \phi _i\right) \, d\mu , \end{aligned}$$

where the sum again runs over all partitions \(\mathcal {P}\) of \(\{1,\ldots ,r\}\), with the convention that the integral equals one when \(I\cap J=\emptyset \).

In what follows, we shall make frequent use of the following proposition.

Proposition 3.5

[2, Prop. 8.1] For any partition \(\mathcal {Q}\) with \(|\mathcal {Q}|\geqslant 2,\)

$$\begin{aligned} {{\,\mathrm{Cum}\,}}^{(r)}_\mu (\phi _1,\ldots ,\phi _r|\mathcal {Q})=0, \end{aligned}$$
(3.4)

for all \(\phi _1,\ldots ,\phi _r \in L^\infty ({\mathcal {X}},\mu )\).
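On a finite probability space both the joint cumulant and its conditional version reduce to finite sums over set partitions, so Proposition 3.5 can be tested directly. A minimal sketch (all function names are ours; the empty-intersection convention from the definition above is implemented by skipping the factor):

```python
from math import factorial
import numpy as np

def partitions(elements):
    """All set partitions of a list, generated recursively."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i+1:]
        yield [[first]] + p

def cond_cumulant(phis, mu, Q):
    """Conditional joint cumulant w.r.t. the partition Q on a finite space with
    probability weights mu; Q = [range(len(phis))] gives Cum^{(r)} itself."""
    total = 0.0
    for P in partitions(list(range(len(phis)))):
        term = (-1.0) ** (len(P) - 1) * factorial(len(P) - 1)
        for I in P:
            for J in Q:
                idx = [i for i in I if i in J]
                if idx:  # empty products integrate to mu(X) = 1
                    term *= float(np.prod([phis[i] for i in idx], axis=0) @ mu)
        total += term
    return total

rng = np.random.default_rng(1)
mu = np.full(5, 0.2)                          # uniform probability measure on 5 points
phis = [rng.normal(size=5) for _ in range(3)]

# The trivial partition recovers the joint cumulant; for r = 2 it is the covariance.
cov = float((phis[0] * phis[1]) @ mu) - float(phis[0] @ mu) * float(phis[1] @ mu)
assert abs(cond_cumulant(phis[:2], mu, [[0, 1]]) - cov) < 1e-12

# Proposition 3.5: the conditional cumulant vanishes for every Q with |Q| >= 2.
assert abs(cond_cumulant(phis, mu, [[0, 1], [2]])) < 1e-10
assert abs(cond_cumulant(phis, mu, [[0], [1], [2]])) < 1e-10
```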

3.2 Estimating cumulants

Fix \(\phi \in C_c^\infty ({\mathcal {X}})\). It will be convenient to write \(\psi _s(y):=\phi (a^s y)-\mu _\mathcal {Y}(\phi \circ a^s)\), so that

$$\begin{aligned} F_N=\frac{1}{\sqrt{N}} \sum _{s=0}^{N-1} \psi _s \quad \text {and} \quad \int _{\mathcal {Y}} \psi _s \, d\mu _\mathcal {Y}=0. \end{aligned}$$

In this section, we shall estimate cumulants of the form

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F_N)=\frac{1}{N^{r/2}} \sum _{s_1,\ldots ,s_r=0}^{N-1} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r}) \end{aligned}$$
(3.5)

for \(r\geqslant 3\). Since we shall later need to apply these estimates in cases where the function \(\phi \) is allowed to vary with N, it will be important to keep track of the dependence on \(\phi \) in our estimates.

We shall decompose (3.5) into sub-sums where the parameters \(s_1,\ldots ,s_r\) are either “separated” or “clustered”, and it will also be important to control their sizes. For this purpose, it will be convenient to consider the set \(\{0,\ldots ,N-1\}^r\) as a subset of \(\mathbb {R}_+^{r+1}\) via the embedding \((s_1,\ldots ,s_r)\mapsto (0,s_1,\ldots ,s_r)\). Following the ideas developed in the paper [2], we define for non-empty subsets I and J of \(\{0,\ldots , r\}\) and \(\overline{s} = (s_0,\ldots ,s_r) \in \mathbb {R}_+^{r+1}\),

$$\begin{aligned} \rho ^{I}(\overline{s}) := \max \{ |s_i-s_j| \, : \, i,j \in I \}\quad \hbox {and} \quad \rho _{I,J}(\overline{s}):= \min \{ |s_i-s_j| \, : \, i \in I, j \in J\}, \end{aligned}$$

and if \(\mathcal {Q}\) is a partition of \(\{0,\ldots ,r\}\), we set

$$\begin{aligned} \rho ^{\mathcal {Q}}(\overline{s}) := \max \{ \rho ^{I}(\overline{s}) \, : \, I \in \mathcal {Q}\} \quad \hbox {and}\quad \rho _{\mathcal {Q}}(\overline{s}) := \min \{ \rho _{I,J}(\overline{s}) \, : \, I \ne J, I, J \in \mathcal {Q}\}. \end{aligned}$$

For \(0 \leqslant \alpha < \beta \), we define

$$\begin{aligned} \Delta _{\mathcal {Q}}(\alpha ,\beta ) := \{ \overline{s} \in \mathbb {R}_+^{r+1} \, : \, \rho ^{\mathcal {Q}}(\overline{s}) \leqslant \alpha , \ \text {and}\ \rho _{\mathcal {Q}}(\overline{s}) > \beta \} \end{aligned}$$

and

$$\begin{aligned} \Delta (\alpha ):= \{ \overline{s} \in \mathbb {R}_+^{r+1} \, : \, |s_i - s_j| \leqslant \alpha \hbox { for all }i,j\}. \end{aligned}$$

The following decomposition of \(\mathbb {R}_+^{r+1}\) was established in our paper [2, Prop. 6.2]: given

$$\begin{aligned} 0=\alpha _0<\beta _1<\alpha _1=(3+r)\beta _1<\beta _2<\cdots<\beta _r<\alpha _r=(3+r)\beta _r<\beta _{r+1},\nonumber \\ \end{aligned}$$
(3.6)

we have

$$\begin{aligned} \mathbb {R}_+^{r+1} = \Delta (\beta _{r+1}) \cup \left( \bigcup _{j=0}^{r} \bigcup _{|\mathcal {Q}| \geqslant 2} \Delta _{\mathcal {Q}}(\alpha _j,\beta _{j+1}) \right) , \end{aligned}$$
(3.7)

where the union is taken over the partitions \(\mathcal {Q}\) of \(\{0,\ldots ,r\}\) with \(|\mathcal {Q}|\geqslant 2\). Intersecting with \(\{0,1,\ldots ,N-1\}^{r}\), we see that

$$\begin{aligned} \{0,\ldots ,N-1\}^r =\Omega (\beta _{r+1};N) \cup \left( \bigcup _{j=0}^{r} \bigcup _{|\mathcal {Q}| \geqslant 2} \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N) \right) , \end{aligned}$$
(3.8)

for all \(N \geqslant 2\), where

$$\begin{aligned} \Omega (\beta _{r+1};N):= & {} \{0,\ldots ,N-1\}^r \cap \Delta (\beta _{r+1}),\\ \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N):= & {} \{0,\ldots ,N-1\}^r \cap \Delta _{\mathcal {Q}}(\alpha _j,\beta _{j+1}). \end{aligned}$$

In order to estimate the cumulant (3.5), we shall estimate separately the sums over \(\Omega (\beta _{r+1};N)\) and \(\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\); the exact choices of the sequences \((\alpha _j)\) and \((\beta _j)\) will be fixed at the very end of our argument.
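For small r the covering (3.8) can be verified by brute force. The sketch below (with \(r=2\) and a sample admissible chain \(\alpha _j=(3+r)\beta _j\); the parameter values are ours) checks that every tuple in \(\{0,\ldots ,N-1\}^r\) lies in \(\Omega (\beta _{r+1};N)\) or in some \(\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\):

```python
from itertools import product

def partitions(elements):
    """All set partitions of a list, generated recursively."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i+1:]
        yield [[first]] + p

def rho_sup(s, Q):
    """rho^Q: largest spread within a block of Q."""
    return max(max(abs(s[i] - s[j]) for i in I for j in I) for I in Q)

def rho_inf(s, Q):
    """rho_Q: smallest gap between two distinct blocks of Q."""
    return min(abs(s[i] - s[j]) for I in Q for J in Q if I is not J
               for i in I for j in J)

r, N = 2, 40
alphas = [0, 5, 30]          # alpha_j = (3 + r) * beta_j, with alpha_0 = 0
betas = [1, 6, 31]           # betas[j] plays the role of beta_{j+1}
Qs = [Q for Q in partitions(list(range(r + 1))) if len(Q) >= 2]

for tup in product(range(N), repeat=r):
    s = (0,) + tup           # the embedding (s_1,...,s_r) -> (0, s_1, ..., s_r)
    covered = max(abs(a - b) for a in s for b in s) <= betas[r]   # Delta(beta_{r+1})
    for j in range(r + 1):
        for Q in Qs:
            if rho_sup(s, Q) <= alphas[j] and rho_inf(s, Q) > betas[j]:
                covered = True
    assert covered           # every tuple lies in the decomposition (3.8)
```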

3.2.1 Case 0: Summing over \(\Omega (\beta _{r+1};N)\)

In this case, \(s_i\leqslant \beta _{r+1}\) for all i, and thus

$$\begin{aligned} |\Omega (\beta _{r+1};N)|\leqslant (\beta _{r+1}+1)^r. \end{aligned}$$

Hence,

$$\begin{aligned} \sum _{(s_1,\ldots ,s_r)\in \Omega (\beta _{r+1};N)} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r})\ll (\beta _{r+1}+1)^r \Vert \phi \Vert _{C^0}^r, \end{aligned}$$
(3.9)

where the implied constants may depend on r; we henceforth omit this subscript to simplify notation.
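The counting bound of Case 0 can likewise be confirmed by brute force for small parameters (the values below are ours, chosen for illustration):

```python
from itertools import product

r, N, beta = 2, 30, 7                      # sample parameters
# Omega(beta; N): tuples whose embedding (0, s_1, ..., s_r) has spread <= beta
Omega = [s for s in product(range(N), repeat=r)
         if max(abs(a - b) for a in (0,) + s for b in (0,) + s) <= beta]
# Every coordinate then satisfies s_i <= beta, giving the bound (beta + 1)^r
assert len(Omega) <= (beta + 1) ** r
```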

3.2.2 Case 1: Summing over \(\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\) with \(\mathcal {Q}=\{\{0\},\{1,\ldots ,r\} \}\)

In this case, we have

$$\begin{aligned} |s_{i_1}-s_{i_2}|\leqslant \alpha _j\quad \hbox {for all }i_1,i_2\in \{1,\ldots ,r\}, \end{aligned}$$

so that

$$\begin{aligned} |\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)|\ll N(1+\alpha _{j})^{r-1}. \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{N^{r/2}} \sum _{(s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)} |\hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r})| \ll N^{1-r/2}(1+\alpha _j)^{r-1}\Vert \phi \Vert _{C^0}^r.\nonumber \\ \end{aligned}$$
(3.10)

3.2.3 Case 2: Summing over \(\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\) with \(|\mathcal {Q}|\geqslant 2\) and \(\mathcal {Q}\ne \{\{0\},\{1,\ldots ,r\} \}\)

In this case, the partition \(\mathcal {Q}\) defines a non-trivial partition \(\mathcal {Q}'=\{I_0,\ldots ,I_\ell \}\) of \(\{1,\ldots ,r\}\) such that for all \((s_1,\ldots , s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\), we have

$$\begin{aligned} |s_{i_1}-s_{i_2}|\leqslant \alpha _j \;\; \hbox {if }i_1\sim _{\mathcal {Q}'} i_2 \quad \text {and} \quad |s_{i_1}-s_{i_2}|> \beta _{j+1} \;\; \hbox {if } i_1\not \sim _{\mathcal {Q}'} i_2, \end{aligned}$$
(3.11)

and

$$\begin{aligned} s_i\leqslant \alpha _j\ \hbox {for all }i\in I_0, \quad \text {and} \quad s_i>\beta _{j+1} \ \hbox {for all }i\notin I_0. \end{aligned}$$

In particular,

$$\begin{aligned} D(s_{i_1},\ldots , s_{i_\ell })\geqslant \beta _{j+1}. \end{aligned}$$
(3.12)

Let I be an arbitrary subset of \(\{1,\ldots ,r\}\). In what follows, we shall show a precise version of the “decoupling”:

$$\begin{aligned} \int _{\mathcal {Y}} \left( \prod _{i\in I} \psi _{s_i}\right) \, d\mu _\mathcal {Y}\approx \prod _{h=0}^\ell \left( \int _{\mathcal {Y}} \left( \prod _{i\in I\cap I_h} \psi _{s_i} \right) \,d\mu _\mathcal {Y}\right) , \end{aligned}$$
(3.13)

where we henceforth shall use the convention that the product is equal to one when \(I\cap I_h=\emptyset \).

Let us estimate the left-hand side of (3.13). We begin by setting

$$\begin{aligned} \Phi _0:=\prod _{i\in I\cap I_0} \psi _{s_i}. \end{aligned}$$

By (2.7), there exists \(\xi =\xi (m,n,k)>0\) such that

$$\begin{aligned} \Vert \Phi _0\Vert _{C^k}\ll \prod _{i\in I\cap I_0} \Vert \phi \circ a^{s_i}-\mu _\mathcal {Y}(\phi \circ a^{s_i})\Vert _{C^k} \ll e^{|I\cap I_0|\xi \, \alpha _j}\,\Vert \phi \Vert _{C^k}^{|I\cap I_0|}. \end{aligned}$$
(3.14)

To prove (3.13), we expand \(\psi _{s_i}=\phi \circ a^{s_i}-\mu _\mathcal {Y}(\phi \circ a^{s_i})\) for \(i\in I\backslash I_0\) and get

$$\begin{aligned}&\int _{\mathcal {Y}} \left( \prod _{i\in I} \psi _{s_i}\right) \,d\mu _\mathcal {Y}= \sum _{J\subset I\backslash I_0} (-1)^{|I\backslash (J\cup I_0)|} \cdot \nonumber \\&\quad \cdot \left( \int _{\mathcal {Y}} \Phi _0 \left( \prod _{i\in J} \phi \circ a^{s_i}\right) \,d\mu _\mathcal {Y}\right) \prod _{i\in I\backslash (J\cup I_0)} \left( \int _{\mathcal {Y}} (\phi \circ a^{s_i})\,d\mu _\mathcal {Y}\right) . \end{aligned}$$
(3.15)

We recall that when \(i\notin I_0\), we have \(s_i>\beta _{j+1}\), and thus it follows from Corollary 3.3 with \(r=1\) that

$$\begin{aligned} \int _{\mathcal {Y}} (\phi \circ a^{s_i})\,d\mu _\mathcal {Y}=\mu _{{\mathcal {X}}}(\phi )+O\left( e^{-\delta \beta _{j+1} }\,\Vert \phi \Vert _{C^k}\right) , \quad \hbox {with }i\notin I_0. \end{aligned}$$
(3.16)

To estimate the other integrals in (3.15), we also apply Corollary 3.3. Let us first fix a subset \(J \subset I {\setminus } I_0\), and for each \(1 \leqslant h \leqslant \ell \), pick \(i_h\in I_h\) and set

$$\begin{aligned} \Phi _h:=\prod _{i\in J\cap I_h} \phi \circ a^{s_i-s_{i_h}}. \end{aligned}$$

Then

$$\begin{aligned} \int _{\mathcal {Y}} \Phi _0 \left( \prod _{i\in J} \phi \circ a^{s_i}\right) \,d\mu _\mathcal {Y}= \int _{\mathcal {Y}} \Phi _0 \left( \prod _{h=1}^\ell \Phi _h\circ a^{s_{i_h}} \right) \,d\mu _\mathcal {Y}. \end{aligned}$$

We note that for \(i\in I_h\), we have \(|s_i-s_{i_h}|\leqslant \alpha _j\), and thus, by (2.7), there exists \(\xi =\xi (m,n,k)>0\) such that

$$\begin{aligned} \Vert \Phi _h\Vert _{C^k}\ll \prod _{i\in J\cap I_h} \Vert \phi \circ a^{s_i-s_{i_h}}\Vert _{C^k} \ll e^{|J\cap I_h|\xi \, \alpha _j}\,\Vert \phi \Vert _{C^k}^{|J\cap I_h|}. \end{aligned}$$
(3.17)

Using (3.12), Corollary 3.3 implies that

$$\begin{aligned} \int _{\mathcal {Y}} \Phi _0 \left( \prod _{h=1}^\ell \Phi _h\circ a^{s_{i_h}}\right) \,d\mu _\mathcal {Y}= & {} \left( \int _{\mathcal {Y}}\Phi _0 \, d\mu _\mathcal {Y}\right) \prod _{h=1}^\ell \left( \int _{{\mathcal {X}}}\Phi _h \, d\mu _{\mathcal {X}}\right) \\&+\,O\left( e^{-\delta \beta _{j+1}}\, \prod _{h=0}^\ell \Vert \Phi _h\Vert _{C^k}\right) . \end{aligned}$$

Using (3.14) and (3.17) and invariance of the measure \(\mu _{\mathcal {X}}\), we deduce that

$$\begin{aligned} \int _{\mathcal {Y}} \Phi _0 \left( \prod _{h=1}^\ell \Phi _h\circ a^{s_{i_h}}\right) \,d\mu _\mathcal {Y}= & {} \left( \int _{\mathcal {Y}}\Phi _0 \, d\mu _\mathcal {Y}\right) \prod _{h=1}^\ell \left( \int _{{\mathcal {X}}}\left( \prod _{i\in J\cap I_h} \phi \circ a^{s_i} \right) \, d\mu _{\mathcal {X}}\right) \\&+\,O\left( e^{-(\delta \beta _{j+1}-r\xi \alpha _j)}\, \Vert \phi \Vert _{C^k}^{|(I\cap I_0)\cup J|}\right) . \end{aligned}$$

Hence, we conclude that

$$\begin{aligned} \int _{\mathcal {Y}} \Phi _0 \left( \prod _{i\in J} \phi \circ a^{s_i}\right) \,d\mu _\mathcal {Y}= & {} \left( \int _{\mathcal {Y}}\Phi _0 \, d\mu _\mathcal {Y}\right) \prod _{h=1}^\ell \left( \int _{{\mathcal {X}}}\left( \prod _{i\in J\cap I_h} \phi \circ a^{s_i} \right) \, d\mu _{\mathcal {X}}\right) \nonumber \\&+\,O\left( e^{-(\delta \beta _{j+1}-r\xi \alpha _j)}\, \Vert \phi \Vert _{C^k}^{|(I\cap I_0)\cup J|}\right) . \end{aligned}$$
(3.18)

We shall choose the parameters \(\alpha _j\) and \(\beta _{j+1}\) so that

$$\begin{aligned} \delta \beta _{j+1}-r\xi \alpha _j>0. \end{aligned}$$
(3.19)

Substituting (3.16) and (3.18) in (3.15), we deduce that

$$\begin{aligned}&\int _{\mathcal {Y}} \left( \prod _{i\in I} \psi _{s_i}\right) \,d\mu _\mathcal {Y}\nonumber \\&\quad = \sum _{J\subset I\backslash I_0} (-1)^{|I\backslash (J\cup I_0)|} \left( \int _{\mathcal {Y}}\Phi _0 \, d\mu _\mathcal {Y}\right) \prod _{h=1}^\ell \left( \int _{{\mathcal {X}}}\left( \prod _{i\in J\cap I_h} \phi \circ a^{s_i} \right) \, d\mu _{\mathcal {X}}\right) \nonumber \\&\qquad \times \mu _{\mathcal {X}}(\phi )^{|I\backslash (J\cup I_0)|} +\,O \left( e^{-(\delta \beta _{j+1}-r\xi \alpha _j)}\, \Vert \phi \Vert _{C^k}^{|I|}\right) . \end{aligned}$$
(3.20)

Next, we estimate the right-hand side of (3.13). Let us fix \(1 \leqslant h \leqslant \ell \), and for a subset \(J \subset I \cap I_h\), we define

$$\begin{aligned} \Phi _J:=\prod _{i\in J} \phi \circ a^{s_i-s_{i_h}}. \end{aligned}$$

As in (3.17), for some \(\xi >0\),

$$\begin{aligned} \Vert \Phi _J\Vert _{C^k}\ll \prod _{i\in J} \Vert \phi \circ a^{s_i-s_{i_h}}\Vert _{C^k} \ll e^{|J|\xi \, \alpha _j}\,\Vert \phi \Vert _{C^k}^{|J|}. \end{aligned}$$

Applying Corollary 3.3 to the function \(\Phi _J\) and using that \(s_{i_h}>\beta _{j+1}\), we get

$$\begin{aligned} \int _{\mathcal {Y}} \left( \prod _{i\in J} \phi \circ a^{s_i}\right) \,d\mu _\mathcal {Y}= & {} \int _{\mathcal {Y}} (\Phi _J\circ a^{s_{i_h}})\,d\mu _\mathcal {Y}\nonumber \\= & {} \int _{{\mathcal {X}}}\Phi _J\, d\mu _{\mathcal {X}}+O\left( e^{-\delta \beta _{j+1} }\,\Vert \Phi _J\Vert _{C^k}\right) \nonumber \\= & {} \int _{{\mathcal {X}}}\left( \prod _{i\in J} \phi \circ a^{s_i}\right) \, d\mu _{\mathcal {X}}+O\left( e^{-\delta \beta _{j+1} } e^{r\xi \, \alpha _j}\,\Vert \phi \Vert _{C^k}^{|J|}\right) ,\nonumber \\ \end{aligned}$$
(3.21)

where we have used the \(a\)-invariance of \(\mu _{\mathcal {X}}\). Combining (3.16) and (3.21), we deduce that

$$\begin{aligned} \int _{\mathcal {Y}} \left( \prod _{i\in I\cap I_h} \psi _{s_i}\right) \,d\mu _\mathcal {Y}= & {} \sum _{J\subset I\cap I_h} (-1)^{|(I\cap I_h)\backslash J|} \left( \int _{{\mathcal {X}}} \left( \prod _{i\in J} \phi \circ a^{s_i}\right) \,d\mu _{\mathcal {X}}\right) \mu _{{\mathcal {X}}}(\phi )^{|(I\cap I_h)\backslash J|}\nonumber \\&+\,O\left( e^{-\delta \beta _{j+1} } e^{r\xi \, \alpha _j}\,\Vert \phi \Vert _{C^k}^{|I\cap I_h|}\right) \nonumber \\= & {} \int _{{\mathcal {X}}} \prod _{i\in I\cap I_h} \left( \phi \circ a^{s_i}-\mu _{\mathcal {X}}(\phi )\right) \,d\mu _{\mathcal {X}}\nonumber \\&+\,O\left( e^{-(\delta \beta _{j+1}-r\xi \alpha _j )}\,\Vert \phi \Vert _{C^k}^{|I\cap I_h|}\right) , \end{aligned}$$
(3.22)

which implies

$$\begin{aligned}&\prod _{h=0}^\ell \left( \int _{\mathcal {Y}} \left( \prod _{i\in I\cap I_h} \psi _{s_i} \right) \,d\mu _\mathcal {Y}\right) \\&\quad = \left( \int _{\mathcal {Y}}\Phi _0 \, d\mu _\mathcal {Y}\right) \prod _{h=1}^\ell \left( \int _{{\mathcal {X}}} \prod _{i\in I\cap I_h} \left( \phi \circ a^{s_i}-\mu _{\mathcal {X}}(\phi )\right) \,d\mu _{\mathcal {X}}\right) \\&\qquad +\,O\left( e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\Vert \phi \Vert _{C^k}^{r}\right) . \end{aligned}$$

Furthermore, multiplying out the products over \(I\cap I_h\), we get

$$\begin{aligned}&\prod _{h=0}^\ell \left( \int _{\mathcal {Y}} \left( \prod _{i\in I\cap I_h} \psi _{s_i} \right) \,d\mu _\mathcal {Y}\right) \nonumber \\&\quad = \left( \int _{\mathcal {Y}}\Phi _0 \, d\mu _\mathcal {Y}\right) \sum _{J\subset I\backslash I_0} (-1)^{|I\backslash (I_0\cup J)|}\prod _{h=1}^\ell \left( \int _{{\mathcal {X}}} \prod _{i\in I_h\cap J} \phi \circ a^{s_i}\,d\mu _{\mathcal {X}}\right) \mu _{\mathcal {X}}(\phi )^{|I\backslash (I_0\cup J)|} \nonumber \\&\qquad +O\left( e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\Vert \phi \Vert _{C^k}^{|I|}\right) . \end{aligned}$$
(3.23)

Comparing (3.20) and (3.23), we finally conclude that

$$\begin{aligned} \int _{\mathcal {Y}} \left( \prod _{i\in I} \psi _{s_i}\right) \,d\mu _\mathcal {Y}= & {} \prod _{h=0}^\ell \left( \int _{\mathcal {Y}} \left( \prod _{i\in I\cap I_h} \psi _{s_i} \right) \,d\mu _\mathcal {Y}\right) \\&+\,O\left( e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\Vert \phi \Vert _{C^k}^{|I|}\right) \end{aligned}$$

when \((s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\). This establishes (3.13) with an explicit error term. This estimate implies that for the partition \(\mathcal {Q}'=\{I_0,\ldots , I_\ell \}\),

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r})= \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r}|\mathcal {Q}')+ O\left( e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\Vert \phi \Vert _{C^k}^{r}\right) . \end{aligned}$$

By Proposition 3.5,

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r}|\mathcal {Q}')=0, \end{aligned}$$

so it follows that for all \((s_1,\ldots , s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};N)\),

$$\begin{aligned} \left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi _{s_1},\ldots ,\psi _{s_r})\right| \ll e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\Vert \phi \Vert _{C^k}^{r}. \end{aligned}$$
(3.24)

3.2.4 Final estimates on the cumulants

Let us now return to (3.5). Upon decomposing this sum into the regions discussed above, and applying the estimates (3.9), (3.10) and (3.24) to the respective regions, we get the bound

$$\begin{aligned}&\left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F_N)\right| \ll \left( (\beta _{r+1}+1)^r N^{-r/2}+ \left( {\max }_j\,\, \alpha _j^{r-1}\right) N^{1-r/2} \right) \Vert \phi \Vert _{C^0}^r \nonumber \\&\quad + N^{r/2} \left( {\max }_{j}\,\, e^{-(\delta \beta _{j+1} - r\xi \alpha _j)}\right) \,\Vert \phi \Vert _{C^k}^{r}. \end{aligned}$$
(3.25)

This estimate holds provided that (3.6) and (3.19) hold, namely when

$$\begin{aligned} \alpha _j=(3+r)\beta _{j}<\beta _{j+1}\quad \hbox {and}\quad \delta \beta _{j+1}-r\xi \alpha _j>0\quad \hbox { for } j=1,\ldots ,r. \end{aligned}$$
(3.26)

Given any \(\gamma >0\), we define the parameters \(\beta _j\) inductively by the formula

$$\begin{aligned} \beta _1=\gamma \quad \hbox {and}\quad \beta _{j+1}=\max (\gamma +(3+r)\beta _{j}, \gamma +\delta ^{-1}r(3+r)\xi \beta _{j}). \end{aligned}$$
(3.27)
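The recursion (3.27) can be sanity-checked numerically. In the sketch below the values of \(\delta \) and \(\xi \) are placeholders (the argument only requires them to be positive); the assertions verify the two constraints in (3.26) and the fact, used in the next step, that \(\beta _{r+1}\) grows only linearly in \(\gamma \).

```python
def betas(r, gamma, delta, xi):
    """Compute beta_1, ..., beta_{r+1} via the recursion (3.27)."""
    b = [gamma]
    for _ in range(r):
        b.append(max(gamma + (3 + r) * b[-1],
                     gamma + (r * (3 + r) * xi / delta) * b[-1]))
    return b

# delta and xi are illustrative constants, not values from the paper
r, delta, xi = 4, 0.3, 1.7
for gamma in (1.0, 5.0, 20.0):
    b = betas(r, gamma, delta, xi)
    for j in range(r):
        alpha_j = (3 + r) * b[j]
        assert alpha_j < b[j + 1]                       # first half of (3.26)
        assert delta * b[j + 1] - r * xi * alpha_j > 0  # condition (3.19)
    # beta_{r+1} is linear in gamma: beta_{r+1}(gamma) = gamma * beta_{r+1}(1)
    assert abs(b[-1] - gamma * betas(r, 1.0, delta, xi)[-1]) < 1e-6 * b[-1]
```

The linearity in \(\gamma \) reflects the induction mentioned below: each step of (3.27) is affine in \(\beta _j\) with coefficient independent of \(\gamma \).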

It easily follows by induction that \(\beta _{r+1}\ll _r \gamma \), and we deduce from (3.25) that

$$\begin{aligned} |\hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F_N)|\ll ( (\gamma +1)^r N^{-r/2}+\gamma ^{r-1}N^{1-r/2}) \,\Vert \phi \Vert _{C^0}^{r} + N^{r/2} e^{-\delta \gamma } \,\Vert \phi \Vert _{C^k}^{r}. \end{aligned}$$

Taking \(\gamma =(r/\delta )\log N\), we conclude that when \(r\geqslant 3\),

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F_N)\rightarrow 0\quad \hbox {as }N\rightarrow \infty . \end{aligned}$$
(3.28)

3.3 Estimating the variance

In this section, we wish to compute the limit

$$\begin{aligned} \Vert F_N\Vert _{L^2(\mathcal {Y})}^2 =\frac{1}{N}\sum _{s_1=0}^{N-1}\sum _{s_2=0}^{N-1} \int _{\mathcal {Y}} \psi _{s_1}\psi _{s_2}\, d\mu _\mathcal {Y}. \end{aligned}$$

Setting \(s_1=s+t\) and \(s_2=t\), we rewrite the above sums as

$$\begin{aligned} \Vert F_N\Vert _{L^2(\mathcal {Y})}^2 =\Theta _N(0) +2\sum _{s=1}^{N-1}\Theta _N(s), \end{aligned}$$
(3.29)

where

$$\begin{aligned} \Theta _N(s):=\frac{1}{N}\sum _{t=1}^{N-1-s} \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}. \end{aligned}$$

Since \(\psi _t=\phi \circ a^t-\mu _\mathcal {Y}(\phi \circ a^t)\),

$$\begin{aligned} \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}=\int _{\mathcal {Y}}(\phi \circ a^{s+t})(\phi \circ a^{t})\, d\mu _\mathcal {Y}-\mu _{\mathcal {Y}}(\phi \circ a^{s+t}) \mu _{\mathcal {Y}}(\phi \circ a^{t}).\nonumber \\ \end{aligned}$$
(3.30)

It follows from Corollary 3.3 that for fixed \(s\) and as \(t\rightarrow \infty \),

$$\begin{aligned} \int _{\mathcal {Y}}(\phi \circ a^{s+t})(\phi \circ a^{t})\, d\mu _\mathcal {Y}\rightarrow \int _{{\mathcal {X}}} (\phi \circ a^{s})\phi \, d\mu _{\mathcal {X}}, \quad \text {and} \quad \mu _{\mathcal {Y}}(\phi \circ a^{t})\rightarrow \mu _{{\mathcal {X}}}(\phi ). \end{aligned}$$

We conclude that

$$\begin{aligned} \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}\rightarrow \Theta _\infty (s):=\int _{{\mathcal {X}}} (\phi \circ a^{s})\phi \, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}(\phi )^2, \end{aligned}$$

as \(t \rightarrow \infty \), and for fixed s,

$$\begin{aligned} \Theta _N(s)\rightarrow \Theta _\infty (s)\quad \hbox {as }N\rightarrow \infty . \end{aligned}$$

If one carelessly interchanges the limits above, one expects that as \(N\rightarrow \infty \),

$$\begin{aligned} \Vert F_N\Vert _{L^2(\mathcal {Y})}^2 \rightarrow \Theta _\infty (0) +2\sum _{s=1}^{\infty }\Theta _\infty (s) =\sum _{s=-\infty }^{\infty } \Theta _\infty (s). \end{aligned}$$
(3.31)

To prove this limit rigorously, we need a bound that is uniform in N, so that we can apply, say, dominated convergence.

It follows from Corollary 3.3 that

$$\begin{aligned} \int _{\mathcal {Y}}(\phi \circ a^{s+t})(\phi \circ a^{t})\, d\mu _\mathcal {Y}= & {} \mu _{{\mathcal {X}}}(\phi )^2+ O\left( e^{-\delta \min (s,t)}\, \Vert \phi \Vert _{C^k}^2\right) ,\\ \int _{\mathcal {Y}}(\phi \circ a^{s+t})\, d\mu _\mathcal {Y}= & {} \mu _{{\mathcal {X}}}(\phi )+ O\left( e^{-\delta (s+t)} \, \Vert \phi \Vert _{C^k}\right) ,\\ \int _{\mathcal {Y}}(\phi \circ a^{t})\, d\mu _\mathcal {Y}= & {} \mu _{{\mathcal {X}}}(\phi )+ O\left( e^{-\delta t} \, \Vert \phi \Vert _{C^k}\right) . \end{aligned}$$

and thus, in combination with (3.30),

$$\begin{aligned} \left| \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}\right| \ll e^{-\delta \min (s,t)}\, \Vert \phi \Vert _{C^k}^2. \end{aligned}$$
(3.32)

This integral can also be estimated in a different way. If we set \(\phi _t=\phi \circ a^{t}\), then we deduce from Corollary 3.3 that

$$\begin{aligned} \int _{\mathcal {Y}}(\phi \circ a^{s+t}) (\phi \circ a^{t})\, d\mu _\mathcal {Y}= & {} \int _{\mathcal {Y}}(\phi _{t}\circ a^{s}) \phi _{t}\, d\mu _\mathcal {Y}\\= & {} \mu _{\mathcal {Y}}(\phi _t)\mu _{{\mathcal {X}}}(\phi _t)+O\left( e^{-\delta s}\, \Vert \phi _t\Vert _{C^k}^2\right) , \end{aligned}$$

and

$$\begin{aligned} \mu _{\mathcal {Y}}(\phi \circ a^{s+t})=\mu _{\mathcal {Y}}(\phi _t\circ a^{s}) = \mu _{{\mathcal {X}}} (\phi _t)+ O\left( e^{-\delta s}\, \Vert \phi _t\Vert _{C^k}\right) . \end{aligned}$$

By (2.7), there exists \(\xi =\xi (m,n,k)>0\) such that

$$\begin{aligned} \Vert \phi _t\Vert _{C^k}\ll e^{\xi t}\,\Vert \phi \Vert _{C^k}\quad \hbox {and} \quad |\mu _\mathcal {Y}(\phi _t)|\leqslant \Vert \phi \Vert _{C^k}, \end{aligned}$$

and thus

$$\begin{aligned} \left| \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}\right| \ll e^{-\delta s} e^{\xi t}\, \Vert \phi \Vert _{C^k}^2. \end{aligned}$$
(3.33)

Let us now combine (3.32) and (3.33): when \(t\leqslant (\delta /(2\xi ))\, s\), we use (3.33), and when \(t\geqslant (\delta /(2\xi ))\, s\), we use (3.32). If we set \(\delta '=\min (\delta /2,\delta ^2/(2\xi ))>0\), then

$$\begin{aligned} \left| \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}\right| \ll e^{-\delta ' s}\, \Vert \phi \Vert _{C^k}^2 \end{aligned}$$

for all \(s\geqslant 0\), whence

$$\begin{aligned} |\Theta _N(s)|\leqslant \frac{1}{N}\sum _{t=1}^{N-1-s} \left| \int _{\mathcal {Y}}\psi _{s+t}\psi _t\, d\mu _\mathcal {Y}\right| \ll e^{-\delta ' s}\, \Vert \phi \Vert _{C^k}^2 \end{aligned}$$

uniformly in N. Hence, the Dominated Convergence Theorem applied to (3.29) yields (3.31).
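For completeness, the case analysis that produced the exponent \(\delta '\) above can be recorded explicitly:

```latex
\begin{aligned}
t\leqslant \tfrac{\delta }{2\xi }\,s \;&\Longrightarrow \;
  e^{-\delta s}e^{\xi t}\leqslant e^{-\delta s+(\delta /2)s}
  =e^{-(\delta /2)s} &&\text{by (3.33)},\\
t\geqslant \tfrac{\delta }{2\xi }\,s \;&\Longrightarrow \;
  e^{-\delta \min (s,t)}\leqslant e^{-\min (\delta ,\,\delta ^2/(2\xi ))\,s}
  &&\text{by (3.32)},
\end{aligned}
```

and in either case the exponential factor is at most \(e^{-\delta ' s}\), since \(\delta '=\min (\delta /2,\delta ^2/(2\xi ))\) satisfies \(\delta '\leqslant \delta /2\leqslant \delta \) and \(\delta '\leqslant \delta ^{2}/(2\xi )\).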

3.4 Proof of Theorem 3.1

In this subsection we shall check that the conditions of Proposition 3.4 hold for the sequence \((F_N)\) defined in (3.1). First, by construction, it is easy to check that

$$\begin{aligned} \int _\mathcal {Y}F_N\, d\mu _\mathcal {Y}=0, \end{aligned}$$

and by (3.31),

$$\begin{aligned} \int _{\mathcal {Y}}F_N^2\, d\mu _\mathcal {Y}\rightarrow \sum _{s=-\infty }^{\infty } \Theta _\infty (s)\quad \hbox {as }N\rightarrow \infty . \end{aligned}$$

Furthermore, by (3.28),

$$\begin{aligned} {{\,\mathrm{Cum}\,}}_{\mu _\mathcal {Y}}^{(r)}(F_N) \rightarrow 0\quad \hbox {as }N\rightarrow \infty , \end{aligned}$$

for every \(r \geqslant 3\), which finishes the proof.

4 Non-divergence estimates for Siegel transforms

4.1 Siegel transforms

We recall that the space \({\mathcal {X}}\) of unimodular lattices in \(\mathbb {R}^{m+n}\) can be identified with the quotient space \(G/\Gamma \), where \(G = {\text {SL}}_{m+n}(\mathbb {R})\) and \(\Gamma = {\text {SL}}_{m+n}(\mathbb {Z})\), endowed with the G-invariant probability measure \(\mu _{{\mathcal {X}}}\). We denote by \(m_G\) a bi-G-invariant Radon measure on G. Given a bounded measurable function \(f:\mathbb {R}^{m+n}\rightarrow \mathbb {R}\) with compact support, we define its Siegel transform \({\hat{f}}:{\mathcal {X}}\rightarrow \mathbb {R}\) by

$$\begin{aligned} {\hat{f}}(\Lambda ):=\sum _{z\in \Lambda \backslash \{0\}} f(z)\quad \hbox {for }\Lambda \in {\mathcal {X}}. \end{aligned}$$

We stress that \(\hat{f}\) is unbounded on \({\mathcal {X}}\); its growth is controlled by an explicit function \(\alpha \) which we now introduce. Given a lattice \(\Lambda \) in \(\mathbb {R}^{m+n}\), we say that a subspace V of \(\mathbb {R}^{m+n}\) is \(\Lambda \)-rational if the intersection \(V\cap \Lambda \) is a lattice in V. If V is \(\Lambda \)-rational, we denote by \(d_\Lambda (V)\) the volume of \(V/(V\cap \Lambda )\), and define

$$\begin{aligned} \alpha (\Lambda ):=\sup \left\{ d_\Lambda (V)^{-1}:\, V\hbox { is a } \Lambda \text {-rational subspace of }\mathbb {R}^{m+n}\right\} . \end{aligned}$$

It follows from the Mahler Compactness Criterion that \(\alpha \) is a proper function on \({\mathcal {X}}\).
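As a toy illustration, entirely our own and not from the paper, the Siegel transform of a compactly supported function can be evaluated by enumerating lattice points; here we take dimension 2 and let f be the indicator of a Euclidean ball.

```python
import itertools
import math

def siegel_transform(B, radius, box=20):
    """Sum f(z) over nonzero lattice points z = B @ v, v in Z^2,
    where f is the indicator of the closed ball of the given radius.

    The enumeration over |v_i| <= box is a truncation; for the small
    radii used below it captures every lattice point in the ball."""
    count = 0
    for v in itertools.product(range(-box, box + 1), repeat=2):
        if v == (0, 0):
            continue
        z = (B[0][0] * v[0] + B[0][1] * v[1],
             B[1][0] * v[0] + B[1][1] * v[1])
        if math.hypot(*z) <= radius:
            count += 1
    return count

# For the standard lattice Z^2 and radius 1, the nonzero lattice points
# in the closed unit ball are exactly (+-1, 0) and (0, +-1).
assert siegel_transform([[1, 0], [0, 1]], 1.0) == 4
```

For lattices with a very short vector the count blows up, which is the unboundedness of \(\hat{f}\) that the function \(\alpha \) quantifies.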

Proposition 4.1

[16, Lem. 2] If \(f:\mathbb {R}^{m+n}\rightarrow \mathbb {R}\) is a bounded function with compact support,  then

$$\begin{aligned} |{\hat{f}}(\Lambda )|\ll _{\mathrm {supp}(f)} \Vert f\Vert _{C^0}\, \alpha (\Lambda )\quad \hbox {for all }\Lambda \in {\mathcal {X}}. \end{aligned}$$

Using reduction theory, it is not hard to derive the following integrability property of \(\alpha \), which is well known in the Geometry of Numbers (see e.g. [4, Lemma 3.10]).

Proposition 4.2

\(\alpha \in L^p({\mathcal {X}})\) for \(1\leqslant p<m+n\). In particular, 

$$\begin{aligned} \mu _{\mathcal {X}}\left( \{\alpha \geqslant L\}\right) \ll _p L^{-p}\quad \hbox {for all } p<m+n. \end{aligned}$$

In what follows, \(d\overline{z}\) denotes the volume element on \(\mathbb {R}^{m+n}\) which assigns volume one to the unit cube. In our arguments below, we will make heavy use of the following two integral formulas:

Proposition 4.3

(Siegel Mean Value Theorem; [17]) If \(f:\mathbb {R}^{m+n}\rightarrow \mathbb {R}\) is a bounded Riemann integrable function with compact support,  then

$$\begin{aligned} \int _{\mathcal {X}}{\hat{f}}\, d\mu _{\mathcal {X}}= \int _{\mathbb {R}^{m+n}} f(\overline{z})\, d\overline{z}. \end{aligned}$$

Proposition 4.4

(Rogers Formula; [14, Theorem 5]) If \(F : \mathbb {R}^{m+n}\times \mathbb {R}^{m+n} \rightarrow \mathbb {R}\) is a non-negative measurable function,  then

$$\begin{aligned}&\int _{{\mathcal {X}}} \left( \sum _{\overline{z}_1,\overline{z}_2\in P(\mathbb {Z}^{m+n})} F(g\overline{z}_1,g\overline{z}_2)\right) d\mu _{\mathcal {X}}(g\Gamma )\\&\quad =\zeta (m+n)^{-2}\int _{\mathbb {R}^{m+n}\times \mathbb {R}^{m+n}} F(\overline{z}_1,\overline{z}_2)\,d\overline{z}_1d\overline{z}_2\\&\qquad +\,\zeta (m+n)^{-1}\int _{\mathbb {R}^{m+n}} F(\overline{z},\overline{z})\,d\overline{z}\\&\qquad +\,\zeta (m+n)^{-1}\int _{\mathbb {R}^{m+n}} F(\overline{z},-\overline{z})\,d\overline{z}, \end{aligned}$$

where \(P(\mathbb {Z}^{m+n})\) denotes the set of primitive integral vectors in \(\mathbb {Z}^{m+n},\) and \(\zeta \) denotes Riemann’s \(\zeta \)-function.

4.2 Non-divergence estimates

We retain the notation from Sect. 2. Given

$$\begin{aligned} 0<w_1,\ldots , w_m<n \quad \text {and} \quad w_1+\cdots +w_m=n, \end{aligned}$$

we denote by a the self-map on \({\mathcal {X}}\) induced by

$$\begin{aligned} a=\hbox {diag}(e^{w_1},\ldots ,e^{w_m},e^{-1},\ldots ,e^{-1}). \end{aligned}$$
(4.1)
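As a quick sanity check, the matrix in (4.1) has determinant \(e^{w_1+\cdots +w_m}\,e^{-n}=1\) precisely because the weights sum to n, so it indeed acts on the space of unimodular lattices. The weights in the sketch below are illustrative, not taken from the text.

```python
import math

def a_diagonal(weights, n):
    """Diagonal entries of the matrix a in (4.1): e^{w_i} then n copies of e^{-1}."""
    return [math.exp(w) for w in weights] + [math.exp(-1.0)] * n

# m = 2, w_1 + w_2 = 2 = n, so the determinant (product of diagonal
# entries) equals e^{2 - 2} = 1
entries = a_diagonal([0.5, 1.5], n=2)
det = math.prod(entries)
assert abs(det - 1.0) < 1e-12
```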

Our goal in this subsection is to analyze the escape of mass for the submanifolds \(a^s\mathcal {Y}\) and bound the Siegel transforms \({\hat{f}}(a^s y)\) for \(y\in \mathcal {Y}\). The following proposition will play a very important role in our arguments.

Proposition 4.5

There exists \(\kappa >0\) such that for every \(L\geqslant 1\) and \(s\geqslant \kappa \log L,\)

$$\begin{aligned} \mu _\mathcal {Y}\left( \{y\in \mathcal {Y}:\, \alpha (a^sy)\geqslant L\}\right) \ll _p L^{-p}\quad \hbox {for all }p<m+n. \end{aligned}$$

Proof

Let \(\chi _L\) be the characteristic function of the subset \(\{\alpha < L\}\) of \({\mathcal {X}}\). By Mahler’s Compactness Criterion, \(\chi _L\) has compact support. We further pick a non-negative \(\rho \in C_c^\infty (G)\) with \(\int _G \rho \, dm_G=1\). Let

$$\begin{aligned} \eta _L(x):=(\rho *\chi _L)(x)=\int _G \rho (g) \chi _L(g^{-1}x)\, dm_G(g), \quad x\in {\mathcal {X}}. \end{aligned}$$

Since \(\mu _{\mathcal {X}}\) is G-invariant,

$$\begin{aligned} \int _{{\mathcal {X}}} \eta _{L}\, d\mu _{\mathcal {X}}=\int _{{\mathcal {X}}} \chi _{L}\, d\mu _{\mathcal {X}}=\mu _{\mathcal {X}}(\{\alpha <L\}). \end{aligned}$$

It follows from invariance of \(m_G\) that if \(\mathcal {D}_Z\) is a differential operator as defined in (2.4), then \(\mathcal {D}_Z\eta _L=(\mathcal {D}_Z\rho )*\chi _L\). Hence, \(\eta _L\in C^\infty _c({\mathcal {X}})\), and \(\Vert \eta _L\Vert _{C^k}\ll \Vert \rho \Vert _{C^k}\).

Note that there exists \(c>1\) such that for every \(g\in \hbox {supp}(\rho )\) and all \(x\in {\mathcal {X}}\), we have \(\alpha (g^{-1}x)\geqslant c^{-1}\,\alpha (x)\), and thus \(\{\alpha \circ g^{-1}<L\}\subset \{\alpha <c L\}\) and \(\eta _{L}\leqslant \chi _{cL}\). This implies the lower bound

$$\begin{aligned} \mu _\mathcal {Y}\left( \{y\in \mathcal {Y}:\, \alpha (a^sy)< cL\}\right) =\int _\mathcal {Y}\chi _{cL}(a^sy)\, d\mu _\mathcal {Y}(y)\geqslant \int _\mathcal {Y}\eta _{L}(a^sy)\, d\mu _\mathcal {Y}(y). \end{aligned}$$

By Corollary 3.3, there exists \(\delta > 0\) and \(k \geqslant 1\) such that

$$\begin{aligned} \int _\mathcal {Y}\eta _{L}(a^sy)\, d\mu _\mathcal {Y}(y)= & {} \int _{{\mathcal {X}}} \eta _{L}\, d\mu _{\mathcal {X}}+O\left( e^{-\delta s}\, \Vert \eta _{L}\Vert _{C^k} \right) \\= & {} \mu _{\mathcal {X}}\left( \{\alpha <L\}\right) +O\left( e^{-\delta s}\right) , \end{aligned}$$

and by Proposition 4.2,

$$\begin{aligned} \mu _{\mathcal {X}}\left( \{\alpha \geqslant L \}\right) \ll _p L^{-p} \quad \hbox {for all } p<m+n. \end{aligned}$$

Combining these bounds, we get

$$\begin{aligned} \mu _\mathcal {Y}(\{y\in \mathcal {Y}:\, \alpha (a^sy)< cL\})\geqslant \mu _{\mathcal {X}}(\{\alpha < L \})+O\left( e^{-\delta s}\right) =1+O_{p}\left( L^{-p} +e^{-\delta s}\right) , \end{aligned}$$

and thus

$$\begin{aligned} \mu _\mathcal {Y}\left( \{y\in \mathcal {Y}:\, \alpha (a^sy)\geqslant c L\}\right) \ll _p L^{-p} +e^{-\delta s}. \end{aligned}$$

Choosing \(\kappa =(m+n)/\delta \), we get \(e^{-\delta s}\leqslant L^{-(m+n)}\leqslant L^{-p}\) whenever \(s\geqslant \kappa \log L\) and \(L\geqslant 1\), which finishes the proof. \(\square \)

Proposition 4.6

Let f be a bounded measurable function on \(\mathbb {R}^{m+n}\) with compact support contained in the open set \(\{(x_{m+1},\ldots , x_{m+n})\ne 0\}\). Then,  with a as in (4.1),

$$\begin{aligned} \sup _{s\geqslant 0} \int _{\mathcal {Y}} |{\hat{f}}\circ a^s|\, d\mu _\mathcal {Y}<\infty . \end{aligned}$$

Remark 4.7

In [4, Theorem 3.2], the authors (implicitly) show a similar uniform estimate for integrals of smooth Siegel transforms over \({\text {SO}}(m+n)\)-orbits of a lattice. Their proof is quite different from ours.

Proof

We note that there exist \(0<\upsilon _1<\upsilon _2\) and \(\vartheta >0\) such that the support of f is contained in the set

$$\begin{aligned} \left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\; \upsilon _1\leqslant \Vert \overline{y}\Vert \leqslant \upsilon _2,\,\; |x_i|\leqslant \vartheta \, \Vert \overline{y}\Vert ^{-w_i},\,\; i=1,\ldots ,m\right\} , \end{aligned}$$
(4.2)

and without loss of generality we may assume that f is the characteristic function of this set. We recall that \(\mathcal {Y}\) can be identified with the collection of lattices

$$\begin{aligned} \left\{ \Lambda _u:\; u=\left( u_{ij}:i=1,\ldots ,m,\ j=1,\ldots ,n\right) \in [0,1)^{m\times n}\right\} . \end{aligned}$$

We set \(\overline{u}_i=(u_{i1},\ldots ,u_{in})\). Then by the definition of the Siegel transform,

$$\begin{aligned} {\hat{f}}(a^s\Lambda _u) =\sum _{(\overline{p},\overline{q})\in \mathbb {Z}^{m+n}\backslash \{0\}} f\left( e^{w_1 s}(p_1+\left<\overline{u}_1,\overline{q}\right>),\ldots , e^{w_m s}(p_m+\left<\overline{u}_m,\overline{q}\right>), e^{-s}\overline{q} \right) . \end{aligned}$$

Denoting by \(\chi ^{(i)}_{\overline{q}}\) the characteristic function of the interval \(\left[ -\vartheta \, \Vert \overline{q}\Vert ^{-w_i},\vartheta \, \Vert \overline{q}\Vert ^{-w_i}\right] \), we rewrite this sum as

$$\begin{aligned} {\hat{f}}(a^s\Lambda _u)= & {} \sum _{\upsilon _1 e^{s}\leqslant \Vert \overline{q}\Vert \leqslant \upsilon _2 e^s} \sum _{\overline{p}\in \mathbb {Z}^m} \prod _{i=1}^m \chi ^{(i)}_{\overline{q}}(p_i+\left<\overline{u}_i,\overline{q}\right>)\nonumber \\= & {} \sum _{\upsilon _1 e^{s}\leqslant \Vert \overline{q}\Vert \leqslant \upsilon _2 e^s} \prod _{i=1}^m \left( \sum _{p_i\in \mathbb {Z}} \chi ^{(i)}_{\overline{q}}(p_i+\left<\overline{u}_i,\overline{q}\right>) \right) . \end{aligned}$$
(4.3)

Hence,

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}\circ a^s)\, d\mu _\mathcal {Y}=\sum _{\upsilon _1 e^{s}\leqslant \Vert \overline{q}\Vert \leqslant \upsilon _2 e^s} \prod _{i=1}^m \left( \sum _{p_i\in \mathbb {Z}} \int _{[0,1]^n} \chi ^{(i)}_{\overline{q}}(p_i+\left<\overline{u}_i,\overline{q}\right>) d\overline{u}_i\right) . \end{aligned}$$

We observe that for each \(i\) and \(p\in \mathbb {Z}\), the volume of the set

$$\begin{aligned} \left\{ \overline{u}\in [0,1]^n:\, |p+\left<\overline{u},\overline{q}\right>|\leqslant \vartheta \, \Vert \overline{q}\Vert ^{-w_i}\right\} \end{aligned}$$

is estimated from above by

$$\begin{aligned} \frac{2\vartheta \,\Vert \overline{q}\Vert ^{-w_i}}{\max _j |q_j|}\ll \Vert \overline{q}\Vert ^{-1-w_i}, \end{aligned}$$

and we note that the set is empty whenever \(|p|> \sum _j |q_j|+\vartheta \, \Vert \overline{q}\Vert ^{-w_i }\). In particular, it is non-empty for at most \(O(\Vert \overline{q}\Vert )\) choices of \(p\in \mathbb {Z}\). Hence, we deduce that

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}\circ a^s)\, d\mu _\mathcal {Y}\ll \sum _{\upsilon _1 e^{s}\leqslant \Vert \overline{q}\Vert \leqslant \upsilon _2 e^s} \prod _{i=1}^m \Vert \overline{q}\Vert ^{-w_i}=\sum _{\upsilon _1 e^{s}\leqslant \Vert \overline{q}\Vert \leqslant \upsilon _2 e^s} \Vert \overline{q}\Vert ^{-n}\ll 1, \end{aligned}$$

uniformly in s. This completes the proof. \(\square \)
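The final step, that the annulus sum of \(\Vert \overline{q}\Vert ^{-n}\) is bounded uniformly in s, can be illustrated numerically. The sketch below takes n = 2 with the sup norm and arbitrary choices \(\upsilon _1=1\), \(\upsilon _2=2\); it groups vectors by their norm, using that exactly \((2k+1)^n-(2k-1)^n\) vectors in \(\mathbb {Z}^n\) have sup norm k.

```python
import math

def annulus_sum(s, n=2, v1=1.0, v2=2.0):
    """Sum of |q|^{-n} over q in Z^n with v1*e^s <= |q| <= v2*e^s (sup norm)."""
    lo, hi = v1 * math.exp(s), v2 * math.exp(s)
    total = 0.0
    for k in range(max(1, math.ceil(lo)), math.floor(hi) + 1):
        # number of q with sup norm exactly k, times the common term k^{-n}
        total += ((2 * k + 1) ** n - (2 * k - 1) ** n) * k ** (-n)
    return total

values = [annulus_sum(s) for s in range(1, 9)]
assert max(values) < 20.0  # bounded uniformly in s
```

Heuristically each dyadic block contributes about \(2^n n \log 2\), independently of s, which is what the uniform bound reflects.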

Proposition 4.8

Let f be a bounded measurable function on \(\mathbb {R}^{m+n}\) with compact support contained in the open set \(\{(x_{m+1},\ldots ,x_{m+n})\ne 0\}\). Then

$$\begin{aligned} \sup _{s\geqslant 0} (1+s)^{-\nu _m}\left\| {\hat{f}}\circ a^s\right\| _{L^2(\mathcal {Y})}<\infty , \end{aligned}$$

where \(\nu _1=1\) and \(\nu _m=0\) when \(m\geqslant 2\).

Proof

As in the proof of Proposition 4.6, it is sufficient to consider the case when f is the characteristic function of the set (4.2). Then \({\hat{f}}(a^sy)\) is given by (4.3), and we get

$$\begin{aligned}&\left\| {\hat{f}}\circ a^s\right\| _{L^2(\mathcal {Y})}^2 =\int _{\mathcal {Y}} {\hat{f}}( a^s y){\hat{f}}( a^s y)\, d\mu _\mathcal {Y}(y)\\&\quad =\sum _{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s} \prod _{i=1}^m \left( \sum _{p_i,r_i\in \mathbb {Z}} \int _{[0,1]^n}\chi ^{(i)}_{\overline{q}}\left( p_i+ \left<\overline{u}_i,\overline{q}\right>\right) \chi ^{(i)}_{\overline{\ell }}\left( r_i+\left< \overline{u}_i,\overline{\ell }\right>\right) \, d\overline{u}_i\right) . \end{aligned}$$

For fixed \(\overline{q}, \overline{\ell } \in \mathbb {Z}^n\), we wish to estimate

$$\begin{aligned} I_i(\overline{q},\overline{\ell }):= \sum _{p,r\in \mathbb {Z}} \int _{[0,1]^n}\chi ^{(i)}_{\overline{q}}\left( p+\left<\overline{u}, \overline{q}\right>\right) \chi ^{(i)}_{\overline{\ell }} \left( r+\left<\overline{u},\overline{\ell }\right>\right) \, d\overline{u}. \end{aligned}$$

First, we consider the case when \(\overline{q}\) and \(\overline{\ell }\) are linearly independent. Then there exist indices \(j,k=1,\ldots ,n\) such that \(q_j\ell _k-q_k\ell _j\ne 0.\) Let us consider the function \(\psi \) on \(\mathbb {R}^2\) defined by \(\psi (x_1,x_2)=\chi ^{(i)}_{\overline{q}}(x_1)\chi ^{(i)}_{\overline{\ell }}(x_2)\) as well as the periodized function \({\bar{\psi }}\) on \(\mathbb {R}^2/\mathbb {Z}^2\) defined by \({\bar{\psi }}(x)=\sum _{z\in \mathbb {Z}^2} \psi (z+x)\). If we set

$$\begin{aligned} \omega :=\sum _{\zeta \ne j,k} q_\zeta u_\zeta \quad \text {and} \quad \rho :=\sum _{\zeta \ne j,k} \ell _\zeta u_\zeta , \end{aligned}$$

then we denote by S the affine map

$$\begin{aligned} S:(x_1,x_2)\mapsto (q_jx_1 +q_kx_2+\omega , \ell _j x_1+\ell _k x_2+\rho ), \end{aligned}$$

which induces an affine endomorphism of the torus \(\mathbb {R}^2/\mathbb {Z}^2\). We note that

$$\begin{aligned} \sum _{p,r\in \mathbb {Z}}\int _{[0,1]^2}\chi ^{(i)}_{\overline{q}}\left( p+\left<\overline{u},\overline{q}\right>\right) \chi ^{(i)}_{\overline{\ell }}\left( r+\left<\overline{u},\overline{\ell }\right>\right) \, du_jdu_k=\int _{\mathbb {R}^2/\mathbb {Z}^2} \bar{\psi }(Sx)\, d\mu (x), \end{aligned}$$

where \(\mu \) denotes the Lebesgue probability measure on the torus \(\mathbb {R}^2/\mathbb {Z}^2\). Since the endomorphism S preserves \(\mu \), we see that

$$\begin{aligned} \int _{\mathbb {R}^2/\mathbb {Z}^2} \bar{\psi }(Sx)\, d\mu (x)=\int _{\mathbb {R}^2/\mathbb {Z}^2} \bar{\psi }(x)\, d\mu (x) =\int _{\mathbb {R}^2} \psi (x)\, dx=4\vartheta ^2\, \Vert \overline{q}\Vert ^{-w_i} \Vert \overline{\ell }\Vert ^{-w_i}. \end{aligned}$$

Therefore, we conclude that in this case,

$$\begin{aligned} I_i(\overline{q},\overline{\ell })\ll \Vert \overline{q}\Vert ^{-w_i} \Vert \overline{\ell }\Vert ^{-w_i}. \end{aligned}$$
(4.4)

Let us now consider the second case, when \(\overline{q}\) and \(\overline{\ell }\) are linearly dependent. Upon re-arranging indices if needed, we may assume that

$$\begin{aligned} |q_1|=\max (|q_1|,\ldots ,|q_n|,|\ell _1|,\ldots , |\ell _n|). \end{aligned}$$
(4.5)

In particular, \(q_1\ne 0\), and thus \(\ell _1 \ne 0\), since \(\overline{q}\) and \(\overline{\ell }\) are linearly dependent, so we can define the new variables

$$\begin{aligned} v_1=\sum _{j=1}^n (q_j/q_1)u_j = \sum _{j=1}^n (\ell _j/\ell _1)u_j \quad \text {and} \quad v_2=u_2,\ldots , v_n=u_n, \end{aligned}$$

and thus

$$\begin{aligned} I_i(\overline{q},\overline{\ell })\leqslant J_i(\overline{q},\overline{\ell }) \end{aligned}$$

where

$$\begin{aligned} J_i(\overline{q},\overline{\ell }):= \sum _{p,r\in \mathbb {Z}} \int _{-n}^n\chi ^{(i)}_{\overline{q}}\left( p+q_1 v_1\right) \,\chi ^{(i)}_{\overline{\ell }}\left( r+\ell _1v_1\right) \, dv_1. \end{aligned}$$

We note that the last integral is non-zero only when \(|p|\ll |q_1|\) and \(|r|\ll |\ell _1|\). We set \(q_1=q'd\) and \(\ell _1=\ell 'd\) where \(d=\gcd (q_1,\ell _1)\). Then \(q_1r-\ell _1 p=j d\) for some \(j\in \mathbb {Z}\). We observe that when j is fixed, the integers p and r satisfy the equation \(q'r-\ell 'p=j\). Since \(\gcd (q',\ell ')=1\), all the solutions of this equation are of the form \(p=p_0+kq'\), \(r=r_0+k\ell '\) for \(k\in \mathbb {Z}\). In particular, it follows that the number of such solutions satisfying \(|p|\ll |q_1|\) and \(|r|\ll |\ell _1|\) is at most O(d). We write

$$\begin{aligned} J_i(\overline{q},\overline{\ell })=J^{(1)}_i(\overline{q},\overline{\ell })+J^{(2)}_i(\overline{q},\overline{\ell }), \end{aligned}$$

where the first sum is taken over those \(p,r\) with \(q_1r-\ell _1 p\ne 0\), and the second sum is taken over those \(p,r\) with \(q_1r-\ell _1 p=0\).
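The counting argument above (solutions of \(q'r-\ell 'p=j\) in a box form an arithmetic progression of length O(d)) can be checked by brute force on toy values; the box constant C = 2 below is arbitrary.

```python
from math import gcd

def num_solutions(q1, l1, j, C=2):
    """Count (p, r) with q'*r - l'*p = j, |p| <= C*q1, |r| <= C*l1,
    where q1 = q'*d, l1 = l'*d and d = gcd(q1, l1)."""
    d = gcd(q1, l1)
    qp, lp = q1 // d, l1 // d
    return sum(1 for p in range(-C * q1, C * q1 + 1)
                 for r in range(-C * l1, C * l1 + 1)
                 if qp * r - lp * p == j)

for q1, l1 in [(12, 8), (30, 18), (7, 5)]:
    d = gcd(q1, l1)
    for j in range(-3, 4):
        # solutions are p = p0 + k*q', so at most about 2*C*d + 1 of them
        assert num_solutions(q1, l1, j) <= 4 * d + 1
```

Since \(\gcd (q',\ell ')=1\), admissible p fill a single residue class modulo \(q'\), so a box of side \(\asymp q_1 = q'd\) contains \(\asymp d\) of them, matching the O(d) bound in the text.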

Upon applying a linear change of variables, we obtain

$$\begin{aligned} J^{(1)}_i(\overline{q},\overline{\ell })= & {} \sum _{p,r:\, q_1r-\ell _1 p\ne 0} \int _{-n+p/q_1}^{n+p/q_1}\chi ^{(i)}_{\overline{q}}(q_1v_1)\,\chi ^{(i)}_{\overline{\ell }}\left( (q_1r-\ell _1 p)/q_1+\ell _1v_1\right) \, dv_1\\\ll & {} d \sum _{j\in \mathbb {Z}\backslash \{0\}} \int _{-\infty }^{\infty }\chi ^{(i)}_{\overline{q}}(q_1 v_1)\,\chi ^{(i)}_{\overline{\ell }}\left( jd/q_1+\ell _1 v_1\right) \, dv_1. \end{aligned}$$

Let us consider the function

$$\begin{aligned} \rho _i(x):= \int _{-\infty }^{\infty } \chi ^{(i)}_{\overline{q}}(q_1 v_1)\,\chi ^{(i)}_{\overline{\ell }}\left( xd/q_1+\ell _1 v_1\right) \, dv_1. \end{aligned}$$

We note that the integrand equals the indicator function of the intersection of the intervals

$$\begin{aligned} \left[ -\vartheta \,|q_1|^{-1}\Vert \overline{q}\Vert ^{-w_i},\vartheta \, |q_1|^{-1}\Vert \overline{q}\Vert ^{-w_i}\right] \end{aligned}$$

and

$$\begin{aligned} \left[ -xd/(q_1\ell _1)-\vartheta \,|\ell _1|^{-1}\Vert \overline{\ell }\Vert ^{-w_i},-xd/(q_1\ell _1)+\vartheta \,|\ell _1|^{-1}\Vert \overline{\ell }\Vert ^{-w_i}\right] , \end{aligned}$$

and thus it follows that \(\rho _i\) is non-increasing when \(x\geqslant 0\), and non-decreasing when \(x\leqslant 0\). This implies that

$$\begin{aligned} \sum _{j\in \mathbb {Z}\backslash \{0\}} \rho _i(j) \leqslant \int _{-\infty }^{\infty } \rho _i(x)\, dx. \end{aligned}$$

Since

$$\begin{aligned} \int _{-\infty }^{\infty } \rho _i(x)\, dx= \left( \int _{-\infty }^{\infty } \chi ^{(i)}_{\overline{q}}(q_1 v_1)\,dv_1\right) \left( \int _{-\infty }^{\infty }\chi ^{(i)}_{\overline{\ell }}\left( xd/q_1\right) \, dx\right) \ll d^{-1} \Vert \overline{q}\Vert ^{-w_i} \Vert \overline{\ell }\Vert ^{-w_i}, \end{aligned}$$

we conclude that

$$\begin{aligned} J_i^{(1)}(\overline{q},\overline{\ell })\ll d \sum _{j\in \mathbb {Z}\backslash \{0\}} \rho _i(j) \ll \Vert \overline{q}\Vert ^{-w_i}\Vert \overline{\ell }\Vert ^{-w_i}. \end{aligned}$$

Next, we estimate \(J^{(2)}_i(\overline{q},\overline{\ell })\). Let \(c_0:=\min \{\Vert \overline{q}\Vert : \overline{q} \in \mathbb {Z}^{n}\backslash \{0\}\}\). We set

$$\begin{aligned} N(q_1,\ell _1) = \big | \big \{ (p,r) \in \mathbb {Z}^2 \, : \, q_1 r = \ell _1 p \hbox { with } |p| \leqslant (c_0^{-n}\vartheta + n)|q_1| \big \} \big | \end{aligned}$$

and obtain

$$\begin{aligned} J^{(2)}_i(\overline{q},\overline{\ell })= & {} \sum _{p,r:\, q_1r-\ell _1 p= 0} \int _{-n+p/q_1}^{n+p/q_1}\chi ^{(i)}_{\overline{q}}(q_1v_1)\,\chi ^{(i)}_{\overline{\ell }}\left( \ell _1v_1\right) \, dv_1\\\leqslant & {} N(q_1,\ell _1) \int _{-\infty }^{\infty }\chi ^{(i)}_{\overline{q}}(q_1v_1)\,\chi ^{(i)}_{\overline{\ell }}\left( \ell _1v_1\right) \, dv_1\\\ll & {} N(q_1,\ell _1)|q_1|^{-1}\Vert \overline{q}\Vert ^{-w_i}\ll N(q_1,\ell _1)\max \left( \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \right) ^{-(1+w_i)}, \end{aligned}$$

where we used that \(q_1\) is chosen according to (4.5). Combining the obtained estimates for \(J_i^{(1)}(\overline{q},\overline{\ell })\) and \(J^{(2)}_i(\overline{q},\overline{\ell })\), we conclude that when \(\overline{q}\) and \(\overline{\ell }\) are linearly dependent,

$$\begin{aligned} J_i(\overline{q},\overline{\ell })\ll \Vert \overline{q}\Vert ^{-w_i}\Vert \overline{\ell }\Vert ^{-w_i}+ N(q_1,\ell _1)\max \left( \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \right) ^{-(1+w_i)}, \end{aligned}$$
(4.6)

where \(q_1\) is chosen according to (4.5).

Now we proceed to estimate

$$\begin{aligned} \left\| {\hat{f}}\circ a^s\right\| _{L^2(\mathcal {Y})}^2\ll \sum _{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s} \prod _{i=1}^m I_i(\overline{q},\overline{\ell }). \end{aligned}$$
(4.7)

Using (4.4), the sum in (4.7) over linearly independent \(\overline{q}\) and \(\overline{\ell }\) can be estimated as

$$\begin{aligned} \ll \sum _{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s} \prod _{i=1}^m \Vert \overline{q}\Vert ^{-w_i}\Vert \overline{\ell }\Vert ^{-w_i}\ll \sum _{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s} \Vert \overline{q}\Vert ^{-n}\Vert \overline{\ell }\Vert ^{-n}\ll 1. \end{aligned}$$

For a subset I of \(\{1,\ldots ,m\}\), we set \(w(I):=\sum _{i\in I} w_i\). Then using (4.6), we deduce that the sum in (4.7) over linearly dependent \(\overline{q}\) and \(\overline{\ell }\) is bounded by

$$\begin{aligned}&\ll {\sum }_{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s}^{*} \sum _{I\subset \{1,\ldots ,m\}} \Vert \overline{q}\Vert ^{-w(I)}\Vert \overline{\ell }\Vert ^{-w(I)} N(q_1,\ell _1)^{|I^c|}\nonumber \\&\quad \times \max \left( \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \right) ^{-(|I^c|+w({I^c}))} \nonumber \\&\ll \sum _{I\subset \{1,\ldots ,m\}} (e^s)^{-(n+|I^c|+w(I))} {\sum }_{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s}^* N(q_1,\ell _1)^{|I^c|}. \end{aligned}$$
(4.8)

The star indicates that the sum is taken over linearly dependent \(\overline{q}\) and \(\overline{\ell }\).

When \(I^c=\emptyset \), then \(w(I)=n\). Since the number of \((\overline{q},\overline{\ell })\) satisfying \(\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s\) is estimated as \(O(e^{2ns})\), it is clear that the corresponding term in the above sum is uniformly bounded.

Now we suppose that \(I^c\ne \emptyset \). Since \(\overline{q}\) and \(\overline{\ell }\) are linearly dependent, the vector \(\overline{\ell }\) is uniquely determined given \(\ell _1\) and \(\overline{q}\), and we obtain that for some \(\upsilon _1',\upsilon _2'>0\),

$$\begin{aligned} {\sum }_{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s}^* N(q_1,\ell _1)^{|I^c|}\ll (e^{s})^{n-1}\sum _{\upsilon _1'\, e^{s}\leqslant |q_1|\leqslant \upsilon '_2\, e^s} \sum _{1\leqslant |\ell _1|\leqslant |q_1|} N(q_1,\ell _1)^{|I^c|}. \end{aligned}$$

We shall use the following lemma:

Lemma 4.9

For every \(k\geqslant 1,\)

$$\begin{aligned} \sum _{1\leqslant q\leqslant T} \sum _{1\leqslant \ell \leqslant q} N(q,\ell )^k\ll T^{k+1}(\log T)^{\nu _k} \end{aligned}$$

where \(\nu _1=1\) and \(\nu _k=0\) when \(k\geqslant 2\).

Proof

We observe that the sum of \(N(q,\ell )^k\) over \(1\leqslant \ell \leqslant q\) is equal to the number of solutions \((p_1,\ldots ,p_k,r_1,\ldots , r_k,\ell )\) of the system of equations

$$\begin{aligned} q r_1-\ell p_1= 0,\ldots , q r_k-\ell p_k= 0 \end{aligned}$$
(4.9)

satisfying

$$\begin{aligned} |p_1|,\ldots ,|p_k|\leqslant (c_0^{-n}\vartheta +n)q\quad \hbox {and}\quad 1\leqslant \ell \leqslant q. \end{aligned}$$

We order these solutions according to \(d:=\gcd (q,\ell )\). Let \(q=q'd\) and \(\ell =\ell 'd\). Then \(q'\) and \(\ell '\) are coprime, and the system (4.9) is equivalent to

$$\begin{aligned} q' r_1-\ell ' p_1= 0,\ldots , q' r_k-\ell ' p_k= 0. \end{aligned}$$
(4.10)

Because of coprimality, each \(p_i\) has to be divisible by \(q'\), so that the number of such \(p_i\)’s is at most \(O(q/q')=O(d)\). We note that, given \(d\), the number of possible choices for \(\ell \) is at most \(q/d\), and \((p_1,\ldots ,p_k,\ell )\) uniquely determines \((r_1,\ldots ,r_k)\). Hence, the number of solutions of (4.9) is estimated by

$$\begin{aligned} \ll \sum _{d|q} (q/d)d^k=q\sigma _{k-1}(q), \end{aligned}$$

where \(\sigma _{k-1}(q)=\sum _{d|q} d^{k-1}\). Writing \(q=q'd\), we conclude that

$$\begin{aligned} \sum _{1\leqslant q\leqslant T} \sum _{1\leqslant \ell \leqslant q} N(q,\ell )^k\ll & {} \sum _{1\leqslant q\leqslant T} q\sigma _{k-1}(q)\leqslant T\sum _{1\leqslant q'\leqslant T} \sum _{d=1}^{\lfloor T/q'\rfloor } d^{k-1}\\\ll & {} T^{k+1}\sum _{1\leqslant q'\leqslant T} (q')^{-k}\ll T^{k+1}(\log T)^{\nu _k}. \end{aligned}$$

This proves the lemma. \(\square \)
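The divisor-sum identity \(\sum _{d|q} (q/d)\,d^k=q\,\sigma _{k-1}(q)\) used in the proof can be checked directly; the following snippet (illustration only) verifies it for a few moduli:

```python
# Sanity check (illustration only) of the divisor-sum identity used above:
#     sum_{d|q} (q/d) * d^k  ==  q * sigma_{k-1}(q),
# where sigma_j(q) = sum_{d|q} d^j; it holds since (q/d)*d^k = q*d^{k-1}.

def divisors(q):
    return [d for d in range(1, q + 1) if q % d == 0]

def sigma(j, q):
    return sum(d ** j for d in divisors(q))

for q in (12, 30, 97):                  # composite and prime values of q
    for k in (1, 2, 3):
        lhs = sum((q // d) * d ** k for d in divisors(q))
        assert lhs == q * sigma(k - 1, q)
print("identity verified for q in (12, 30, 97), k in (1, 2, 3)")
```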

Remark 4.10

A similar estimate in the case \(k = 1\) was proved by Schmidt in [16].

A simple modification of this argument also gives that

$$\begin{aligned} \sum _{1\leqslant |q|\leqslant T} \sum _{1\leqslant |\ell |\leqslant |q|} N(q,\ell )^k\ll T^{k+1}(\log T)^{\nu _k}. \end{aligned}$$

Hence, it follows that

$$\begin{aligned} \sum _{\upsilon _1\, e^{s}\leqslant \Vert \overline{q}\Vert ,\Vert \overline{\ell }\Vert \leqslant \upsilon _2\, e^s}^* N(q_1,\ell _1)^{|I^c|}\ll (e^s)^{n+|I^c|}(1+s)^{\nu (I)}, \end{aligned}$$

where \(\nu (I)=1\) when \(|I^c|=1\) and \(\nu (I)=0\) otherwise. Thus, the sum (4.8) is estimated as

$$\begin{aligned} \ll 1+ \sum _{I\subsetneq \{1,\ldots ,m\}} e^{-s w(I)} (1+s)^{\nu (I)}. \end{aligned}$$

The terms in this sum are uniformly bounded unless \(I=\emptyset \) and \(|I^c|=1\), which can only happen when \(m=1\). Hence, when \(m>1\), the sum is uniformly bounded, and when \(m=1\), we obtain the bound \(O(1+s)\). This proves the proposition. \(\square \)

4.3 Truncated Siegel transform

The Siegel transform of a compactly supported function is typically unbounded on \({\mathcal {X}}\); to deal with this complication, it is natural to approximate \({\hat{f}}\) by compactly supported functions on \({\mathcal {X}}\), the so-called truncated Siegel transforms, which we shall denote by \({\hat{f}}^{(L)}\). They will be constructed using a smooth cut-off function \(\eta _L\), defined in the following lemma.

Lemma 4.11

For every \(c>1,\) there exists a family \((\eta _L)\) in \(C_c^\infty ({\mathcal {X}})\) satisfying:

$$\begin{aligned} 0\leqslant \eta _L \leqslant 1,\quad \eta _L=1 \quad \hbox {on }\{\alpha \leqslant c^{-1}\, L\},\quad \eta _L=0 \quad \hbox {on }\{\alpha > c\,L\},\quad \Vert \eta _L\Vert _{C^k}\ll 1. \end{aligned}$$

Proof

Let \(\chi _L\) denote the indicator function of the subset \(\{\alpha \leqslant L\} \subset {\mathcal {X}}\), and pick a non-negative \(\phi \in C_c^\infty (G)\) with \(\int _G \phi \, dm_G=1\) and with support in a sufficiently small neighbourhood of identity in G to ensure that for all \(g\in \hbox {supp}(\phi )\) and \(x\in {\mathcal {X}}\),

$$\begin{aligned} c^{-1}\, \alpha (x)\leqslant \alpha (g^{\pm 1}x)\leqslant c\, \alpha (x). \end{aligned}$$

We now define \(\eta _L\) as

$$\begin{aligned} \eta _L(x):=(\phi *\chi _L)(x)=\int _G \phi (g)\chi _L(g^{-1}x)\, dm_G(g). \end{aligned}$$

Since \(\phi \geqslant 0\) and \(\int _G \phi \, dm_G=1\), it is clear that \(0\leqslant \eta _L \leqslant 1\). If \(\alpha (x)\leqslant c^{-1}\, L\), then for \(g \in \hbox {supp}(\phi )\), we have \(\alpha (g^{-1}x) \leqslant L\), so that \(\eta _L(x)=\int _G \phi \, dm_G=1\). If \(\alpha (x)>c\, L\), then for \(g\in \hbox {supp}(\phi )\), we have \(\alpha (g^{-1}x) > L\), so that \(\eta _L(x)=0\).

To prove the last property, we observe that it follows from invariance of \(m_G\) that for a differential operator \(\mathcal {D}_Z\) as in (2.4), we have \(\mathcal {D}_Z\eta _L=(\mathcal {D}_Z\phi )*\chi _L\). Therefore, \(\hbox {supp}(\mathcal {D}_Z\eta _L)\subset \{\alpha \leqslant c\,L\}\) and \(\Vert \mathcal {D}_Z\eta _L\Vert _{C^0}\leqslant \Vert \mathcal {D}_Z\phi \Vert _{L^1(G)}\), whence \(\Vert \eta _L\Vert _{C^k}\ll 1\). \(\square \)
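The construction in Lemma 4.11 is the familiar smoothing of an indicator function by convolution with an approximate identity. A one-dimensional numerical analogue (illustration only; here \(\alpha (x)=|x|\), \(L=1\), and all numerical choices are ours, not from the paper):

```python
# One-dimensional analogue (illustration only) of the construction of eta_L:
# convolving the indicator chi_L of {alpha <= L} with a non-negative bump phi
# of integral 1, supported near the identity, yields a smooth cut-off equal
# to 1 well inside the set and 0 well outside it, with derivative norms
# controlled by those of phi.  Here alpha(x) = |x|, L = 1, supp(phi) = [-0.1, 0.1].
import numpy as np

dx = 0.001
x = np.arange(-3, 3, dx)
chi = (np.abs(x) <= 1.0).astype(float)              # chi_L, indicator of {alpha <= L}

t = np.linspace(-0.1, 0.1, 201)
phi = np.exp(-1.0 / np.maximum(1.0 - (t / 0.1) ** 2, 1e-12))  # smooth bump on [-0.1, 0.1]
phi /= phi.sum() * dx                               # normalise: integral = 1

eta = np.convolve(chi, phi, mode="same") * dx       # eta_L = phi * chi_L

inner = eta[np.abs(x) <= 0.85]                      # well inside {alpha <= L}
outer = eta[np.abs(x) >= 1.15]                      # well outside {alpha <= L}
print(inner.min(), outer.max())                     # ~1 and ~0 respectively
```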

For a bounded function \(f:\mathbb {R}^{m+n}\rightarrow \mathbb {R}\) with compact support, we define the truncated Siegel transform of f as

$$\begin{aligned} {\hat{f}}^{(L)}:={\hat{f}}\cdot \eta _L. \end{aligned}$$

We record some basic properties of this transform that will be used later in the proofs.

Lemma 4.12

For \(f\in C_c^\infty (\mathbb {R}^{m+n}),\) the truncated Siegel transform \({\hat{f}}^{(L)}\) is in \(C_c^\infty ({\mathcal {X}}),\) and it satisfies

$$\begin{aligned}&\displaystyle \left\| {\hat{f}}^{(L)}\right\| _{L^p({\mathcal {X}})}\leqslant \Vert {\hat{f}}\Vert _{L^p({\mathcal {X}})} \ll _{\mathrm {supp}(f),p} \Vert f\Vert _{C^0} \quad \hbox {for all }p<m+n, \end{aligned}$$
(4.11)
$$\begin{aligned}&\displaystyle \left\| {\hat{f}}^{(L)}\right\| _{C^0}\ll _{\mathrm {supp}(f)} L\, \Vert f\Vert _{C^0}, \end{aligned}$$
(4.12)
$$\begin{aligned}&\displaystyle \left\| {\hat{f}}^{(L)}\right\| _{C^k}\ll _{\mathrm {supp}(f)} L\, \Vert f\Vert _{C^k}, \end{aligned}$$
(4.13)
$$\begin{aligned}&\displaystyle \left\| {\hat{f}}-{\hat{f}}^{(L)}\right\| _{L^1({\mathcal {X}})}\ll _{\mathrm {supp}(f),\tau } L^{-\tau }\, \Vert f\Vert _{C^0}\quad \hbox {for all } \tau <m+n-1, \end{aligned}$$
(4.14)
$$\begin{aligned}&\displaystyle \left\| {\hat{f}}-{\hat{f}}^{(L)}\right\| _{L^2({\mathcal {X}})}\ll _{\mathrm {supp}(f),\tau } L^{-(\tau -1)/2}\, \Vert f\Vert _{C^0}\quad \hbox { for all } \tau <m+n-1.\qquad \end{aligned}$$
(4.15)

Moreover,  the implied constants are uniform when \(\text {supp}(f)\) is contained in a fixed compact set.

Proof

It follows from Proposition 4.1 that

$$\begin{aligned} \left| {\hat{f}}^{(L)}\right| \ll _{\mathrm {supp}(f)} \Vert f\Vert _{C^0} \,\alpha \, \eta _L. \end{aligned}$$

Since \(0\leqslant \eta _L\leqslant 1\), (4.11) follows from Proposition 4.2, and the upper bound in (4.12) holds since \(\hbox {supp}(\eta _L)\subset \{\alpha \leqslant cL\}\).

We observe that for a differential operator \(\mathcal {D}_Z\) as in (2.4), we have \(\mathcal {D}_Z({\hat{f}})= \widehat{\mathcal {D}_Z f}\). Hence, we deduce from Proposition 4.1 that

$$\begin{aligned} \left| \mathcal {D}_Z({\hat{f}})\right| \ll _{\mathrm {supp}(f)} \Vert f\Vert _{C^k}\, \alpha . \end{aligned}$$

Since \(\hbox {supp}(\eta _L)\subset \{\alpha \leqslant cL\}\) and \(\Vert \eta _L\Vert _{C^k}\ll 1\), we deduce that

$$\begin{aligned} \left\| {\hat{f}}^{(L)}\right\| _{C^k} \ll _{\mathrm {supp}(f)} L\, \Vert f\Vert _{C^k} \end{aligned}$$

proving (4.13).

To prove (4.14), we observe that since \(0\leqslant \eta _L\leqslant 1\) and \(\eta _L=1\) on \(\{\alpha < c^{-1}L\}\), it follows from Proposition 4.1 that

$$\begin{aligned} \left\| {\hat{f}}-{\hat{f}}^{(L)}\right\| _{L^1({\mathcal {X}})}=\int _{{\mathcal {X}}} |{\hat{f}}|\cdot |1-\eta _L|\, d\mu _{\mathcal {X}}\ll _{\mathrm {supp}(f)} \Vert f\Vert _{C^0}\int _{\{\alpha \geqslant c^{-1}L\}} \alpha \, d\mu _{\mathcal {X}}. \end{aligned}$$

Hence, applying Hölder's inequality with \(1< p<m+n\) and \(q=(1-1/p)^{-1}\), we deduce that

$$\begin{aligned} \left\| {\hat{f}}-{\hat{f}}^{(L)}\right\| _{L^1({\mathcal {X}})} \ll _{\mathrm {supp}(f)} \Vert f\Vert _{C^0}\, \Vert \alpha \Vert _p\, \mu _{\mathcal {X}}(\{\alpha \geqslant c^{-1}L\})^{1/q}. \end{aligned}$$

Now it follows from Proposition 4.2 that

$$\begin{aligned} \left\| {\hat{f}}-{\hat{f}}^{(L)}\right\| _{L^1({\mathcal {X}})} \ll _{\mathrm {supp}(f),p} \Vert f\Vert _{C^0}\,L^{-(p-1)}, \end{aligned}$$

which proves (4.14). The proof of (4.15) is similar, and we omit the details. \(\square \)
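For the interested reader, the omitted computation behind (4.15) runs parallel to the proof of (4.14); a sketch (our reconstruction, using the same integrability of \(\alpha \) from Proposition 4.2 and Hölder's inequality with exponents \(p/2\) and \((1-2/p)^{-1}\) for \(2<p<m+n\)):

```latex
% Sketch of (4.15): since 0 <= eta_L <= 1 and eta_L = 1 on {alpha <= c^{-1} L},
\begin{aligned}
\big\| \hat{f}-\hat{f}^{(L)}\big\|_{L^2(\mathcal{X})}^2
 &= \int_{\mathcal{X}} |\hat{f}|^2\,|1-\eta_L|^2 \, d\mu_{\mathcal{X}}
 \ll_{\operatorname{supp}(f)} \Vert f\Vert_{C^0}^2
    \int_{\{\alpha \geqslant c^{-1}L\}} \alpha^2 \, d\mu_{\mathcal{X}} \\
 &\leqslant \Vert f\Vert_{C^0}^2\, \Vert \alpha \Vert_{L^p(\mathcal{X})}^2\,
    \mu_{\mathcal{X}}\big(\{\alpha \geqslant c^{-1}L\}\big)^{1-2/p}
 \ll_{p} \Vert f\Vert_{C^0}^2\, L^{-(p-2)}.
\end{aligned}
```

Taking \(\tau =p-1<m+n-1\) and square roots gives \(\Vert {\hat{f}}-{\hat{f}}^{(L)}\Vert _{L^2({\mathcal {X}})}\ll L^{-(\tau -1)/2}\,\Vert f\Vert _{C^0}\), which is (4.15).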

5 CLT for smooth Siegel transforms

Assume that \(f\in C^\infty _c(\mathbb {R}^{m+n})\) satisfies \(f\geqslant 0\) and \(\hbox {supp}(f)\subset \{(x_{m+1},\ldots ,x_{m+n})\ne 0\}\). We shall in this section analyze the asymptotic behavior of the averages

$$\begin{aligned} F_N(y):=\frac{1}{\sqrt{N}} \sum _{s=0}^{N-1} \left( {\hat{f}}(a^s y)-\mu _\mathcal {Y}({\hat{f}}\circ a^s)\right) \quad \hbox { with }y\in \mathcal {Y}, \end{aligned}$$

and prove the following result:

Theorem 5.1

If \(m\geqslant 2\) (and thus \(m+n \geqslant 3\)) and \(f\) is as above, then the variance

$$\begin{aligned} \sigma _f^2:= & {} \sum _{s=-\infty }^\infty \left( \int _{{\mathcal {X}}} ({\hat{f}}\circ a^s){\hat{f}} \, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}({\hat{f}})^2\right) \\= & {} \zeta (m+n)^{-1} \sum _{s=-\infty }^\infty \sum _{p,q\geqslant 1} \left( \int _{\mathbb {R}^{m+n}} f(pa^sz)f(qz)\,dz+\int _{\mathbb {R}^{m+n}} f(pa^sz)f(-qz)\,dz\right) \end{aligned}$$

is finite,  and for every \(\xi \in \mathbb {R},\)

$$\begin{aligned} \mu _\mathcal {Y}\left( \{y\in \mathcal {Y}:\, F_N(y)<\xi \}\right) \rightarrow \hbox { Norm}_{\sigma _f}(\xi ) \end{aligned}$$
(5.1)

as \(N\rightarrow \infty \).

The proof of Theorem 5.1 follows the same plan as the proof of Theorem 3.1, but we need to develop an additional approximation argument involving truncations of the Siegel transform \({\hat{f}}\). This can potentially change the behaviour of the averages \(F_N\), so we will have to take into account the possible escape of mass for the sequence of submanifolds \(a^s\mathcal {Y}\) in \({\mathcal {X}}\).

Throughout the proof, we shall frequently use the basic observation that if we approximate \(F_N\) by \({{\tilde{F}}}_N\) in such a way that \(\Vert F_N-{{\tilde{F}}}_N\Vert _{L^1(\mathcal {Y})}\rightarrow 0\), then \(F_N\) and \({{\tilde{F}}}_N\) have the same limiting distribution. Each time we apply this observation, the new approximation will depend on some sequence which depends on N; ultimately, we will end up with three different, but interrelated, sequences K(N), L(N) and M(N), which need to be matched. In Sect. 5.3, we will provide explicit choices for these sequences.
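The observation itself is standard: by Markov's inequality, for every \(\varepsilon >0\),

```latex
\mu_{\mathcal{Y}}\big(\{\,|F_N-\tilde{F}_N|>\varepsilon\,\}\big)
 \;\leqslant\; \varepsilon^{-1}\,\big\Vert F_N-\tilde{F}_N\big\Vert_{L^1(\mathcal{Y})}
 \;\longrightarrow\; 0,
```

so \(F_N-{{\tilde{F}}}_N\rightarrow 0\) in probability, and convergence in distribution of \({{\tilde{F}}}_N\) to a limit law transfers to \(F_N\).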

Let

$$\begin{aligned} {{\tilde{F}}}_N:=\frac{1}{\sqrt{N}} \sum _{s=M}^{N-1} \left( {\hat{f}}\circ a^s-\mu _\mathcal {Y}({\hat{f}}\circ a^s)\right) \end{aligned}$$
(5.2)

for some \(M=M(N)\rightarrow \infty \) that will be chosen later. We observe that

$$\begin{aligned} \Vert F_N-{\tilde{F}}_N\Vert _{L^1(\mathcal {Y})}\leqslant & {} \frac{1}{\sqrt{N}} \sum _{s=0}^{M-1} \left\| {\hat{f}}\circ a^s-\mu _\mathcal {Y}({\hat{f}}\circ a^s)\right\| _{L^1(\mathcal {Y})}\nonumber \\\leqslant & {} \frac{2M}{\sqrt{N}} \sup _{s\geqslant 0} \int _\mathcal {Y}|{\hat{f}}\circ a^s|\, d\mu _\mathcal {Y}. \end{aligned}$$
(5.3)

Thus, by Proposition 4.6, we see that

$$\begin{aligned} \Vert F_N-{\tilde{F}}_N\Vert _{L^1(\mathcal {Y})}\rightarrow 0\quad \hbox {as }N\rightarrow \infty , \end{aligned}$$

provided

$$\begin{aligned} M=o(N^{1/2}). \end{aligned}$$
(5.4)

In particular, it follows that if (5.1) holds for \({{\tilde{F}}}_N\), then it also holds for \(F_N\); we shall prove the former. In order to simplify notation, let us drop the tilde and assume from now on that \(F_N\) is given by (5.2).

Given a sequence \(L = L(N)\), which shall be chosen later, we consider the average

$$\begin{aligned} F^{(L)}_N:=\frac{1}{\sqrt{N}} \sum _{s=M}^{N-1} \left( {\hat{f}}^{(L)}\circ a^s-\mu _\mathcal {Y}({\hat{f}}^{(L)}\circ a^s)\right) \end{aligned}$$

defined for the truncated Siegel transforms \({\hat{f}}^{(L)}\) introduced in Sect. 4.3. We have

$$\begin{aligned} \left\| F_N-F^{(L)}_N\right\| _{L^1(\mathcal {Y})}\leqslant & {} \frac{1}{\sqrt{N}} \sum _{s=M}^{N-1} \left\| \left( {\hat{f}}\circ a^s -{\hat{f}}^{(L)}\circ a^s\right) -\mu _\mathcal {Y}\left( {\hat{f}}\circ a^s-{\hat{f}}^{(L)}\circ a^s\right) \right\| _{L^1(\mathcal {Y})}\\\leqslant & {} \frac{2}{\sqrt{N}} \sum _{s=M}^{N-1} \left\| {\hat{f}}\circ a^s -{\hat{f}}^{(L)}\circ a^s\right\| _{L^1(\mathcal {Y})}. \end{aligned}$$

We recall that \({\hat{f}}^{(L)}={\hat{f}}\cdot \eta _L\), \(0\leqslant \eta _L\leqslant 1\), and \(\eta _L(x)=1\) when \(\alpha (x)\leqslant c^{-1}\,L\), so that

$$\begin{aligned} \left\| {\hat{f}}\circ a^s -{\hat{f}}^{(L)}\circ a^s\right\| _{L^1(\mathcal {Y})}= & {} \left\| ({\hat{f}}\circ a^s) (1-\eta _L\circ a^s)\right\| _{L^1(\mathcal {Y})} \\\leqslant & {} \int _{\{\alpha (a^sy)\geqslant c^{-1}\, L\}} |{\hat{f}}(a^sy)|\, d\mu _\mathcal {Y}(y). \end{aligned}$$

Hence, by the Cauchy–Schwarz inequality,

$$\begin{aligned} \left\| {\hat{f}}\circ a^s -{\hat{f}}^{(L)}\circ a^s\right\| _{L^1(\mathcal {Y})} \leqslant \left\| {\hat{f}}\circ a^s \right\| _{L^2(\mathcal {Y})}\, \mu _\mathcal {Y}\left( \left\{ y\in \mathcal {Y}:\, \alpha (a^sy)\geqslant c^{-1}L \right\} \right) ^{1/2}. \end{aligned}$$

Let us now additionally assume that

$$\begin{aligned} M\gg \log L, \end{aligned}$$
(5.5)

so that the assumption of Proposition 4.5 is satisfied when \(s\geqslant M\). This implies that

$$\begin{aligned} \mu _\mathcal {Y}\left( \left\{ y\in \mathcal {Y}:\, \alpha (a^sy)\geqslant c^{-1}L \right\} \right) \ll _p L^{-p}\quad \hbox {for all }p<m+n. \end{aligned}$$

We also recall that by Proposition 4.8 when \(m\geqslant 2\),

$$\begin{aligned} \sup _{s\geqslant 0} \left\| {\hat{f}}\circ a^s \right\| _{L^2(\mathcal {Y})}<\infty , \end{aligned}$$

whence, for \(s\geqslant M\),

$$\begin{aligned} \left\| {\hat{f}}\circ a^s -{\hat{f}}^{(L)}\circ a^s\right\| _{L^1(\mathcal {Y})}\ll _p L^{-p/2}, \end{aligned}$$

and thus

$$\begin{aligned} \left\| F_N-F^{(L)}_N\right\| _{L^1(\mathcal {Y})}\ll _p N^{1/2} L^{-p/2}\quad \hbox {for all }p<m+n. \end{aligned}$$
(5.6)

If we now choose \(L=L(N)\rightarrow \infty \) so that

$$\begin{aligned} N=o\left( L^p\right) \quad \hbox {for some }p<m+n, \end{aligned}$$
(5.7)

then it follows that

$$\begin{aligned} \left\| F_N-F^{(L)}_N\right\| _{L^1(\mathcal {Y})}\rightarrow 0\quad \hbox {as } N\rightarrow \infty . \end{aligned}$$

Hence, if we can show Theorem 5.1 for the averages \(F^{(L)}_N\) with the parameter constraints above, then it also holds for \(F_N\). In order to prove the CLT for \((F^{(L)}_N)\), we follow the route of Proposition 3.4 and estimate the cumulants and \(L^2\)-norms of the sequence.

5.1 Estimating cumulants

We set

$$\begin{aligned} \psi ^{(L)}_s(y):={\hat{f}}^{(L)}(a^sy)-\mu _\mathcal {Y}({\hat{f}}^{(L)}\circ a^s). \end{aligned}$$

Our aim is to estimate

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}\left( F_N^{(L)}\right) =\frac{1}{N^{r/2}} \sum _{s_1,\ldots ,s_r=M}^{N-1} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}\left( \psi ^{(L)}_{s_1},\ldots ,\psi ^{(L)}_{s_r}\right) \end{aligned}$$
(5.8)

when \(r\geqslant 3\). The argument proceeds as in Sect. 3.2, but we have to refine the previous estimates to take into account the dependence on the parameters L and M. Using the notation from Sect. 3.2, we have the decomposition

$$\begin{aligned} \{M,\ldots ,N-1\}^r =\Omega (\beta _{r+1};M,N) \cup \left( \bigcup _{j=0}^{r} \bigcup _{|\mathcal {Q}| \geqslant 2} \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N) \right) ,\qquad \end{aligned}$$
(5.9)

where

$$\begin{aligned} \Omega (\beta _{r+1};M,N):= & {} \{M,\ldots ,N-1\}^r \cap \Delta (\beta _{r+1}),\\ \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N):= & {} \{M,\ldots ,N-1\}^r \cap \Delta _{\mathcal {Q}}(\alpha _j,\beta _{j+1}). \end{aligned}$$

We decompose the sum into the sums over \(\Omega (\beta _{r+1};M,N)\) and \(\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)\). Let us choose M so that

$$\begin{aligned} M> \beta _{r+1}. \end{aligned}$$
(5.10)

Then \(\Omega (\beta _{r+1};M,N)=\emptyset \), so this set does not contribute to our estimates.

5.1.1 Case 1: Summing over \((s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)\) with \(\mathcal {Q}=\{\{0\},\{1,\ldots ,r\}\}\)

In this case, we shall show that

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}\left( \psi ^{(L)}_{s_1},\ldots ,\psi ^{(L)}_{s_r}\right) \approx \hbox {Cum}_{\mu _{\mathcal {X}}}^{(r)}\left( \phi ^{(L)}\circ a^{s_1},\ldots ,\phi ^{(L)}\circ a^{s_r}\right) \end{aligned}$$
(5.11)

where \(\phi ^{(L)}:= {\hat{f}}^{(L)}-\mu _{\mathcal {X}}({\hat{f}}^{(L)})\). This reduces to estimating the integrals

$$\begin{aligned}&\int _{\mathcal {Y}} \left( \prod _{i\in I} \psi ^{(L)}_{s_i}\right) \,d\mu _\mathcal {Y}\nonumber \\&\quad = \sum _{J\subset I} (-1)^{|I\backslash J|} \left( \int _{\mathcal {Y}} \left( \prod _{i\in J} {\hat{f}}^{(L)}\circ a^{s_i}\right) \,d\mu _\mathcal {Y}\right) \prod _{i\in I\backslash J} \left( \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^{s_i})\,d\mu _\mathcal {Y}\right) .\nonumber \\ \end{aligned}$$
(5.12)

If \((s_1,\ldots ,s_r)\in \Omega _\mathcal {Q}(\alpha _j,\beta _{j+1};M,N)\), so that

$$\begin{aligned} |s_{i_1}-s_{i_2}|\leqslant \alpha _j \quad \hbox {and}\quad s_{i_1}\geqslant \beta _{j+1}\quad \quad \hbox {for all }1\leqslant i_1,i_2\leqslant r, \end{aligned}$$

it follows from Corollary 3.3 with \(r=1\) that there exists \(\delta >0\) such that

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^{s_i})\,d\mu _\mathcal {Y}=\mu _{{\mathcal {X}}}\left( {\hat{f}}^{(L)}\right) +O\left( e^{-\delta \beta _{j+1} }\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}\right) . \end{aligned}$$
(5.13)

For a fixed \(J \subset I\), we define

$$\begin{aligned} \Phi ^{(L)}:=\prod _{i\in J} {\hat{f}}^{(L)}\circ a^{s_i-s_1}, \end{aligned}$$

and note that for some \(\xi =\xi (m,n,k)>0\), we have

$$\begin{aligned} \left\| \Phi ^{(L)}\right\| _{C^k}\ll \prod _{i\in J} \left\| {\hat{f}}^{(L)}\circ a^{s_i-s_1}\right\| _{C^k} \ll e^{|J|\xi \, \alpha _j}\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{|J|}. \end{aligned}$$

If we again apply Corollary 3.3 to the function \(\Phi ^{(L)}\), we obtain

$$\begin{aligned}&\int _{\mathcal {Y}} \left( \prod _{i\in J} {\hat{f}}^{(L)}\circ a^{s_i}\right) \,d\mu _\mathcal {Y}=\int _{\mathcal {Y}} (\Phi ^{(L)}\circ a^{s_1})\,d\mu _\mathcal {Y}\nonumber \\&\quad =\int _{{\mathcal {X}}}\Phi ^{(L)}\, d\mu _{\mathcal {X}}+O\left( e^{-\delta \beta _{j+1} }\,\left\| \Phi ^{(L)}\right\| _{C^k}\right) \nonumber \\&\quad =\int _{{\mathcal {X}}}\left( \prod _{i\in J} {\hat{f}}^{(L)}\circ a^{s_i}\right) \, d\mu _{\mathcal {X}}+O\left( e^{-\delta \beta _{j+1} } e^{r\xi \, \alpha _j}\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{|J|}\right) , \end{aligned}$$
(5.14)

where we used that \(\mu _{\mathcal {X}}\) is invariant under the transformation a. Let us now choose the exponents \(\alpha _j\) and \(\beta _{j+1}\) so that \(\delta \beta _{j+1}-r\xi \alpha _j>0\). Combining (5.12), (5.13) and (5.14), we deduce that

$$\begin{aligned} \int _{\mathcal {Y}} \left( \prod _{i\in I} \psi ^{(L)}_{s_i}\right) \,d\mu _\mathcal {Y}= & {} \sum _{J\subset I} (-1)^{|I\backslash J|} \left( \int _{{\mathcal {X}}} \left( \prod _{i\in J} {\hat{f}}^{(L)}\circ a^{s_i}\right) \,d\mu _{\mathcal {X}}\right) \mu _{{\mathcal {X}}}\left( {\hat{f}}^{(L)}\right) ^{|I\backslash J|}\nonumber \\&+\,O\left( e^{-\delta \beta _{j+1} } e^{r\xi \, \alpha _j}\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{|I|}\right) \nonumber \\= & {} \int _{{\mathcal {X}}} \prod _{i\in I} \left( {\hat{f}}^{(L)}\circ a^{s_i}-\mu _{\mathcal {X}}({\hat{f}}^{(L)})\right) \,d\mu _{\mathcal {X}}\nonumber \\&+\,O\left( e^{-(\delta \beta _{j+1}-r\xi \alpha _j )}\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{|I|}\right) , \end{aligned}$$
(5.15)

and thus, for any partition \(\mathcal {P}\),

$$\begin{aligned} \prod _{I\in \mathcal {P}}\int _{\mathcal {Y}} \left( \prod _{i\in I} \psi ^{(L)}_{s_i}\right) \,d\mu _\mathcal {Y}= & {} \prod _{I\in \mathcal {P}}\int _{{\mathcal {X}}} \left( \prod _{i\in I} \phi ^{(L)}\circ a^{s_i}\right) \,d\mu _{\mathcal {X}}\\&+\,O\left( e^{-(\delta \beta _{j+1}-r\xi \alpha _j )}\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{r}\right) , \end{aligned}$$

and consequently,

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}\left( \psi ^{(L)}_{s_1},\ldots ,\psi ^{(L)}_{s_r}\right) =&\;\hbox {Cum}_{\mu _{\mathcal {X}}}^{(r)}\left( \phi ^{(L)}\circ a^{s_1},\ldots ,\phi ^{(L)}\circ a^{s_r}\right) \nonumber \\&+ O\left( e^{-(\delta \beta _{j+1}-r\xi \, \alpha _j)}\,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{r}\right) \end{aligned}$$
(5.16)

whenever \((s_1,\ldots ,s_r)\in \Omega _\mathcal {Q}(\alpha _j,\beta _{j+1};M,N)\) with \(\mathcal {Q}=\{\{0\},\{1,\ldots ,r\}\}\), from which (5.11) follows.

We now claim that

$$\begin{aligned}&\left| \hbox {Cum}_{\mu _{\mathcal {X}}}^{(r)}\left( \phi ^{(L)}\circ a^{s_1},\ldots ,\phi ^{(L)}\circ a^{s_r}\right) \right| \nonumber \\&\quad \ll _{f} \left\| {\hat{f}}^{(L)}\right\| _{C^0}^{(r-(m+n-1))^+} \left\| {\hat{f}}^{(L)} \right\| _{L^{m+n-1}({\mathcal {X}})}^{\min (r,m+n-1)}, \end{aligned}$$
(5.17)

where we use the notation \(x^+=\max (x,0)\). The implied constants in (5.17) and below in the proof depend only on \(\hbox {supp}(f)\). By the definition of the cumulant, to prove (5.17), it suffices to show that for every \(z\geqslant 1\) and indices \(i_1,\ldots ,i_z\),

$$\begin{aligned}&\int _{{\mathcal {X}}} \left| \left( \phi ^{(L)}\circ a^{s_{i_1}}\right) \cdots \left( \phi ^{(L)}\circ a^{s_{i_z}}\right) \right| \, d\mu _{\mathcal {X}}\nonumber \\&\qquad \ll _f \left\| {\hat{f}}^{(L)}\right\| _{C^0}^{(z-(m+n-1))^+} \left\| {\hat{f}}^{(L)} \right\| _{L^{m+n-1}({\mathcal {X}})}^{\min (z,m+n-1)}. \end{aligned}$$
(5.18)

Using the generalized Hölder inequality, we deduce that when \(z\leqslant m+n-1\),

$$\begin{aligned}&\int _{{\mathcal {X}}} \left| \left( \phi ^{(L)}\circ a^{s_{i_1}}\right) \cdots \left( \phi ^{(L)}\circ a^{s_{i_z}}\right) \right| \, d\mu _{\mathcal {X}}\\&\quad \leqslant \left\| \phi ^{(L)}\circ a^{s_{i_1}}\right\| _{L^{m+n-1}({\mathcal {X}})}\cdots \left\| \phi ^{(L)}\circ a^{s_{i_z}}\right\| _{L^{m+n-1}({\mathcal {X}})}\\&\quad \ll \left\| {\hat{f}}^{(L)}\right\| _{L^{m+n-1}({\mathcal {X}})}^z. \end{aligned}$$

Also when \(z>m+n-1\),

$$\begin{aligned}&\int _{{\mathcal {X}}} \left| \left( \phi ^{(L)}\circ a^{s_{i_1}}\right) \cdots \left( \phi ^{(L)}\circ a^{s_{i_z}}\right) \right| \, d\mu _{\mathcal {X}}\\&\quad \leqslant \left\| \phi ^{(L)}\right\| _{C^0}^{z-(m+n-1)} \int _{{\mathcal {X}}} \left| \left( \phi ^{(L)}\circ a^{s_{i_1}}\right) \cdots \left( \phi ^{(L)}\circ a^{s_{i_{m+n-1}}}\right) \right| \, d\mu _{\mathcal {X}}\\&\quad \ll _f \left\| {\hat{f}}^{(L)}\right\| _{C^0}^{z-(m+n-1)} \left\| {\hat{f}}^{(L)}\right\| _{L^{m+n-1}({\mathcal {X}})}^{m+n-1}. \end{aligned}$$

This implies (5.18) and (5.17).

Finally, we recall that if \((s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)\) with \(\mathcal {Q}=\{\{0\},\{1,\ldots ,r\}\}\), then we have \(|s_{i_1}-s_{i_2}|\leqslant \alpha _j\) for all \(i_1\ne i_2\), and thus

$$\begin{aligned} |\Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)|\ll N\alpha _j^{r-1}. \end{aligned}$$
(5.19)

Combining (5.17) and (5.19), we conclude that

$$\begin{aligned}&\frac{1}{N^{r/2}} \sum _{(s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)}\left| \hbox {Cum}_{\mu _{\mathcal {X}}}^{(r)}\left( \phi ^{(L)}\circ a^{s_1},\ldots ,\phi ^{(L)}\circ a^{s_r}\right) \right| \\&\quad \ll _f\; N^{1-r/2} \alpha _j^{r-1} \left\| {\hat{f}}^{(L)}\right\| _{C^0}^{(r-(m+n-1))^+} \left\| {\hat{f}}^{(L)} \right\| _{L^{m+n-1}({\mathcal {X}})}^{\min (r,m+n-1)}. \end{aligned}$$

Hence, it follows from (5.16) that

$$\begin{aligned}&\frac{1}{N^{r/2}} \sum _{(s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)}\hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}\left( \psi _{s_1}^{(L)},\ldots ,\psi _{s_r}^{(L)}\right) \\&\quad \ll _f \; N^{r/2}\, e^{-(\delta \beta _{j+1}- r\alpha _j\xi )}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^r\\&\qquad +\,N^{1-r/2} \alpha _j^{r-1} \left\| {\hat{f}}^{(L)}\right\| _{C^0}^{(r-(m+n-1))^+} \left\| {\hat{f}}^{(L)} \right\| _{L^{m+n-1}({\mathcal {X}})}^{\min (r,m+n-1)} \end{aligned}$$

and using Lemma 4.12, we deduce that

$$\begin{aligned}&\frac{1}{N^{r/2}} \sum _{(s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)}\hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}\left( \psi _{s_1}^{(L)},\ldots ,\psi _{s_r}^{(L)}\right) \\&\quad \ll _f \; N^{r/2}\, e^{-(\delta \beta _{j+1}- r\alpha _j\xi )}\, L^r \Vert f\Vert _{C^k} +N^{1-r/2} \alpha _j^{r-1} L^{(r-(m+n-1))^+}\Vert f\Vert _{C^0}^r. \end{aligned}$$

5.1.2 Case 2: Summing over \((s_1,\ldots ,s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)\) with \(|\mathcal {Q}|\geqslant 2\) and \(\mathcal {Q}\ne \{\{0\},\{1,\ldots ,r\}\}\)

In this case, the estimate (3.24) gives for \((s_1,\ldots , s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)\),

$$\begin{aligned} \left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi ^{(L)}_{s_1},\ldots ,\psi ^{(L)}_{s_r})\right| \ll e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{r}, \end{aligned}$$

and

$$\begin{aligned} \sum _{(s_1,\ldots , s_r)\in \Omega _{\mathcal {Q}}(\alpha _j,\beta _{j+1};M,N)} \left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(\psi ^{(L)}_{s_1},\ldots ,\psi ^{(L)}_{s_r})\right|\ll & {} N^{r/2}e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} \,\left\| {\hat{f}}^{(L)}\right\| _{C^k}^{r}\\\ll & {} N^{r/2}e^{-(\delta \beta _{j+1} - r\xi \alpha _j)} L^r\,\Vert f\Vert _{C^k}^r, \end{aligned}$$

where we used Lemma 4.12.

Finally, we combine the established bounds to estimate \(\hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F_N^{(L)})\). We choose the parameters \(\alpha _j\) and \(\beta _j\) as in (3.27). Then \(\beta _{r+1}\ll _r \gamma \). In particular, we may choose

$$\begin{aligned} M\gg _r \gamma \end{aligned}$$
(5.20)

to guarantee that (5.10) is satisfied. With these choices of \(\alpha _j\) and \(\beta _j\), we obtain the estimate

$$\begin{aligned} \left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F^{(L)}_N)\right| \ll _f N^{r/2} e^{-\delta \gamma } L^{r} \Vert f\Vert _{C^{k}}^r +N^{1-r/2} \gamma ^{r-1} L^{(r-(m+n-1))^+} \Vert f\Vert _{C^0}^r.\nonumber \\ \end{aligned}$$
(5.21)

We observe that, since \(m\geqslant 2\),

$$\begin{aligned} \frac{(r-(m+n-1))^+}{m+n}<r/2-1 \quad \hbox {for all }r\geqslant 3. \end{aligned}$$

Hence, we can choose \(q>1/(m+n)\) such that

$$\begin{aligned} q(r-(m+n-1))^+<r/2-1 \quad \hbox {for all }r\geqslant 3. \end{aligned}$$
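Since \(m\geqslant 2\) forces \(m+n\geqslant 3\), such a choice of q is indeed available; a small numerical check (illustration only, over a sampled range of parameters) of the underlying inequality:

```python
# Numerical check (illustration only) of the exponent inequality behind the
# choice of q: with d = m + n >= 3 (which holds since m >= 2 and n >= 1),
#     (r - (d-1))^+ / d  <  r/2 - 1   for all r >= 3,
# so some q > 1/d also satisfies q*(r - (d-1))^+ < r/2 - 1.

def pos(x):
    return max(x, 0)

for m in range(2, 7):
    for n in range(1, 7):
        d = m + n
        for r in range(3, 50):
            assert pos(r - (d - 1)) / d < r / 2 - 1
print("inequality verified on the sampled range")
```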

Then we select

$$\begin{aligned} L=N^q, \end{aligned}$$

so that, in particular, the condition (5.7) is satisfied. Now (5.21) can be rewritten as

$$\begin{aligned} \left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F^{(L)}_N)\right| \ll _f N^{r/2+rq} e^{-\delta \gamma }\, \Vert f\Vert _{C^{k}}^r +N^{q(r-(m+n-1))^+-(r/2-1)} \gamma ^{r-1}\, \Vert f\Vert _{C^0}^r.\nonumber \\ \end{aligned}$$
(5.22)

Choosing \(\gamma \) of the form

$$\begin{aligned} \gamma =c_r(\log N) \end{aligned}$$

with sufficiently large \(c_r>0\), we conclude that

$$\begin{aligned} \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F^{(L)}_N)\rightarrow 0\quad \hbox {as } N\rightarrow \infty \end{aligned}$$

for all \(r\geqslant 3\).

5.2 Estimating variances

We now turn to the analysis of the variances of the averages \(F_N^{(L)}\), which are given by

$$\begin{aligned} \left\| F^{(L)}_N\right\| _{L^2(\mathcal {Y})}^2 =\frac{1}{N}\sum _{s_1=M}^{N-1}\sum _{s_2=M}^{N-1} \int _{\mathcal {Y}} \psi ^{(L)}_{s_1}\psi ^{(L)}_{s_2}\, d\mu _\mathcal {Y}. \end{aligned}$$

We proceed as in Sect. 3.3, taking into account the dependence on the parameters M and L. Since this expression is symmetric with respect to \(s_1\) and \(s_2\), writing \(s_1=s+t\) and \(s_2=t\) with \(0\leqslant s\leqslant N-M-1\) and \(M\leqslant t \leqslant N-s-1\), we obtain that

$$\begin{aligned} \left\| F^{(L)}_N\right\| _{L^2(\mathcal {Y})}^2 =&\Theta ^{(L)}_N(0) +2\sum _{s=1}^{N-M-1}\Theta ^{(L)}_N(s), \end{aligned}$$
(5.23)

where

$$\begin{aligned} \Theta ^{(L)}_N(s):=\frac{1}{N}\sum _{t=M}^{N-1-s} \int _{\mathcal {Y}}\psi ^{(L)}_{s+t}\psi ^{(L)}_t\, d\mu _\mathcal {Y}. \end{aligned}$$

We have

$$\begin{aligned} \int _{\mathcal {Y}}\psi ^{(L)}_{s+t}\psi ^{(L)}_t\, d\mu _\mathcal {Y}= \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^{s+t})({\hat{f}}^{(L)}\circ a^t)\, d\mu _\mathcal {Y}- \mu _\mathcal {Y}({\hat{f}}^{(L)}\circ a^{s+t})\mu _\mathcal {Y}({\hat{f}}^{(L)}\circ a^t). \end{aligned}$$

To estimate \(\Theta ^{(L)}_N(s)\), we introduce an additional parameter \(K=K(N)\rightarrow \infty \) such that \(K\leqslant M\) (to be specified later) and consider separately the cases when \(s< K\) and when \(s\geqslant K\).

First, we consider the case when \(s\geqslant K\). By Corollary 3.3, we have

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^{s+t})({\hat{f}}^{(L)}\circ a^t)\, d\mu _\mathcal {Y}&=\mu _{{\mathcal {X}}}({\hat{f}}^{(L)})^2 +O\left( e^{-\delta \min (s,t)} \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \right) .\nonumber \\ \end{aligned}$$
(5.24)

Also, by Corollary 3.3,

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^t)\, d\mu _\mathcal {Y}&=\mu _{{\mathcal {X}}} ({\hat{f}}^{(L)}) +O\left( e^{-\delta t} \left\| {\hat{f}}^{(L)}\right\| _{C^k} \right) . \end{aligned}$$
(5.25)

Hence, combining (5.24) and (5.25), we deduce that

$$\begin{aligned} \int _{\mathcal {Y}}\psi ^{(L)}_{s+t}\psi ^{(L)}_t\, d\mu _\mathcal {Y}=O \left( e^{-\delta \min (s,t)} \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \right) . \end{aligned}$$

Since

$$\begin{aligned} \sum _{s=K}^{N-M-1}\left( \sum _{t=M}^{N-1-s}e^{-\delta \min (s,t)}\right) \leqslant \sum _{s=K}^{N-1}\sum _{t=M}^{N-1} (e^{-\delta s}+e^{-\delta t})\ll N e^{-\delta K}, \end{aligned}$$

we conclude that

$$\begin{aligned} \sum _{s=K}^{N-M-1}\Theta ^{(L)}_N(s) \ll \, e^{-\delta K}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \ll _f \, e^{-\delta K} L^{2}\, \Vert f\Vert _{C^k}^2, \end{aligned}$$
(5.26)

where we used Lemma 4.12. The implied constants here and below in the proof depend only on \({{\,\mathrm{supp}\,}}(f)\).

Let us now consider the case \(s< K\). We observe that Corollary 3.3 (for \(r = 1\)) applied to the function \(\phi _s^{(L)}:= ({\hat{f}}^{(L)}\circ a^s){\hat{f}}^{(L)}\) yields,

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^{s+t})({\hat{f}}^{(L)}\circ a^t)\, d\mu _\mathcal {Y}= & {} \int _{\mathcal {Y}} (\phi _s^{(L)}\circ a^t) \, d\mu _\mathcal {Y}\\= & {} \int _{{\mathcal {X}}} \phi _s^{(L)}\, d\mu _{\mathcal {X}}+O\left( e^{-\delta t}\, \left\| \phi _s^{(L)}\right\| _{C^k} \right) . \end{aligned}$$

Furthermore, for some \(\xi =\xi (m,n,k)>0\), we have

$$\begin{aligned} \left\| \phi _s^{(L)}\right\| _{C^k}\ll \left\| {\hat{f}}^{(L)}\circ a^s\right\| _{C^k}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}\ll e^{\xi s}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2. \end{aligned}$$

Therefore, we deduce that

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{f}}^{(L)}\circ a^{s+t})({\hat{f}}^{(L)}\circ a^t)\, d\mu _\mathcal {Y}= & {} \int _{{\mathcal {X}}} ({\hat{f}}^{(L)}\circ a^s){\hat{f}}^{(L)} \,d\mu _{\mathcal {X}}\\&+\,O \left( e^{-\delta t} e^{\xi s}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \right) . \end{aligned}$$

Combining this estimate with (5.25), we conclude that

$$\begin{aligned} \int _{\mathcal {Y}}\psi ^{(L)}_{s+t}\psi ^{(L)}_t\, d\mu _\mathcal {Y}= & {} \int _{{\mathcal {X}}} ({\hat{f}}^{(L)}\circ a^s){\hat{f}}^{(L)}\, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}({\hat{f}}^{(L)})^2 \\&+\, O \left( e^{-\delta t} e^{\xi s}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \right) . \end{aligned}$$

Hence, setting

$$\begin{aligned} \Theta ^{(L)}_\infty (s):=\int _{{\mathcal {X}}} ({\hat{f}}^{(L)}\circ a^s){\hat{f}}^{(L)}\, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}({\hat{f}}^{(L)})^2, \end{aligned}$$

we obtain that

$$\begin{aligned} \Theta ^{(L)}_N(s)= & {} \frac{N-M-s}{N} \Theta ^{(L)}_\infty (s)+O \left( N^{-1} e^{-\delta M} e^{\xi s}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \right) \\= & {} \Theta ^{(L)}_\infty (s) +O \left( N^{-1}(M+s)\left\| {\hat{f}}^{(L)}\right\| ^2_{L^2({\mathcal {X}})}+ N^{-1} e^{-\delta M} e^{\xi s}\, \left\| {\hat{f}}^{(L)}\right\| _{C^k}^2 \right) . \end{aligned}$$

Therefore, using Lemma 4.12, we deduce that

$$\begin{aligned}&\Theta ^{(L)}_N(0)+2\sum _{s=1}^{K-1}\Theta ^{(L)}_N(s) = \, \Theta ^{(L)}_\infty (0)+2\sum _{s=1}^{K-1}\Theta ^{(L)}_\infty (s) \nonumber \\&\quad +\,O_f \left( N^{-1}(M+K)K\,\Vert f\Vert ^2_{C^0}+ N^{-1} e^{-\delta M} e^{\xi K} L^{2}\,\Vert f\Vert _{C^k}^2\right) . \end{aligned}$$
(5.27)

Combining (5.26) and (5.27), we conclude that

$$\begin{aligned}&\Theta ^{(L)}_N(0)+2\sum _{s=1}^{N-M-1}\Theta ^{(L)}_N(s)\nonumber \\&\quad = \, \Theta ^{(L)}_\infty (0)+2\sum _{s=1}^{K-1}\Theta ^{(L)}_\infty (s) \nonumber \\&\qquad +\, O_f \left( N^{-1}(M+K)K \, \Vert f\Vert ^2_{C^0}+ (N^{-1} e^{-\delta M} e^{\xi K}+e^{-\delta K}) L^{2}\,\Vert f\Vert _{C^k}^2 \right) .\qquad \quad \end{aligned}$$
(5.28)

We choose the parameters \(K=K(N)\), \(M=M(N)\), and \(L=L(N)\) so that

$$\begin{aligned} e^{-\delta K} L^{2} \rightarrow 0, \end{aligned}$$
(5.29)
$$\begin{aligned} N^{-1} e^{-\delta M} e^{\xi K} L^{2} \rightarrow 0, \end{aligned}$$
(5.30)
$$\begin{aligned} N^{-1}(M+K)K \rightarrow 0 \end{aligned}$$
(5.31)

as \(N\rightarrow \infty \). Then

$$\begin{aligned} \left\| F^{(L)}_N\right\| _{L^2(\mathcal {Y})}^2 = \Theta ^{(L)}_\infty (0)+2\sum _{s=1}^{K-1}\Theta ^{(L)}_\infty (s)+o(1). \end{aligned}$$

Next, we shall show that with a suitable choice of parameters,

$$\begin{aligned} \left\| F^{(L)}_N\right\| _{L^2(\mathcal {Y})}^2 = \Theta _\infty (0)+2\sum _{s=1}^{K-1}\Theta _\infty (s) +o(1), \end{aligned}$$
(5.32)

where

$$\begin{aligned} \Theta _\infty (s):=\int _{{\mathcal {X}}} ({\hat{f}}\circ a^s){\hat{f}}\, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}({\hat{f}})^2. \end{aligned}$$

We recall that by Lemma 4.12, for all \(\tau <m+n-1\),

$$\begin{aligned} \left\| {\hat{f}}- {\hat{f}}^{(L)}\right\| _{L^1({\mathcal {X}})}\ll _{f,\tau } L^{-\tau }\, \Vert f\Vert _{C^0}\quad \hbox {and}\quad \left\| {\hat{f}}- {\hat{f}}^{(L)}\right\| _{L^2({\mathcal {X}})}\ll _{f,\tau } L^{-(\tau -1)/2}\, \Vert f\Vert _{C^0},\nonumber \\ \end{aligned}$$
(5.33)

where the implied constant depends only on \({{\,\mathrm{supp}\,}}(f)\). It follows from these estimates that

$$\begin{aligned} \mu _{\mathcal {X}}({\hat{f}}^{(L)})= & {} \mu _{\mathcal {X}}({\hat{f}})+O_{f,\tau }(L^{-\tau }\, \Vert f\Vert _{C^0}),\\ \int _{{\mathcal {X}}} ({\hat{f}}^{(L)}\circ a^s){\hat{f}}^{(L)}\, d\mu _{\mathcal {X}}= & {} \int _{{\mathcal {X}}} ({\hat{f}}\circ a^s){\hat{f}}\, d\mu _{\mathcal {X}}+O_{f,\tau }(L^{-(\tau -1)/2}\, \Vert f\Vert ^2_{C^0}), \end{aligned}$$

so that

$$\begin{aligned} \Theta ^{(L)}_\infty (s)=\Theta _\infty (s)+O_{f,\tau }\left( L^{-(\tau -1)/2} \, \Vert f\Vert ^2_{C^0}\right) . \end{aligned}$$
(5.34)

We choose the parameters \(K=K(N)\rightarrow \infty \) and \(L=L(N)\rightarrow \infty \) so that

$$\begin{aligned} K L^{-(\tau -1)/2} \rightarrow 0 \quad \hbox {for some }\tau <m+n-1. \end{aligned}$$
(5.35)

Then (5.32) follows. We conclude that

$$\begin{aligned} \left\| F^{(L)}_N\right\| _{L^2(\mathcal {Y})}^2 \rightarrow \Theta _\infty (0)+2\sum _{s=1}^{\infty }\Theta _\infty (s) \end{aligned}$$
(5.36)

as \(N\rightarrow \infty \).

Finally, we compute \(\Theta _\infty (s)\) using Rogers' formula (Proposition 4.4) applied to the function

$$\begin{aligned} F_s(\overline{z}_1,\overline{z}_2):=\sum _{p,q\geqslant 1} f(p a^s\overline{z}_1)f(q \overline{z}_2),\quad (\overline{z}_1,\overline{z}_2)\in \mathbb {R}^{m+n}\times \mathbb {R}^{m+n}. \end{aligned}$$

Since

$$\begin{aligned} \int _{{\mathcal {X}}} ({\hat{f}}\circ a^s){\hat{f}}\, d\mu _{\mathcal {X}}= \int _{{\mathcal {X}}} \left( \sum _{\overline{z}_1,\overline{z}_2\in P(\mathbb {Z}^{m+n})} F_s(g\overline{z}_1,g\overline{z}_2)\right) d\mu _{\mathcal {X}}(g\Gamma ), \end{aligned}$$

we deduce that

$$\begin{aligned}&\int _{{\mathcal {X}}} ({\hat{f}}\circ a^s){\hat{f}}\, d\mu _{\mathcal {X}}=\left( \int _{\mathbb {R}^{m+n}} f(\overline{z})\, d\overline{z} \right) ^2\\&\quad +\, \zeta (m+n)^{-1}\sum _{p,q\geqslant 1} \left( \int _{\mathbb {R}^{m+n}} f(pa^s\overline{z})f(q\overline{z})\,d\overline{z} +\int _{\mathbb {R}^{m+n}} f(pa^s\overline{z})f(-q\overline{z})\,d\overline{z}\right) . \end{aligned}$$

Since by the Siegel Mean Value Theorem (Proposition 4.3),

$$\begin{aligned} \int _{{\mathcal {X}}}{\hat{f}} \, d\mu _{\mathcal {X}}= \int _{\mathbb {R}^{m+n}} f(\overline{z})\, d\overline{z}, \end{aligned}$$

we conclude that

$$\begin{aligned} \Theta _\infty (s)=\zeta (m+n)^{-1}\sum _{p,q\geqslant 1} \left( \int _{\mathbb {R}^{m+n}} f(pa^s\overline{z})f(q\overline{z})\,d\overline{z}+\int _{\mathbb {R}^{m+n}} f(pa^s\overline{z})f(-q\overline{z})\,d\overline{z}\right) . \end{aligned}$$

Finally, we show that the sum in (5.36) is finite. We represent points \(\overline{z}\in \mathbb {R}^{m+n}\) as \(\overline{z}=(\overline{x},\overline{y})\) with \(\overline{x}\in \mathbb {R}^{m}\) and \(\overline{y}\in \mathbb {R}^{n}\). Since f is bounded, and its compact support is contained in \(\{\overline{y}\ne 0\}\), we may assume without loss of generality that f is the characteristic function of the set

$$\begin{aligned} \{(\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\,\upsilon _1\leqslant \Vert \overline{y}\Vert \leqslant \upsilon _2,\quad |x_i|\leqslant \vartheta \, \Vert \overline{y}\Vert ^{-w_i},\; i=1,\ldots ,m\} \end{aligned}$$

with \(0<\upsilon _1<\upsilon _2\) and \(\vartheta >0\). Let

$$\begin{aligned} \Omega _s(p):=\left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\,\frac{\upsilon _1\, e^s}{p}\leqslant \Vert \overline{y}\Vert \leqslant \frac{\upsilon _2\, e^s}{p},\quad p^{1+w_i}|x_i|\Vert \overline{y}\Vert ^{w_i}\leqslant \vartheta ,\; i=1,\ldots ,m \right\} . \end{aligned}$$

Then

$$\begin{aligned} \int _{\mathbb {R}^{m+n}} f(pa^s\overline{z})f(\pm q\overline{z})\,d\overline{z} =\hbox {vol}\left( \Omega _s(p)\cap \Omega _0(q)\right) . \end{aligned}$$

Setting

$$\begin{aligned} I(u):=\left\{ \overline{y}\in \mathbb {R}^n:\, \upsilon _1\, u\leqslant \Vert \overline{y}\Vert \leqslant \upsilon _2 \,u\right\} , \end{aligned}$$

we obtain that

$$\begin{aligned} \hbox {vol}\left( \Omega _s(p)\cap \Omega _0(q)\right)= & {} \int _{I(e^sp^{-1})\cap I(q^{-1})} \left( \prod _{i=1}^m 2\vartheta \max (p,q)^{-1-w_i}\Vert \overline{y}\Vert ^{-w_i}\right) \, d\overline{y}\\\ll & {} \max (p,q)^{-m-n}\int _{I(e^sp^{-1})\cap I(q^{-1})} \Vert \overline{y}\Vert ^{-n}\, d\overline{y}. \end{aligned}$$

We note that \(I(e^sp^{-1})\cap I(q^{-1})=\emptyset \) unless \((\upsilon _1\upsilon _2^{-1})e^sq\leqslant p\leqslant (\upsilon _2\upsilon _1^{-1})e^sq\), and also that

$$\begin{aligned} \int _{I(q^{-1})} \Vert \overline{y}\Vert ^{-n}\, d\overline{y}\ll \int _{\upsilon _1/q}^{\upsilon _2/q} r^{-1}\, dr\ll 1. \end{aligned}$$

Hence, it follows that

$$\begin{aligned} \sum _{s=1}^\infty \Theta _\infty (s)\ll & {} \sum _{s,q=1}^\infty \sum _{(\upsilon _1\upsilon _2^{-1})e^sq\leqslant p\leqslant (\upsilon _2\upsilon _1^{-1})e^sq} \max (p,q)^{-(m+n)} \\\ll & {} \sum _{s,q=1}^\infty (e^sq)^{-(m+n-1)}<\infty , \end{aligned}$$

because \(m+n\geqslant 3\).

5.3 Proof of Theorem 5.1

As we already remarked above, it is sufficient to show that the sequence of averages \(F_N^{(L)}\) converges in distribution to the normal law. To verify this, we use the Method of Cumulants (Proposition 3.4). It is easy to see that

$$\begin{aligned} \int _{\mathcal {Y}} F_N^{(L)}\, d\mu _\mathcal {Y}=0. \end{aligned}$$

Moreover, with a suitable choice of parameters, we have shown in Sect. 5.1 that for \(r\geqslant 3\),

$$\begin{aligned} {{\,\mathrm{Cum}\,}}_{\mu _\mathcal {Y}}^{(r)}\left( F^{(L)}_N\right) \rightarrow 0\quad \hbox {as } N\rightarrow \infty , \end{aligned}$$

and in Sect. 5.2 that

$$\begin{aligned} \left\| F^{(L)}_N\right\| _{L^2(\mathcal {Y})}^2\rightarrow \sigma _f^2<\infty \quad \hbox {as }N\rightarrow \infty . \end{aligned}$$

Hence, Proposition 3.4 applies, and it remains to verify that the parameters can be chosen to satisfy the stated assumptions. We recall that

$$\begin{aligned} L=N^q\quad \hbox {and}\quad \gamma =c_r(\log N). \end{aligned}$$

The parameters \(M=M(N)\geqslant K=K(N)\) need to satisfy the seven conditions (5.4), (5.5), (5.20), (5.29), (5.30), (5.31), (5.35). We take

$$\begin{aligned} K(N)=c_1(\log N) \end{aligned}$$

with sufficiently large \(c_1>0\) so that (5.29) is satisfied. Then taking

$$\begin{aligned} M(N)=(\log N)(\log \log N), \end{aligned}$$

we arrange that (5.5), (5.20), and (5.30) hold for all \(N\geqslant N_0(r)\). We note that the constant \(c_r\) and the implicit constant in (5.20) depend on r; the \((\log \log N)\)-factor is included to guarantee that the parameter M can be chosen independently of r. Finally, the conditions (5.4), (5.31), (5.35) are immediate from our choices.
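To illustrate the bookkeeping, the following sketch (with placeholder values of \(\delta \), \(\xi \), and q, none of which are fixed by the text, and \(c_1\) chosen so that \(\delta c_1>2q\)) checks numerically that these choices drive the three quantities in (5.29)–(5.31) to zero.

```python
import math

# Illustration (with placeholder constants) that the choices
#   K = c1*log N,  M = (log N)(log log N),  L = N**q
# make the three quantities in (5.29)-(5.31) tend to zero.
# delta, xi, q are NOT specified in the text; the argument only needs
# c1 large enough that delta*c1 > 2q.
delta, xi, q = 0.5, 1.0, 0.25
c1 = 2.0  # delta*c1 = 1.0 > 2q = 0.5

def quantities(N):
    logN = math.log(N)
    K = c1 * logN
    M = logN * math.log(logN)
    L = N ** q
    return (math.exp(-delta * K) * L ** 2,               # (5.29)
            math.exp(-delta * M + xi * K) * L ** 2 / N,  # (5.30)
            (M + K) * K / N)                             # (5.31)

vals = [quantities(N) for N in (10 ** 3, 10 ** 6, 10 ** 9)]
for N, v in zip((10 ** 3, 10 ** 6, 10 ** 9), vals):
    print(N, v)  # each coordinate decreases as N grows
```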

6 CLT for counting functions and the proof of Theorem 1.2

We recall from Sect. 1.3 that

$$\begin{aligned} \Delta _T(\overline{u})=|\Lambda _{\overline{u}}\cap \Omega _T|+O(1), \end{aligned}$$

where \(\Lambda _{\overline{u}}\) is defined in (1.8) and the domains \(\Omega _T\) are defined in (1.9). We shall decompose this domain into smaller pieces using the linear map \(a=\hbox {diag}(e^{w_1},\ldots ,e^{w_m},e^{-1},\ldots ,e^{-1})\). We note that for any integer \(N \geqslant 1\),

$$\begin{aligned} \Omega _{e^N}=\bigsqcup _{s=0}^{N-1} a^{-s}\Omega _e, \end{aligned}$$
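To spell out this decomposition (a worked step; we take \(\Omega _T\) to be the region \(\{(\overline{x},\overline{y}):\,1\leqslant \Vert \overline{y}\Vert <T,\ |x_i|\leqslant \vartheta _i\Vert \overline{y}\Vert ^{-w_i}\}\), consistent with (1.9)): since \(a^{s}\) scales \(x_i\) by \(e^{sw_i}\) and \(\overline{y}\) by \(e^{-s}\),

```latex
\begin{aligned}
(\overline{x},\overline{y})\in a^{-s}\Omega _e
&\iff a^{s}(\overline{x},\overline{y})\in \Omega _e\\
&\iff 1\leqslant e^{-s}\Vert \overline{y}\Vert < e
   \ \text{ and }\ e^{sw_i}|x_i|\leqslant \vartheta _i\,(e^{-s}\Vert \overline{y}\Vert )^{-w_i}\\
&\iff e^{s}\leqslant \Vert \overline{y}\Vert < e^{s+1}
   \ \text{ and }\ |x_i|\leqslant \vartheta _i\,\Vert \overline{y}\Vert ^{-w_i},
\end{aligned}
```

so \(a^{-s}\Omega _e\) is exactly the slice of \(\Omega _{e^N}\) lying in the shell \(e^{s}\leqslant \Vert \overline{y}\Vert <e^{s+1}\), and the union over \(s=0,\ldots ,N-1\) is disjoint.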

and thus

$$\begin{aligned} |\Lambda _{\overline{u}}\cap \Omega _{e^N}|=\sum _{s=0}^{N-1} {\hat{\chi }}(a^s\Lambda _{\overline{u}}), \end{aligned}$$

where \(\chi \) denotes the characteristic function of the set \(\Omega _e\). Hence the proof of Theorem 1.2 reduces to analyzing sums of the form \(\sum _{s=0}^{N-1} {\hat{\chi }}(a^sy)\) with \(y\in \mathcal {Y}\). For this purpose, we define

$$\begin{aligned} F_N:=\frac{1}{\sqrt{N}} \sum _{s=0}^{N-1} \left( {\hat{\chi }}\circ a^s-\mu _\mathcal {Y}({\hat{\chi }}\circ a^s)\right) . \end{aligned}$$
(6.1)

Our main result in this section now reads as follows.

Theorem 6.1

If \(m\geqslant 2,\) then for every \(\xi \in \mathbb {R},\)

$$\begin{aligned} \mu _\mathcal {Y}\left( \{y\in \mathcal {Y}:\, F_N(y)<\xi \}\right) \rightarrow \hbox {Norm}_{\sigma }(\xi ) \end{aligned}$$

as \(N\rightarrow \infty ,\) where

$$\begin{aligned} \sigma ^2:= 2^{m+1}\left( \prod _{i=1}^m \vartheta _i\right) \left( \int _{S^{n-1}} \Vert \overline{z}\Vert ^{-n}\, d\overline{z}\right) \left( \frac{2\zeta (m+n-1)}{\zeta (m+n)}-1\right) . \end{aligned}$$

We approximate \(\chi \) by a family of non-negative functions \(f_\varepsilon \in C^\infty _c(\mathbb {R}^{m+n})\) whose supports are contained in an \(\varepsilon \)-neighbourhood of the set \(\Omega _e\) and which satisfy

$$\begin{aligned} \chi \leqslant f_\varepsilon \leqslant 1,\quad \Vert f_\varepsilon -\chi \Vert _{L^1(\mathbb {R}^{m+n})}\ll \varepsilon ,\quad \Vert f_\varepsilon -\chi \Vert _{L^2(\mathbb {R}^{m+n})}\ll \varepsilon ^{1/2},\quad \Vert f_\varepsilon \Vert _{C^k}\ll \varepsilon ^{-k}. \end{aligned}$$
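As a toy illustration of such a smoothing (not the construction used in the paper, which produces genuine \(C^\infty \) bumps in \(\mathbb {R}^{m+n}\)), the sketch below approximates the indicator of \([0,1]\) on \(\mathbb {R}\) by a \(C^1\) smoothstep ramp on \(\varepsilon \)-collars and verifies the stated \(L^1\) and \(L^2\) rates numerically.

```python
import math

# Toy 1-D illustration: approximate the indicator chi of [0,1] by a C^1
# "smoothstep" ramp f_eps on eps-collars, so that chi <= f_eps <= 1,
# ||f_eps - chi||_{L1} ~ eps and ||f_eps - chi||_{L2} ~ eps^{1/2}.

def smoothstep(t):
    """C^1 ramp increasing from 0 at t=0 to 1 at t=1."""
    t = min(max(t, 0.0), 1.0)
    return 3 * t * t - 2 * t ** 3

def f_eps(x, eps):
    if 0.0 <= x <= 1.0:
        return 1.0
    if -eps <= x < 0.0:
        return smoothstep(1.0 + x / eps)
    if 1.0 < x <= 1.0 + eps:
        return smoothstep(1.0 - (x - 1.0) / eps)
    return 0.0

def approx_norms(eps, n=20000):
    """Midpoint-rule values of ||f_eps - chi||_{L1} and ||f_eps - chi||_{L2}."""
    a, b = -eps, 1.0 + eps
    h = (b - a) / n
    l1 = l2sq = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        d = f_eps(x, eps) - (1.0 if 0.0 <= x <= 1.0 else 0.0)
        l1 += abs(d) * h
        l2sq += d * d * h
    return l1, math.sqrt(l2sq)

for eps in (0.1, 0.01):
    l1, l2 = approx_norms(eps)
    print(f"eps={eps}: L1 = {l1:.4f} (~ eps), L2 = {l2:.4f} (~ 0.86*sqrt(eps))")
```

The derivative of the ramp has size \(\asymp \varepsilon ^{-1}\), matching the blow-up \(\Vert f_\varepsilon \Vert _{C^k}\ll \varepsilon ^{-k}\) in the one-derivative case.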

This approximation allows us to construct smooth approximations of the Siegel transform \({\hat{\chi }}\) in the following sense.

Proposition 6.2

For every \(s\geqslant 0,\)

$$\begin{aligned} \int _{\mathcal {Y}} \left| {\hat{f}}_\varepsilon \circ a^s-{\hat{\chi }}\circ a^s\right| \, d\mu _\mathcal {Y}\ll \varepsilon +e^{-s}. \end{aligned}$$

Proof

We observe that there exist \(\vartheta _i(\varepsilon )>\vartheta _i\), \(i=1,\ldots ,m\), such that \(\vartheta _i(\varepsilon )=\vartheta _i+O(\varepsilon )\) and \(f_\varepsilon \leqslant \chi _\varepsilon \), where \(\chi _\varepsilon \) denotes the characteristic function of the set

$$\begin{aligned} \left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, 1-\varepsilon \leqslant \Vert \overline{y}\Vert \leqslant e+\varepsilon ,\; |x_i|<\vartheta _i(\varepsilon )\,\Vert \overline{y}\Vert ^{-w_i}\;\hbox { for } i=1,\ldots ,m \right\} . \end{aligned}$$

Then it follows that

$$\begin{aligned} |{\hat{f}}_\varepsilon (a^s\Lambda )-{\hat{\chi }}(a^s\Lambda )|=\sum _{v\in \Lambda \backslash \{0\}} \left( f_\varepsilon (a^s v)-\chi (a^sv)\right) \leqslant \sum _{v\in \Lambda \backslash \{0\}} \left( \chi _\varepsilon (a^s v)-\chi (a^sv)\right) . \end{aligned}$$

It is clear that \(\chi _\varepsilon -\chi \) is bounded by the sum \(\chi _{1,\varepsilon }+\chi _{2,\varepsilon }+\chi _{3,\varepsilon }\) of the characteristic functions of the sets

$$\begin{aligned}&\left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, 1-\varepsilon \leqslant \Vert \overline{y}\Vert \leqslant 1,\; |x_i|<\vartheta _i(\varepsilon )\,\Vert \overline{y}\Vert ^{-w_i}\;\hbox { for } i=1,\ldots ,m\right\} ,\\&\left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, e\leqslant \Vert \overline{y}\Vert \leqslant e+\varepsilon ,\; |x_i|<\vartheta _i(\varepsilon )\,\Vert \overline{y}\Vert ^{-w_i}\;\hbox { for } i=1,\ldots ,m\right\} ,\\&\left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, 1\leqslant \Vert \overline{y}\Vert \leqslant e,\; |x_i|<\vartheta _i(\varepsilon )\,\Vert \overline{y}\Vert ^{-w_i}\;\right. \\&\quad \left. \hbox { for all } i,\; |x_j|\geqslant \vartheta _j \, \Vert \overline{y}\Vert ^{-w_j}\hbox { for some }j \right\} \end{aligned}$$

respectively. In particular, we obtain that

$$\begin{aligned} {\hat{f}}_\varepsilon (a^s\Lambda )-{\hat{\chi }}(a^s\Lambda )\leqslant {\hat{\chi }}_{1,\varepsilon }(a^s\Lambda )+{\hat{\chi }}_{2,\varepsilon }(a^s\Lambda )+{\hat{\chi }}_{3,\varepsilon }(a^s\Lambda ). \end{aligned}$$

Hence, it remains to show that for \(j=1,2,3\),

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{\chi }}_{j,\varepsilon }\circ a^s)\, d\mu _\mathcal {Y}\ll \varepsilon +e^{-s}. \end{aligned}$$

As in (4.3), we compute that

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{\chi }}_{1,\varepsilon }\circ a^s)\, d\mu _\mathcal {Y}=\sum _{(1-\varepsilon )e^s\leqslant \Vert \overline{q}\Vert \leqslant e^s} \prod _{i=1}^m \left( \sum _{p_i\in \mathbb {Z}} \int _{[0,1]^n} \chi _{\vartheta _i(\varepsilon )\Vert \overline{q}\Vert ^{-w_i}}\left( p_i+\left<\overline{u}_i,\overline{q}\right>\right) d\overline{u}_i\right) ,\quad \end{aligned}$$
(6.2)

where \(\chi _{\theta }\) denotes the characteristic function of the interval \(\left[ -\theta , \theta \right] \). We observe that

$$\begin{aligned} \int _{[0,1]^n} \chi _{\vartheta _i(\varepsilon )\Vert \overline{q}\Vert ^{-w_i}}(p_i+\left<\overline{u}_i,\overline{q}\right>) d\overline{u}_i \ll ({\max }_k |q_k|)^{-1} \Vert \overline{q}\Vert ^{-w_i}\ll \Vert \overline{q}\Vert ^{-1-w_i}, \end{aligned}$$

and moreover this integral is non-zero only when \(|p_i|=O(\Vert \overline{q}\Vert )\). Hence,

$$\begin{aligned} \sum _{p_i\in \mathbb {Z}} \int _{[0,1]^n} \chi _{\vartheta _i(\varepsilon )\Vert \overline{q}\Vert ^{-w_i} }\left( p_i+\left<\overline{u}_i,\overline{q}\right>\right) d\overline{u}_i \ll \Vert \overline{q}\Vert ^{-w_i}, \end{aligned}$$

and

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{\chi }}_{1,\varepsilon }\circ a^s)\, d\mu _\mathcal {Y}\ll & {} \sum _{(1-\varepsilon )e^s\leqslant \Vert \overline{q}\Vert \leqslant e^s}\prod _{i=1}^m \Vert \overline{q}\Vert ^{-w_i}\\\ll & {} e^{-ns}\, \left| \left\{ \overline{q}\in \mathbb {Z}^n:\, (1-\varepsilon )e^s\leqslant \Vert \overline{q}\Vert \leqslant e^s\right\} \right| . \end{aligned}$$

The number of integral points in the region \(\{(1-\varepsilon )e^s\leqslant \Vert \overline{y}\Vert \leqslant e^s\}\) can be estimated in terms of its volume. Namely, there exists \(r>0\) (depending only on the norm) such that

$$\begin{aligned} \left| \left\{ \overline{q}\in \mathbb {Z}^n:\, (1-\varepsilon )e^s\leqslant \Vert \overline{q}\Vert \leqslant e^s\right\} \right| \ll \left| \left\{ \overline{y}\in \mathbb {R}^n:\, (1-\varepsilon )e^s-r\leqslant \Vert \overline{y}\Vert \leqslant e^s+r\right\} \right| . \end{aligned}$$

Hence,

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{\chi }}_{1,\varepsilon }\circ a^s)\, d\mu _\mathcal {Y}\ll & {} e^{-ns}\left( (e^s+r)^n-((1-\varepsilon )e^s-r)^n\right) \\\ll & {} (1+re^{-s})^n-(1-\varepsilon -re^{-s})^n\\\ll & {} \varepsilon +e^{-s}. \end{aligned}$$

The integral for \({\hat{\chi }}_{2,\varepsilon }\circ a^s\) can be estimated similarly.

The integral of \({\hat{\chi }}_{3,\varepsilon }\circ a^s\) can be written, as in (6.2), as a sum of products of the integral

$$\begin{aligned}&\int _{[0,1]^n}\left( \chi _{\vartheta _j(\varepsilon )\Vert \overline{q}\Vert ^{-w_j}}(p_j+\left<\overline{u}_j,\overline{q}\right>)- \chi _{\vartheta _j\Vert \overline{q}\Vert ^{-w_j}}(p_j+\left<\overline{u}_j,\overline{q}\right>)\right) \,d\overline{u}_j \\&\quad \leqslant 2({\max }_k |q_k|)^{-1} (\vartheta _j(\varepsilon )-\vartheta _j) \Vert \overline{q}\Vert ^{-w_j} \ll \varepsilon \Vert \overline{q}\Vert ^{-1-w_j}, \end{aligned}$$

and the integrals

$$\begin{aligned} \int _{[0,1]^n} \chi _{\vartheta _i(\varepsilon )\Vert \overline{q}\Vert ^{-w_i}}(p_i+\left<\overline{u}_i,\overline{q}\right>)\,d\overline{u}_i \leqslant 2\vartheta _i(\varepsilon ) ({\max }_k |q_k|)^{-1} \Vert \overline{q}\Vert ^{-w_i} \ll \Vert \overline{q}\Vert ^{-1-w_i} \end{aligned}$$

with \(i\ne j\). We observe that these integrals are non-zero only when \(|p_j|=O(\Vert \overline{q}\Vert )\) and \(|p_i|=O(\Vert \overline{q}\Vert )\). Hence, we conclude that

$$\begin{aligned} \int _{\mathcal {Y}} ({\hat{\chi }}_{3,\varepsilon }\circ a^s)\, d\mu _\mathcal {Y}\ll \sum _{e^s\leqslant \Vert \overline{q}\Vert \leqslant e^{s+1}} \varepsilon \prod _{i=1}^m \Vert \overline{q}\Vert ^{-w_i} = \varepsilon \sum _{e^s\leqslant \Vert \overline{q}\Vert \leqslant e^{s+1}} \Vert \overline{q}\Vert ^{-n}\ll \varepsilon , \end{aligned}$$

which completes the proof of the proposition. \(\square \)

Now we start with the proof of Theorem 6.1. As in Sect. 5, we modify \(F_N\) and consider instead

$$\begin{aligned} {{\tilde{F}}}_N:=\frac{1}{\sqrt{N}} \sum _{s=M}^{N-1} \left( {\hat{\chi }}\circ a^s-\mu _\mathcal {Y}({\hat{\chi }}\circ a^s)\right) \end{aligned}$$
(6.3)

for a parameter \(M=M(N)\rightarrow \infty \) that will be chosen later. As in (5.3) we obtain that

$$\begin{aligned} \Vert F_N-{\tilde{F}}_N\Vert _{L^1(\mathcal {Y})}\rightarrow 0\quad \hbox {as }N\rightarrow \infty \end{aligned}$$

provided that

$$\begin{aligned} M=o(N^{1/2}). \end{aligned}$$
(6.4)

Hence, if we prove the CLT for \(({\tilde{F}}_N)\), then the CLT for \((F_N)\) follows. From now on, to simplify notation, we assume that \(F_N\) is given by (6.3).

Our next step is to exploit the approximation \(\chi \approx f_\varepsilon \), so we introduce

$$\begin{aligned} F^{(\varepsilon )}_N:=\frac{1}{\sqrt{N}} \sum _{s=M}^{N-1} \left( {\hat{f}}_\varepsilon \circ a^s-\mu _\mathcal {Y}({\hat{f}}_\varepsilon \circ a^s)\right) , \end{aligned}$$

where the parameter \(\varepsilon =\varepsilon (N)\rightarrow 0\) will be specified later. We observe that it follows from Proposition 6.2 that

$$\begin{aligned} \left\| F^{(\varepsilon )}_N-F_N\right\| _{L^1(\mathcal {Y})}\leqslant \frac{2}{\sqrt{N}} \sum _{s=M}^{N-1} \left\| {\hat{f}}_\varepsilon \circ a^s-{\hat{\chi }}\circ a^s\right\| _{L^1(\mathcal {Y})}\ll N^{1/2}(\varepsilon +e^{-M}). \end{aligned}$$

We choose \(\varepsilon =\varepsilon (N)\) and \(M=M(N)\) so that

$$\begin{aligned} N^{1/2}\varepsilon \rightarrow 0\quad \hbox {and}\quad N^{1/2} e^{-M}\rightarrow 0. \end{aligned}$$
(6.5)

Then

$$\begin{aligned} \left\| F^{(\varepsilon )}_N-F_N\right\| _{L^1(\mathcal {Y})}\rightarrow 0\quad \hbox {as } N\rightarrow \infty . \end{aligned}$$

Hence, it remains to prove convergence in distribution for the sequence \(F^{(\varepsilon )}_N\).

We observe that the sequence \(F_N^{(\varepsilon )}\) fits into the framework of Sect. 5. However, we need to take into account the dependence on the new parameter \(\varepsilon \) and refine the previous estimates. It will be important for our argument that the supports of the functions \(f_\varepsilon \) are uniformly bounded, \(\Vert f_\varepsilon \Vert _{C^0}\ll 1\), and \(\Vert f_\varepsilon \Vert _{C^k}\ll \varepsilon ^{-k}\).

As in Sect. 5, we consider the truncation

$$\begin{aligned} F^{(\varepsilon ,L)}_N:=\frac{1}{\sqrt{N}} \sum _{s=M}^{N-1} \left( {\hat{f}}^{(L)}_\varepsilon \circ a^s-\mu _\mathcal {Y}({\hat{f}}^{(L)}_\varepsilon \circ a^s)\right) , \end{aligned}$$

defined for a parameter \(L=L(N)\rightarrow \infty \). We assume that

$$\begin{aligned} M\gg \log L, \end{aligned}$$
(6.6)

so that Proposition 4.8 applies when \(s\geqslant M\). Since the family of functions \(f_\varepsilon \) is majorized by a fixed bounded function with compact support, Proposition 4.8 implies that when \(m\geqslant 2\),

$$\begin{aligned} \left\| {\hat{f}}_\varepsilon \circ a^s\right\| _{L^2(\mathcal {Y})}\ll 1\quad \hbox {for all }s\geqslant 0, \end{aligned}$$

uniformly in \(\varepsilon \). Hence, the bound (5.6) can be proved exactly as before, and we obtain

$$\begin{aligned} \left\| F_N^{(\varepsilon )} - F_N^{(\varepsilon , L)}\right\| _{L^1(\mathcal {Y})}\ll _p N^{1/2}L^{-p/2}\quad \hbox {for all }p<m+n. \end{aligned}$$

We choose the parameter L as before so that

$$\begin{aligned} N=o\left( L^p\right) \quad \hbox {for some }p<m+n \end{aligned}$$
(6.7)

to guarantee that

$$\begin{aligned} \left\| F_N^{(\varepsilon )} - F_N^{(\varepsilon , L)}\right\| _{L^1(\mathcal {Y})}\rightarrow 0\quad \hbox {as }N\rightarrow \infty . \end{aligned}$$

Now it remains to show that the family \(F_N^{(\varepsilon , L)}\) satisfies the CLT with a suitable choice of parameters \(M,L,\varepsilon \). As in Sect. 5 we will show that for \(r\geqslant 3\),

$$\begin{aligned} {{\,\mathrm{Cum}\,}}_{\mu _\mathcal {Y}}^{(r)}\left( F^{(\varepsilon ,L)}_N\right)&\rightarrow 0\quad \hbox {as } N\rightarrow \infty , \end{aligned}$$
(6.8)

and

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2\rightarrow \sigma ^2\quad \hbox {as } N\rightarrow \infty \end{aligned}$$
(6.9)

with an explicit \(\sigma ^2\in (0,\infty )\).

Under the condition

$$\begin{aligned} M\gg \gamma , \end{aligned}$$
(6.10)

the estimate (5.21) gives the bound

$$\begin{aligned} \left| \hbox {Cum}_{\mu _\mathcal {Y}}^{(r)}(F^{(\varepsilon ,L)}_N)\right|\ll & {} N^{r/2} e^{-\delta \gamma } L^{r}\Vert f_\varepsilon \Vert _{C^k}^r+ N^{1-r/2}\gamma ^{r-1} L^{(r-(m+n-1))^+} \Vert f_\varepsilon \Vert _{C^0}^r\\\ll & {} N^{r/2} e^{-\delta \gamma } L^{r}\varepsilon ^{-rk}+ N^{1-r/2}\gamma ^{r-1} L^{(r-(m+n-1))^+}. \end{aligned}$$

We note that the implicit constant in (5.21) depends only on \({{\,\mathrm{supp}\,}}(f_\varepsilon )\), so that it is uniform in \(\varepsilon \). We choose \(L=N^q\) as in Sect. 5 and \(\gamma =c_r(\log N)\), where \(c_r>0\) will be specified later. In particular, \(N^{1-r/2}\gamma ^{r-1} L^{(r-(m+n-1))^+}\rightarrow 0\), and assuming that

$$\begin{aligned} N^{r/2} L^{r}\varepsilon ^{-rk}=o(e^{\delta \gamma }), \end{aligned}$$
(6.11)

it follows that (6.8) holds.

To prove (6.9), we have to estimate

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2 =\frac{1}{N}\sum _{s_1=M}^{N-1}\sum _{s_2=M}^{N-1} \int _{\mathcal {Y}} \psi ^{(\varepsilon ,L)}_{s_1}\psi ^{(\varepsilon ,L)}_{s_2}\, d\mu _\mathcal {Y}, \end{aligned}$$

where

$$\begin{aligned} \psi ^{(\varepsilon ,L)}_s(y):={\hat{f}}_\varepsilon ^{(L)}(a^sy)-\mu _\mathcal {Y}({\hat{f}}_\varepsilon ^{(L)} \circ a^s). \end{aligned}$$

As in (5.23), we obtain that

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2 = \Theta ^{(\varepsilon ,L)}_N(0) +2\sum _{s=1}^{N-M-1}\Theta ^{(\varepsilon ,L)}_N(s), \end{aligned}$$

where

$$\begin{aligned} \Theta ^{(\varepsilon ,L)}_N(s):=\frac{1}{N}\sum _{t=M}^{N-1-s} \int _{\mathcal {Y}}\psi ^{(\varepsilon ,L)}_{s+t}\psi ^{(\varepsilon ,L)}_t\, d\mu _\mathcal {Y}. \end{aligned}$$

Our estimate proceeds as in Sect. 5, and we shall show that with a suitable choice of parameters,

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2 =&\Theta ^{(\varepsilon ,L)}_\infty (0) +2\sum _{s=1}^{K-1}\Theta ^{(\varepsilon ,L)}_\infty (s)+o(1), \end{aligned}$$
(6.12)

where

$$\begin{aligned} \Theta ^{(\varepsilon ,L)}_\infty (s):=\int _{{\mathcal {X}}} ({\hat{f}}_\varepsilon ^{(L)}\circ a^s){\hat{f}}_\varepsilon ^{(L)}\, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}({\hat{f}}_\varepsilon ^{(L)})^2. \end{aligned}$$

Indeed, arguing as in (5.28), we deduce that

$$\begin{aligned}&\Theta ^{(\varepsilon ,L)}_N(0)+2\sum _{s=1}^{N-M-1}\Theta ^{(\varepsilon ,L)}_N(s)\\&\quad =\Theta ^{(\varepsilon ,L)}_\infty (0)+2\sum _{s=1}^{K-1}\Theta ^{(\varepsilon ,L)}_\infty (s)\\&\qquad +\, O \left( N^{-1}(M+K)K+ N^{-1} e^{-\delta M} e^{\xi K} L^{2}\varepsilon ^{-2k} + e^{-\delta K} L^{2}\varepsilon ^{-2k} \right) . \end{aligned}$$

Hence, (6.12) holds provided that

$$\begin{aligned}&\displaystyle e^{-\delta K} L^{2}\varepsilon ^{-2k} \rightarrow 0, \end{aligned}$$
(6.13)
$$\begin{aligned}&\displaystyle N^{-1} e^{-\delta M} e^{\xi K} L^{2}\varepsilon ^{-2k} \rightarrow 0, \end{aligned}$$
(6.14)
$$\begin{aligned}&\displaystyle N^{-1}(M+K)K \rightarrow 0. \end{aligned}$$
(6.15)

Next, we set

$$\begin{aligned} \Theta ^{(\varepsilon )}_\infty (s):=\int _{{\mathcal {X}}} ({\hat{f}}_\varepsilon \circ a^s){\hat{f}}_\varepsilon \, d\mu _{\mathcal {X}}-\mu _{\mathcal {X}}({\hat{f}}_\varepsilon )^2 \end{aligned}$$

and observe that as in (5.34),

$$\begin{aligned} \Theta ^{(\varepsilon ,L)}_\infty (s)=\Theta ^{(\varepsilon )}_\infty (s)+O_\tau \left( L^{-(\tau -1)/2}\right) \quad \hbox {for all } \tau <m+n-1, \end{aligned}$$

uniformly in \(\varepsilon \). Hence, choosing K so that

$$\begin{aligned} K L^{-(\tau -1)/2} \rightarrow 0 \quad \hbox {for some }\tau <m+n-1, \end{aligned}$$
(6.16)

we conclude that

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2 = \Theta ^{(\varepsilon )}_\infty (0)+2\sum _{s=1}^{K-1}\Theta ^{(\varepsilon )}_\infty (s)+o(1). \end{aligned}$$

The terms \(\Theta ^{(\varepsilon )}_\infty (s)\) can be computed using Propositions 4.3 and 4.4, and we obtain that

$$\begin{aligned} \Theta ^{(\varepsilon )}_\infty (s)=\zeta (m+n)^{-1}\sum _{p,q\geqslant 1} \left( \int _{\mathbb {R}^{m+n}} f_\varepsilon (pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}+ \int _{\mathbb {R}^{m+n}} f_\varepsilon (pa^s\overline{z})f_\varepsilon (-q\overline{z})\,d\overline{z}\right) . \end{aligned}$$

We also set

$$\begin{aligned} \Theta _\infty (s):= & {} \zeta (m+n)^{-1}\sum _{p,q\geqslant 1} \left( \int _{\mathbb {R}^{m+n}} \chi (pa^s\overline{z})\chi (q\overline{z})\,d\overline{z}+\int _{\mathbb {R}^{m+n}} \chi (pa^s\overline{z})\chi (-q\overline{z})\,d\overline{z}\right) \\= & {} 2\zeta (m+n)^{-1}\sum _{p,q\geqslant 1} \int _{\mathbb {R}^{m+n}} \chi (pa^s\overline{z})\chi (q\overline{z})\,d\overline{z}. \end{aligned}$$

We claim that

$$\begin{aligned} |\Theta ^{(\varepsilon )}_\infty (s)-\Theta _\infty (s)|\ll \varepsilon ^{1/2}e^{-(m+n-2)s/2}. \end{aligned}$$
(6.17)

This reduces to the estimation of

$$\begin{aligned}&\left| \int _{\mathbb {R}^{m+n}} f_\varepsilon (pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}- \int _{\mathbb {R}^{m+n}} \chi (pa^s\overline{z})\chi (q\overline{z})\,d\overline{z}\right| \nonumber \\&\quad \leqslant \left| \int _{\mathbb {R}^{m+n}} (f_\varepsilon -\chi )(pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}\right| + \left| \int _{\mathbb {R}^{m+n}} \chi (pa^s\overline{z})(\chi -f_\varepsilon )(q\overline{z})\,d\overline{z} \right| .\qquad \qquad \end{aligned}$$
(6.18)

We observe that there exist \(0<\upsilon _1<\upsilon _2\) such that

$$\begin{aligned} \int _{\mathbb {R}^{m+n}} (f_\varepsilon -\chi )(pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}=0 \end{aligned}$$

unless \(\upsilon _1\,e^sq\leqslant p\leqslant \upsilon _2\, e^sq\), and by the Cauchy–Schwarz inequality under these restrictions,

$$\begin{aligned} \left| \int _{\mathbb {R}^{m+n}} (f_\varepsilon -\chi )(pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}\right| \leqslant \frac{\Vert f_\varepsilon -\chi \Vert _{L^2}}{p^{(m+n)/2}}\frac{\Vert f_\varepsilon \Vert _{L^2}}{q^{(m+n)/2}} \ll \frac{\varepsilon ^{1/2}}{q^{m+n}e^{(m+n)s/2}}. \end{aligned}$$

Since \(m+n\geqslant 3\),

$$\begin{aligned}&\sum _{p,q\geqslant 1}\left| \int _{\mathbb {R}^{m+n}} (f_\varepsilon -\chi )(pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}\right| \\&\quad = \sum _{q\geqslant 1}\sum _{\upsilon _1\,e^sq\leqslant p\leqslant \upsilon _2\, e^sq} \left| \int _{\mathbb {R}^{m+n}} (f_\varepsilon -\chi )(pa^s\overline{z})f_\varepsilon (q\overline{z})\,d\overline{z}\right| \\&\quad \ll \sum _{q\geqslant 1}\frac{\varepsilon ^{1/2}e^sq }{q^{m+n}e^{(m+n)s/2}}\ll \varepsilon ^{1/2}e^{-(m+n-2)s/2}. \end{aligned}$$

The sum involving the other integral in (6.18) is estimated similarly. This proves (6.17).

Provided that \(\varepsilon =\varepsilon (N)\rightarrow 0,\) the estimate (6.17) implies that

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2 = \Theta _\infty (0)+2\sum _{s=1}^{K-1}\Theta _\infty (s)+o(1). \end{aligned}$$

Hence,

$$\begin{aligned} \left\| F^{(\varepsilon ,L)}_N\right\| _{L^2(\mathcal {Y})}^2 \rightarrow \sigma ^2:=\Theta _\infty (0)+2\sum _{s=1}^{\infty }\Theta _\infty (s) \end{aligned}$$

as \(N\rightarrow \infty \).

Finally, we compute the limit

$$\begin{aligned} \sigma ^2=\sum _{s=-\infty }^{\infty }\Theta _\infty (s)= 2\zeta (m+n)^{-1}\sum _{s=-\infty }^{\infty } \sum _{p,q\geqslant 1}\int _{\mathbb {R}^{m+n}} \chi (pa^s\overline{z})\chi (q\overline{z})\, d\overline{z}. \end{aligned}$$

We note that the sum

$$\begin{aligned} \Xi :=\sum _{s=-\infty }^{\infty } \chi \circ a^s \end{aligned}$$

is equal to the characteristic function of the set

$$\begin{aligned} \left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, \Vert \overline{y}\Vert >0, \;\; |x_i|<\vartheta _i\, \Vert \overline{y}\Vert ^{-w_i},\; i=1,\ldots ,m\right\} , \end{aligned}$$

and

$$\begin{aligned}&\int _{\mathbb {R}^{m+n}} \Xi (p\overline{z})\chi (q\overline{z})\, d\overline{z}=\int _{1/q\leqslant \Vert \overline{y}\Vert<e/q} \left( \prod _{i=1}^m 2\vartheta _i\max (p,q)^{-1-w_i}\Vert \overline{y}\Vert ^{-w_i}\right) \,d\overline{y}\\&\quad =\,2^m\left( \prod _{i=1}^m \vartheta _i\right) \max (p,q)^{-m-n} \int _{1/q\leqslant \Vert \overline{y}\Vert <e/q} \Vert \overline{y}\Vert ^{-n}\,d\overline{y}\\&\quad =\,2^m\left( \prod _{i=1}^m \vartheta _i\right) \max (p,q)^{-m-n} \int _{S^{n-1}}\Vert \overline{z}\Vert ^{-n} \left( \int _{q^{-1}\Vert \overline{z}\Vert ^{-1}}^{eq^{-1}\Vert \overline{z}\Vert ^{-1}} r^{-1}\, dr\right) \,d\overline{z}\\&\quad =\,2^m\left( \prod _{i=1}^m \vartheta _i\right) \omega _n\,\max (p,q)^{-m-n}, \end{aligned}$$

where \(\omega _n:=\int _{S^{n-1}}\Vert \overline{z}\Vert ^{-n}\, d\overline{z}\). We also see that

$$\begin{aligned} \sum _{p,q\geqslant 1} \max (p,q)^{-m-n}= & {} \sum _{p=1}^\infty p^{-m-n}+2\sum _{1\leqslant p<q} q^{-m-n}\\= & {} \zeta (m+n)+2\sum _{q\geqslant 1} \frac{q-1}{q^{m+n}} =2\zeta (m+n-1)-\zeta (m+n), \end{aligned}$$

and thus

$$\begin{aligned} \sigma ^2=\sum _{s=-\infty }^{\infty }\Theta _\infty (s)= 2^{m+1}\left( \prod _{i=1}^m \vartheta _i\right) \omega _n\left( \frac{2\zeta (m+n-1)}{\zeta (m+n)}-1\right) . \end{aligned}$$
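Both the combinatorial identity and the resulting closed form for \(\sigma ^2\) can be checked numerically. The sketch below uses partial sums as stand-ins for the zeta values, and the illustrative parameters \(m=2\), \(n=1\), \(\vartheta _1=\vartheta _2=1\) (the smallest case in which \(\zeta (m+n-1)\) converges); note that for \(n=1\) one has \(\omega _1=2\), since \(S^0=\{\pm 1\}\).

```python
import math

def zeta(s, terms=10**6):
    # Partial sum of the Riemann zeta function; the tail is O(terms**(1 - s)).
    return sum(k**-s for k in range(1, terms + 1))

m, n = 2, 1                      # illustrative dimensions with m + n >= 3
thetas = [1.0, 1.0]              # illustrative values of the constants vartheta_i
omega_n = 2.0                    # for n = 1 the sphere S^0 is {+1, -1}

# Check sum_{p,q >= 1} max(p,q)^{-m-n} = 2*zeta(m+n-1) - zeta(m+n):
# there are exactly 2k - 1 pairs (p, q) with max(p, q) = k.
Q = 2000
lhs = sum((2*k - 1) * k**(-m - n) for k in range(1, Q + 1))
rhs = 2*zeta(m + n - 1) - zeta(m + n)
assert abs(lhs - rhs) < 5e-3     # truncation error is about 2/Q

# Evaluate the closed form for sigma^2.
sigma2 = (2**(m + 1) * math.prod(thetas) * omega_n
          * (2*zeta(m + n - 1)/zeta(m + n) - 1))
print(sigma2)                    # approximately 27.79
```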

6.1 Proof of Theorem 6.1

As we already remarked above, it is sufficient to show that the averages \(F_N^{(\varepsilon ,L)}\) converge in distribution to the normal law. According to Proposition 3.4, it suffices to check that

$$\begin{aligned} {{\,\mathrm{Cum}\,}}^{(r)}_{\mu _\mathcal {Y}}\left( F_N^{(\varepsilon ,L)}\right) \rightarrow 0\quad \hbox {as } N\rightarrow \infty \end{aligned}$$

when \(r\geqslant 3\), and

$$\begin{aligned} \left\| F_N^{(\varepsilon ,L)}\right\| _{L^2(\mathcal {Y})}^2\rightarrow \sigma ^2\quad \hbox {as } N\rightarrow \infty . \end{aligned}$$

These properties have been established above provided that the parameters

$$\begin{aligned} M=M(N),\quad \varepsilon =\varepsilon (N),\quad L=N^q,\quad \gamma =c_r(\log N),\quad K=K(N)\leqslant M(N) \end{aligned}$$

satisfy the ten conditions (6.4)–(6.7), (6.10), (6.11), (6.13)–(6.16). It remains to show that such a choice of parameters is possible. The condition (6.7) is guaranteed by the choice of L. First, we take

$$\begin{aligned} \varepsilon (N)=1/N. \end{aligned}$$

Then the first part of (6.5) holds. Next, we select a sufficiently large \(c_r\) in \(\gamma =c_r(\log N)\) so that (6.11) holds. After that, we choose

$$\begin{aligned} K(N)=c_1(\log N) \end{aligned}$$

with sufficiently large \(c_1>0\) so that (6.13) holds. Then it is clear that (6.16) also holds. Given these \(\varepsilon \), L, \(\gamma \), and K, we choose

$$\begin{aligned} M(N)=(\log N)(\log \log N) \end{aligned}$$

so that the second part of (6.5), (6.6), (6.10), and (6.14) hold for all \(N\geqslant N_0(r)\). With these choices, it is clear that (6.4) and (6.15) also hold. Hence, Theorem 6.1 follows from Proposition 3.4.

6.2 Proof of Theorem 1.2

For \(u\in \hbox {M}_{m,n}([0,1])\), we set

$$\begin{aligned} D_T({u}):=\frac{\Delta _T({u})-C_{m,n}\,\log T}{(\log T)^{1/2}}, \end{aligned}$$

where \(C_{m,n}=2^m\vartheta _1\cdots \vartheta _m\omega _n\) with \(\omega _n:=\int _{S^{n-1}} \Vert \overline{z}\Vert ^{-n}\,d\overline{z}\). We shall show that \(D_T({u})\) can be approximated by the averages \(F_N\) defined in (6.1). This will allow us to deduce convergence in distribution for \(D_T\). We observe that:

Lemma 6.3

$$\begin{aligned} \sum _{s=0}^{N-1}\int _{\mathcal {Y}} {\hat{\chi }}(a^s y)\,d\mu _\mathcal {Y}(y) = C_{m,n} N+O(1), \end{aligned}$$

where \(C_{m,n}\) is defined above.

Proof

We observe that

$$\begin{aligned} \sum _{s=0}^{N-1}\int _{\mathcal {Y}} {\hat{\chi }}(a^s y)\,d\mu _\mathcal {Y}(y) =\int _{\mathcal {Y}} {\hat{\Xi }}_N( y)\,d\mu _\mathcal {Y}(y), \end{aligned}$$

where \(\Xi _N\) denotes the characteristic function of the set

$$\begin{aligned} \left\{ (\overline{x},\overline{y})\in \mathbb {R}^{m+n}:\, 1\leqslant \Vert \overline{y}\Vert< e^N, \;\; |x_i|<\vartheta _i\, \Vert \overline{y}\Vert ^{-w_i},\; i=1,\ldots ,m\right\} . \end{aligned}$$

Using the notation of the proof of Proposition 4.6, we obtain

$$\begin{aligned} \int _{\mathcal {Y}} {\hat{\Xi }}_N( y)\,d\mu _\mathcal {Y}(y)= & {} \sum _{1\leqslant \Vert \overline{q}\Vert< e^N} \sum _{\overline{p}\in \mathbb {Z}^m} \prod _{i=1}^m \int _{[0,1]^m} \chi ^{(i)}_{\overline{q}}\left( p_i+\left<\overline{u}_i,\overline{q}\right>\right) d\overline{u}_i\\= & {} \sum _{1\leqslant \Vert \overline{q}\Vert< e^N} \prod _{i=1}^m \left( \sum _{p_i\in \mathbb {Z}}\int _{[0,1]^m} \chi ^{(i)}_{\overline{q}}\left( p_i+\left<\overline{u}_i,\overline{q}\right>\right) \, d\overline{u}_i\right) . \end{aligned}$$

We claim that

$$\begin{aligned} \sum _{p\in \mathbb {Z}}\int _{[0,1]^m} \chi ^{(i)}_{\overline{q}}\left( p+\left<\overline{u},\overline{q}\right>\right) \, d\overline{u} =2\vartheta _i \Vert \overline{q}\Vert ^{-w_i}. \end{aligned}$$
(6.19)

To prove this, let us consider more generally a bounded measurable function \(\chi \) on \(\mathbb {R}\) with compact support, the function \(\psi (x)=\chi (x_1)\) on \(\mathbb {R}^m\), and the function \({{\tilde{\psi }}}(x)=\sum _{p\in \mathbb {Z}} \chi (p+x_1)\) on the torus \(\mathbb {R}^m/\mathbb {Z}^m\). We suppose without loss of generality that \(q_1\ne 0\) and consider a non-degenerate linear map

$$\begin{aligned} S:\mathbb {R}^m\rightarrow \mathbb {R}^m: \overline{u}\mapsto \left( \left<\overline{u},\overline{q}\right>,u_2,\ldots ,u_m\right) \end{aligned}$$

which induces a linear epimorphism of the torus \(\mathbb {R}^m/\mathbb {Z}^m\). Using that S preserves the Lebesgue probability measure \(\mu \) on \(\mathbb {R}^m/\mathbb {Z}^m\), we deduce that

$$\begin{aligned} \sum _{p\in \mathbb {Z}}\int _{[0,1]^m} \chi \left( p+\left<\overline{u},\overline{q}\right>\right) \, d\overline{u}= & {} \int _{\mathbb {R}^m/\mathbb {Z}^m} {{\tilde{\psi }}}(Sx)\, d\mu (x)\\= & {} \int _{\mathbb {R}^m/\mathbb {Z}^m} {{\tilde{\psi }}}(x)\, d\mu (x)=\int _{\mathbb {R}}\chi (x_1)\,dx_1, \end{aligned}$$

which yields (6.19).
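The unfolding identity just proved can be illustrated numerically: with \(\chi \) the indicator of \([-\theta ,\theta ]\) for \(\theta <1/2\), averaging the periodization of \(\chi (\left<\overline{u},\overline{q}\right>)\) over the torus returns \(\int \chi =2\theta \), independently of the nonzero integer vector \(\overline{q}\). A Monte Carlo sketch with the illustrative values \(m=2\), \(\overline{q}=(3,2)\), \(\theta =0.3\):

```python
import random

random.seed(0)
theta = 0.3            # chi is the indicator of [-theta, theta], theta < 1/2
q = (3, 2)             # any nonzero integer vector works
samples = 200_000

hits = 0
for _ in range(samples):
    u = (random.random(), random.random())
    t = q[0]*u[0] + q[1]*u[1]
    # sum over p of chi(p + t) equals 1 iff t lies within theta of an integer
    if min(t % 1.0, 1.0 - t % 1.0) <= theta:
        hits += 1

average = hits / samples
print(average)         # close to the integral of chi, namely 2*theta = 0.6
assert abs(average - 2*theta) < 0.01
```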

In turn, (6.19) implies that

$$\begin{aligned} \int _{\mathcal {Y}} {\hat{\Xi }}_N\,d\mu _\mathcal {Y}= 2^m\left( \prod _{i=1}^m \vartheta _i\right) \sum _{1\leqslant \Vert \overline{q}\Vert < e^N} \Vert \overline{q}\Vert ^{-n}. \end{aligned}$$

Using that \(\Vert \overline{y}_1\Vert ^{-n}=\Vert \overline{y}_2\Vert ^{-n}+O\left( \Vert \overline{y}_2\Vert ^{-n-1}\right) \) when \(\Vert \overline{y}_1-\overline{y}_2\Vert \ll 1\), we deduce that

$$\begin{aligned} \sum _{1\leqslant \Vert \overline{q}\Vert< e^N} \Vert \overline{q}\Vert ^{-n}&=\int _{1\leqslant \Vert \overline{y}\Vert < e^N} \Vert \overline{y}\Vert ^{-n}\, d\overline{y}+O(1), \end{aligned}$$

and expressing the integral in polar coordinates, we obtain

$$\begin{aligned} \int _{1\leqslant \Vert \overline{y}\Vert < e^N} \Vert \overline{y}\Vert ^{-n}\, d\overline{y} =\int _{S^{n-1}}\int _{\Vert \overline{z}\Vert ^{-1}}^{\Vert \overline{z}\Vert ^{-1} e^N} \Vert r\overline{z}\Vert ^{-n}\, r^{n-1}drd\overline{z} =\omega _n N+O(1). \end{aligned}$$

This implies the lemma. \(\square \)
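The asymptotics in the lemma can be checked in the simplest case \(n=1\), where \(\omega _1=2\) and the lattice sum is twice a harmonic number, so that \(\sum _{1\leqslant |q|<e^N}|q|^{-1}=2N+O(1)\) (the bounded remainder tends to \(2\gamma \), twice Euler's constant). A sketch:

```python
import math

for N in (5, 8, 11):
    Q = math.floor(math.exp(N))            # integers q with 1 <= |q| < e^N
    lattice_sum = 2 * sum(1.0/q for q in range(1, Q + 1))
    # omega_1 * N = 2N; the remainder stays bounded (about 2 * Euler-gamma)
    print(N, lattice_sum - 2*N)
    assert abs(lattice_sum - 2*N) < 1.5
```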

Now we return to the proof of Theorem 1.2. Since

$$\begin{aligned} \Delta _{e^N}(u)=\sum _{s=0}^{N-1}{\hat{\chi }}(a^s\Lambda _u)+O(1), \end{aligned}$$

Lemma 6.3 implies that

$$\begin{aligned} \Vert D_{e^N}-F_N\Vert _{C^0}\rightarrow 0\quad \hbox {as }N\rightarrow \infty , \end{aligned}$$

where \((F_N)\) is defined as in (6.1). Therefore, it follows from Theorem 6.1 that for every \(\xi \in \mathbb {R}\),

$$\begin{aligned} |\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{e^N}({u})<\xi \}|\rightarrow \hbox {Norm}_\sigma (\xi ) \quad \hbox {as }N\rightarrow \infty . \end{aligned}$$

Let us take \(N_T=\lfloor \log T\rfloor \). Then

$$\begin{aligned} e^{N_T}\leqslant T<e^{N_T+1}\quad \hbox {and}\quad N_T\leqslant \log T<N_T+1, \end{aligned}$$

so that

$$\begin{aligned} D_T\leqslant \frac{\Delta _{e^{N_T+1}}-C_{m,n}\,N_T}{(\log T)^{1/2}}=a_T\, D_{e^{N_T+1}}+ b_T \end{aligned}$$

with \(a_T\rightarrow 1\) and \(b_T\rightarrow 0\) as \(T\rightarrow \infty \). Hence, we deduce that

$$\begin{aligned} |\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{T}({u})<\xi \}|\geqslant |\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{e^{N_T+1}}({u})<(\xi -b_T)/a_T \}|. \end{aligned}$$

It follows that for any \(\varepsilon >0\) and sufficiently large T,

$$\begin{aligned} |\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{T}({u})<\xi \}|\geqslant |\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{e^{N_T+1}}({u})<\xi -\varepsilon \}|. \end{aligned}$$

Therefore,

$$\begin{aligned} \liminf _{T\rightarrow \infty }|\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{T}({u})<\xi \}|\geqslant \hbox {Norm}_\sigma (\xi -\varepsilon ) \end{aligned}$$

for all \(\varepsilon >0\). This implies that

$$\begin{aligned} \liminf _{T\rightarrow \infty }|\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{T}({u})<\xi \}|\geqslant \hbox {Norm}_\sigma (\xi ). \end{aligned}$$

A similar argument also implies the upper bound

$$\begin{aligned} \limsup _{T\rightarrow \infty }|\{{u}\in \hbox {M}_{m,n}([0,1]):\, D_{T}({u})<\xi \}|\leqslant \hbox {Norm}_\sigma (\xi ). \end{aligned}$$

This completes the proof of Theorem 1.2.