
Jackknife variance estimation for general two-sample statistics and applications to common mean estimators under ordered variances

  • Ansgar Steland
  • Yuan-Tsung Chang
Original Paper

Abstract

We study the jackknife variance estimator for a general class of two-sample statistics. As a concrete application, we consider samples with a common mean but possibly different, ordered variances as arising in various fields such as interlaboratory experiments, field studies, or the analysis of sensor data. Estimators for the common mean under ordered variances typically employ random weights, which depend on the sample means and the unbiased variance estimators. They take different forms depending on whether the sample estimators are in agreement with the order constraints or not, which complicates even basic analyses such as estimating their variance. We propose to use the jackknife, whose consistency is established for general smooth two-sample statistics induced by continuously Gâteaux- or Fréchet-differentiable functionals, and, more generally, asymptotically linear two-sample statistics, allowing us to study a large class of common mean estimators. Furthermore, it is shown that the common mean estimators under consideration satisfy a central limit theorem (CLT). We investigate the accuracy of the resulting confidence intervals by simulations and illustrate the approach by analyzing several data sets.

Keywords

Common mean · Data science · Central limit theorem · Gâteaux derivative · Fréchet differentiability · Graybill–Deal estimator · Jackknife · Order constraint · Resampling

1 Introduction

We study the jackknife variance estimation methodology for a wide class of two-sample statistics including asymptotically linear statistics and statistics induced by differentiable two-sample statistical functionals, thus extending the well-studied one-sample results; see Efron and Stein (1981), Shao and Wu (1989), and Steland (2015) and the discussion below. Comparison of two samples by some statistic is a classical statistical design and is also widely applied to analyze massive data arising in data science, e.g., when exploring such data by analyzing subsets. Our specific motivation comes from the following classical common mean estimation problem: in many applications, several samples of measurements with a common mean but typically with different degrees of uncertainty (in the sense of variances) are drawn. This setting generally arises when using competing measurement systems of different quality or if the factor variable defining the samples affects the dispersion. For example, when checking the octane level in gasoline at pump stations, inspectors use cheap hand-held devices to collect many low-precision measurements and send only a few samples to government laboratories for detailed high-precision analyses. In both cases, the same mean octane level is measured, but the variance and the shape of the distribution may differ. The issue of samples with common mean but heterogeneous, ordered variances also arises in big data applications, for example, when processing data from image sensors, see Degerli (2000) and Lin (2010), or accelerometers, see Cemer (2011), as used in smartphones or specialized measurement systems. Here, the thermo-mechanical (or Brownian) noise represents a major source of noise and depends on temperature. Therefore, in the presence of a constant signal, samples taken under different conditions exhibit different variances and the order constraint is related to temperature. It is worth mentioning that the general statistical problem of how to combine estimators from different samples dates back to the works of Fisher (1932), Tippett (1931) and Cochran (1937), cf. the discussion given in Keller and Olkin (2004). In the present article, we study the classical problem of estimating the common mean in the presence of ordered variances and propose to use a two-sample jackknife variance estimator to assess the uncertainty.

The jackknife is easy to use and feasible for big data problems, since its computational costs are substantially lower than those of other techniques such as the bootstrap. Indeed, our simulations indicate that it also provides substantially higher accuracy of confidence intervals than the bootstrap for the common mean estimation problem. We, therefore, extend the jackknife methodology for smooth statistical functionals to a general two-sample framework and establish a new result, which holds as long as the statistic of interest can be approximated by a linear statistic. This result goes beyond the case of smooth statistics induced by continuously differentiable functionals and allows us to treat a large class of common mean estimators.

Since the jackknife has not yet been studied for two-sample settings, we establish its consistency and asymptotic unbiasedness for possibly nonlinear but asymptotically linear two-sample statistics. We introduce a specific new jackknife variance estimator for two samples with possibly unequal sample sizes \(n_1\) and \(n_2\), which is based on \(n_1 + n_2\) leave-one-out replicates. In addition, for equal sample sizes, we study an alternative procedure which generates replicates by leaving out pairs of observations. Both jackknife estimators are shown to be weakly consistent and asymptotically unbiased. Those general results allow us to show that the jackknife consistently estimates the variance of a large class of common mean estimators including many of those proposed in the literature. They are, however, also interesting in their own right. First, we provide conditions which are easier to verify in cases where the statistic of interest is not induced by a smooth statistical functional, e.g., due to discontinuities such as those arising in the common mean estimation problem. Second, we provide a proof using elementary arguments that avoids the calculus of differentiable statistical functionals. Finally, our approach shows that the pseudo-values are consistent estimates of the summands of the asymptotically equivalent linear statistic. In addition to these results, we also extend the known consistency results for continuously Gâteaux- and Fréchet-differentiable statistical functionals from one-sample settings to two-sample settings, resulting in a comprehensive treatment of the two-sample jackknife methodology.

For the common mean estimation problem, which we study in depth as a non-trivial application, several common mean estimators have been discussed in the literature. For unequal variances, the Graybill–Deal (GD) estimator may be used, which weights the sample averages with the inverse unbiased variance estimates. If, however, an order constraint on the variances is imposed, several estimators have been proposed which dominate the GD estimator, see especially Nair (1982), Elfessi and Pal (1992), Chang and Shinozaki (2008), and Chang et al. (2012). Those estimators are convex combinations of the sample means whose weights additionally depend on whether the ordering of the sample variances is in agreement with the order constraint.

When random weights are used, even the calculation of the variance of such a common mean estimator is a concern and has only been studied under the assumption of Gaussian samples for certain special cases such as the GD estimator. As a way out, we propose to employ the jackknife variance estimator of Quenouille (1949) and Tukey (1958), which basically calculates the sample variance of a pseudo-sample obtained from leave-one-out replicates of the statistic of interest. It has been extensively studied for one-sample problems by Efron and Stein (1981) and Shao and Wu (1989), the latter for asymptotically linear statistics, and recently by Steland (2015) for the case of vertically weighted sample averages. Compared to other approaches, the jackknife variance estimator has the advantage of not giving downward-biased estimates, cf. (Efron 1982, p. 42) and Efron and Stein (1981). It has also made its way into recent textbooks devoted to computer-age statistics and data science; see Efron and Hastie (2016).

A further issue studied in this paper is the asymptotic distribution of common mean estimators. We show that common mean estimators using random weights are asymptotically normal under fairly weak conditions, which are satisfied by the estimators studied in the literature. Combining this result with the jackknife variance estimators allows us to construct asymptotic confidence intervals and to test statistical hypotheses about the common mean. In Steland (2017), that approach was applied to real data from photovoltaics, compared with other methods, and investigated by data-driven simulations using distributions arising in that field. It was found that confidence intervals based on the proposed methodology have notably higher accuracy in terms of the coverage probability, thus providing a real-world example where the approach improves upon existing ones. In this paper, we broaden these data-driven results by a simulation study investigating distributions with different tail behavior.

The organization of the paper is as follows. Section 2 studies the jackknife variance estimator. Two-sample statistics induced by Gâteaux- and Fréchet-differentiable functionals are studied as well as asymptotically linear two-sample statistics. In Sect. 3, we introduce and review the common mean estimation problem for unequal and ordered variances for two samples and discuss related results from the literature. Section 4 presents the results about jackknife variance estimation for common mean estimators with random weights and provides a central limit theorem. Finally, Sect. 5 presents the simulations and analyzes three data sets from physics, technology, and social information to illustrate the approach.

2 The jackknife for two-sample statistics

The Quenouille–Tukey jackknife, see Quenouille (1949), Tukey (1958) and Miller (1974), is a simple and effective resampling technique for bias and variance estimation, see also the monograph Efron (1982). For a large class of one-sample statistics, the consistency of the jackknife variance estimator has been studied in depth by Shao and Wu (1989) and recently in Steland (2015). Lee (1991) studied jackknife variance estimation for a one-way random-effects model by simulations. In the present section, we extend the jackknife to quite general two-sample statistics. To the best of our knowledge, the results of this section are new and extend the existing theoretical investigations.

We shall first discuss the case of asymptotically linear two-sample statistics considering the cases of unequal and equal sample sizes separately, as it turns out that for equal sample sizes one may define a simpler jackknife variance estimator. Then, we study the case of two-sample statistics induced by a two-sample differentiable statistical functional extending the known results for the one-sample setting.

Let us consider a two-sample statistic \(T_n\), i.e., a statistic \(T_n = T_n( X_{11}, \dots , X_{1n_1}, X_{21}, \dots , X_{2n_2} )\) which is symmetric in its first \(n_1\) arguments as well as in its last \(n_2\) arguments; \(n = n_1 + n_2\) is the total sample size. We assume that \(T_n\) has the following property: for all sample sizes \(n_1, n_2 \in {\mathbb {N}}\) with \(n_i/n \rightarrow \lambda _i\), \(i = 1, 2\), as \(\min (n_1,n_2) \rightarrow \infty\), and all independent i.i.d. samples \(X_{ij} \sim F_i\), \(j = 1, \dots , n_i\), \(i = 1, 2\), we have the following:
$$\begin{aligned} T_n = L_n+ R_n, \qquad E(L_n) = 0, \end{aligned}$$
(1)
with a linear statistic
$$\begin{aligned} L_n = \frac{1}{n} \left[ \sum _{i=1}^{n_1} h_1(X_i ) + \sum _{i=n_1+1}^{n} h_2(X_i)\right] , \end{aligned}$$
(2)
for two kernel functions \(h_1, h_2\) with
$$\begin{aligned} \int h_i^4 {\text {d}} F_i < \infty , \qquad i = 1,2, \end{aligned}$$
(3)
and a remainder term \(R_n\) satisfying \(n E(R_n^2) = o(1)\), as \(n \rightarrow \infty\). Here and in what follows:
$$\begin{aligned} (X_1, \dots , X_n)' = (X_{11}, \dots , X_{1n_1}, X_{21}, \dots , X_{2n_2})'. \end{aligned}$$
Recall that a statistic \(T_n\) attaining such a decomposition is called asymptotically linear.
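For instance, if both samples share the common mean \(\mu\) and \(T_n\) is the pooled sample mean centered at \(\mu\), the decomposition holds exactly:
$$\begin{aligned} T_n = \frac{1}{n} \left[ \sum _{i=1}^{n_1} (X_i - \mu ) + \sum _{i=n_1+1}^{n} (X_i - \mu ) \right] , \qquad h_1(x) = h_2(x) = x - \mu , \qquad R_n \equiv 0, \end{aligned}$$
so that (3) amounts to finite fourth moments of \(F_1\) and \(F_2\).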
Let us recall the following definitions and facts. Suppose that \(U_n\) is a statistic which satisfies, for some parameter \(\theta\), the central limit theorem:
$$\begin{aligned} \sqrt{n} (U_n - \theta ) {\mathop {\rightarrow }\limits ^{d}} N( 0, \sigma ^2(U) ), \end{aligned}$$
as \(n \rightarrow \infty\), with \(\lim _{n \rightarrow \infty } n {\mathrm{Var\,}}( U_n ) = \sigma ^2(U) \in (0, \infty )\). Here, \({\mathop {\rightarrow }\limits ^{d}}\) denotes convergence in distribution. Then, the constant \(\sigma ^2(U)\) is called the asymptotic variance of \(U_n\). By (1), \(T_n\) inherits its asymptotic variance, denoted \(\sigma ^2(T)\), from \(L_n\), and thus we obtain
$$\begin{aligned} \sigma ^2(T)&= \lim _{n \rightarrow \infty } {\mathrm{Var\,}}( \sqrt{n} L_n ) \nonumber \\&= \lim _{n \rightarrow \infty } n \left( \frac{n_1}{n} \right) ^2 \frac{ {\mathrm{Var\,}}( h_1(X_1) ) }{n_1} + n \left( \frac{n_2}{n} \right) ^2 \frac{ {\mathrm{Var\,}}( h_2(X_{n_1+1}) ) }{n_2} \end{aligned}$$
(4)
$$\begin{aligned}&= \lambda _1 \tau _1^2 + \lambda _2 \tau _2^2, \end{aligned}$$
(5)
where
$$\begin{aligned} \tau _1^2 = {\mathrm{Var\,}}( h_1(X_{11}) ) \quad \text {and} \quad \tau _2^2 = {\mathrm{Var\,}}( h_2(X_{21}) ). \end{aligned}$$

2.1 Unequal sample sizes

First, we discuss the general case of unequal sample sizes. Let us make the dependence on the observations explicit and write:
$$\begin{aligned} T_n = T_n( X_1, \dots , X_{n_1}, X_{n_1+1}, \dots , X_n). \end{aligned}$$
The leave-one-out statistic obtained when omitting the ith observation is denoted by
$$\begin{aligned} T_{n,-i} = T_{n-1}( X_1, \dots , X_{i-1}, X_{i+1}, \dots , X_n ), \end{aligned}$$
for \(i = 1, \dots , n\). Define the leave-one-out pseudo-values:
$$\begin{aligned} {\widehat{ \xi }}_i = n T_n - (n-1) T_{n,-i}, \qquad i = 1, \dots , n. \end{aligned}$$
(6)
The jackknife leave-one-out variance estimator for \(\sigma ^2(T)\) is then defined by
$$\begin{aligned} {\widehat{ \sigma }}^2_n(T) = \frac{n_1}{n} {\widehat{ \tau }}^2_{1} + \frac{n_2}{n} {\widehat{ \tau }}^2_{2}, \end{aligned}$$
(7)
where
$$\begin{aligned} {\widehat{ \tau }}^2_{1} = \frac{1}{n_1-1} \sum _{j=1}^{n_1} ( {\widehat{ \xi }}_j - \overline{{\widehat{ \xi }}}_{1:n_1} )^2 \end{aligned}$$
and
$$\begin{aligned} {\widehat{ \tau }}^2_{2} = \frac{1}{n_2-1} \sum _{j=n_1+1}^{n} ( {\widehat{ \xi }}_j - \overline{{\widehat{ \xi }}}_{(n_1+1):n} )^2 \end{aligned}$$
with \(\overline{{\widehat{ \xi }}}_{a:b} = \frac{1}{b-a+1} \sum _{j=a}^b {\widehat{ \xi }}_j\).
The associated jackknife variance estimator of \({\mathrm{Var\,}}( T_n )\) is then given by the following:
$$\begin{aligned} {\widehat{ {\mathrm{Var\,}} }}( T_n ) = \frac{ {\widehat{ \sigma }}_n^2(T) }{ n }; \end{aligned}$$
see also Efron and Stein (1981), Efron (1982), and the discussion given in Remark 2.2.
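To illustrate the computations, the following minimal Python sketch evaluates the pseudo-values (6), the estimator (7), and the associated estimator of \({\mathrm{Var\,}}( T_n )\) for an arbitrary two-sample statistic supplied as a callable; the function name and the numpy-based interface are our own and not part of the paper.

```python
import numpy as np

def jackknife_two_sample(x1, x2, stat):
    """Leave-one-out jackknife for a two-sample statistic (n1, n2 possibly unequal).

    x1, x2 : one-dimensional arrays holding the two samples.
    stat   : callable stat(sample1, sample2) returning the scalar statistic T_n.

    Returns (sigma2_hat, var_hat): the estimate (7) of the asymptotic variance
    sigma^2(T) and the associated estimate sigma2_hat / n of Var(T_n).
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    t_full = stat(x1, x2)

    # leave-one-out pseudo-values, Eq. (6)
    xi = np.empty(n)
    for i in range(n1):                                   # omit an observation of sample 1
        xi[i] = n * t_full - (n - 1) * stat(np.delete(x1, i), x2)
    for j in range(n2):                                   # omit an observation of sample 2
        xi[n1 + j] = n * t_full - (n - 1) * stat(x1, np.delete(x2, j))

    tau1_sq = np.var(xi[:n1], ddof=1)                     # hat tau_1^2
    tau2_sq = np.var(xi[n1:], ddof=1)                     # hat tau_2^2
    sigma2_hat = (n1 / n) * tau1_sq + (n2 / n) * tau2_sq  # Eq. (7)
    return sigma2_hat, sigma2_hat / n
```

For example, passing the callable stat(a, b) = a.mean() - b.mean() yields, as the second returned value, an estimate close to \(\sigma _1^2/n_1 + \sigma _2^2/n_2\).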

Theorem 2.1

Assume that \(T_n\) satisfies (1), (2), and (3). If, in addition, the remainder term satisfies
$$\begin{aligned} n E( R_n^2 )= & {} o(1), \end{aligned}$$
(8)
$$\begin{aligned} n^2 E( R_n - R_{n-1} )^2=\, & {} o(1), \end{aligned}$$
(9)
as \(n \rightarrow \infty\), then the following assertions hold.
  1. (i)
    For each \(1 \le i \le n\):
    $$\begin{aligned} E({\widehat{ \xi }}_i - \xi _i)^2 = o(1), \end{aligned}$$
    where \(\xi _i = h_1(X_i)\), if \(1\le i \le n_1\), and \(\xi _i = h_2(X_i)\), if \(n_1+1\le i \le n\).
     
  2. (ii)
    \({\widehat{ \sigma }}^2_n(T)\) is a consistent and asymptotically unbiased estimator for the asymptotic variance \(\sigma ^2(T)\) of \(T_n\), that is:
    $$\begin{aligned} \left| \frac{ {\widehat{ \sigma }}^2_n(T) }{ \sigma ^2(T) } - 1 \right| \rightarrow 0, \end{aligned}$$
    in probability, as \(\min (n_1,n_2) \rightarrow \infty\), and
    $$\begin{aligned} \left| \frac{ E{\widehat{ \sigma }}^2_n(T) }{ \sigma ^2(T) } - 1 \right| \rightarrow 0, \end{aligned}$$
    as \(\min (n_1,n_2) \rightarrow \infty\). The associated jackknife variance estimator of \({\mathrm{Var\,}}( T_n )\) shares the above consistency properties.
     

Remark 2.1

Examples for which (8) and (9) hold are \(h_i(x) = x - \mu _i\) (arithmetic means) and \(h_i(x) = (x-\mu _i)^2 - \sigma _i^2\) (sample variances, as verified in the Appendix). The conditions on the second moment of the remainder term in (8) and (9) can be interpreted as measures of smoothness of \(T_n\). They have also been employed by Shao and Wu (1989), cf. their Theorem 1, to study jackknife variance estimation for one-sample statistics, but our proof is quite different from the methods of proof used there. In particular, we show that, by virtue of condition (9), the summands of the asymptotic linearization \(L_n\), i.e., the random variables \(h_1(X_i)\), \(i = 1, \dots , n_1\), and \(h_2(X_i)\), \(i = n_1 + 1 ,\dots , n_1+n_2\), can be estimated consistently. To the best of our knowledge, this interesting and useful result has not yet been established in the literature.

Proof

We provide a direct proof. As a preparation, observe the following facts about the statistic \(T_n\) and its leave-one-out pseudo-values: When omitting the ith observation, say \(i \le n_1\), and calculating the statistic from the resulting sample of size \(n-1\), we have the following:
$$\begin{aligned} T_{n,-i} = T_{n-1}( X_1, \dots , X_{i-1}, X_{i+1}, \dots , X_n ), \end{aligned}$$
where \(( X_1, \dots , X_{i-1}, X_{i+1}, \dots , X_n ) {\mathop {=}\limits ^{d}} ( X_1', \dots , X_{n-1}')\) whenever the first \(n_1-1\) observations \(X_1', \dots , X_{n_1-1}'\) are i.i.d. with common d.f. \(F_1\) and the remaining \(X_i'\)s are i.i.d. with d.f. \(F_2\). Hence:
$$\begin{aligned} T_{n,-i} = L_{n,-i} + R_{n,-i}\, {\mathop {=}\limits ^{d}}\, L_{n-1} + R_{n-1}, \end{aligned}$$
where \(L_{n,-i} + R_{n,-i}\) denotes the decomposition of \(T_{n,-i}\) into a linear statistic and a remainder term and \(L_{n-1} + R_{n-1}\) is the decomposition when applying the statistic to the sample \(X_1, \dots , X_{n_1-1}, X_{n_1+1}, \dots , X_{n}\), i.e., when the last observation of the first sample is omitted. In the same vein, when omitting an arbitrary observation of the second sample, the resulting decomposition of the statistic is equal in distribution to the decomposition obtained when omitting the last observation of the second sample. In particular, the second moment of the remainder term corresponding to \(T_{n,-i}\) does not depend on i. By (1), we can write \(T_n\) as follows:
$$\begin{aligned} T_n = \frac{n_1}{n} \frac{1}{n_1} \sum _{j=1}^{n_1} h_1(X_j) + \frac{n_2}{n} \frac{1}{n_2} \sum _{j=n_1+1}^{n} h_2(X_j) + R_n. \end{aligned}$$
To show the validity of the jackknife variance estimator, we shall focus on the first sample and put \(\xi _i = h_1(X_i)\), \(i = 1, \dots , n_1\).
Let \(R_{n,-i}\) be the remainder term arising in that decomposition when omitting the ith observation. This means that \(T_{n,-i} = L_{n,-i} + R_{n,-i}\) with \(L_{n,-i} = \frac{n_1-1}{n-1} \frac{1}{n_1-1} \sum _{j=1, j \not = i}^{n_1} h_1(X_j) + \frac{n_2}{n-1} \frac{1}{n_2} \sum _{j=n_1+1}^{n} h_2(X_j)\), if \(1 \le i \le n_1\), and \(L_{n,-i} = \frac{n_1}{n-1} \frac{1}{n_1} \sum _{j=1}^{n_1} h_1(X_j) + \frac{n_2-1}{n-1} \frac{1}{n_2-1} \sum _{j=n_1+1, j \not =i}^{n} h_2(X_j)\), if \(n_1+1 \le i \le n\), and \(R_{n,-i} = T_{n,-i} - L_{n,-i}\) for \(i = 1, \dots , n\). By assumption, \(n^2E(R_{n,-i})^2 = o(1)\) holds for all \(i = 1, \dots , n\). By definition (6), the leave-one-out pseudo-values are then given by
$$\begin{aligned} {\widehat{ \xi }}_i = n T_n - (n-1) T_{n,-i} = h_1(X_i) + A_{ni}, \quad A_{ni} = nR_n - (n-1) {\widetilde{ R }}_{n-1,-i} \end{aligned}$$
with
$$\begin{aligned} {\widetilde{ R }}_{n-1,-i} {\mathop {=}\limits ^{d}} R_{n-1,-1}, \qquad 1 \le i \le n_1 \end{aligned}$$
(and \({\widetilde{ R }}_{n-1,-i}\, {\mathop {=}\limits ^{d}}\, R_{n-1,n-1} =: R_{n-1}\) for \(n_1+1 \le i \le n.\)). This means that
$$\begin{aligned} {\widehat{ \xi }}_i = \xi _i + A_{ni}, \qquad \max _{1 \le i \le n_1} E(A_{ni}^2) = o(1). \end{aligned}$$
(10)
The same arguments using \(R_{n,-i} {\mathop {=}\limits ^{d}} R_{n,-n}\) show that (10) holds for all \(i = 1, \dots , n\). Since \(R_{n,-i} {\mathop {=}\limits ^{d}} R_{n-1}\), as well, we may conclude that
$$\begin{aligned} \max _{1 \le i \le n} E( A_{ni}^2 )&= \max _{1 \le i \le n} E( n R_n - (n-1) R_{n,-i} )^2 \\&= \max _{1 \le i \le n} E( n(R_n - R_{n,-i} ) + R_{n,-i} )^2 \\&= n^2 E( R_n - R_{n-1} )^2 + E( R_{n-1}^2 ) + 2 n E( (R_n-R_{n-1}) R_{n-1} ) \\&= o(1), \end{aligned}$$
where the last term is estimated by Cauchy–Schwarz inequality:
$$\begin{aligned} n | E( (R_n - R_{n-1}) R_{n-1} ) | \le \sqrt{ n^2 E( (R_n-R_{n-1})^2) } \sqrt{ E( R_{n-1}^2 ) } = o(1). \end{aligned}$$
Clearly, \(\tau _1^2\) can be estimated consistently by the sample variance of \(\xi _1, \dots , \xi _{n_1}\). We have to show that we may replace the \(\xi _i\)’s by the \({\widehat{ \xi }}_i\)’s. As shown above, \({\widehat{ \xi }}_i = \xi _i + A_n\), where \(E(A_n^2) = o(1)\). Hence:
$$\begin{aligned} E|{\widehat{ \xi }}_i^2 - \xi _i^2| = E | (\xi _i + A_n)^2 - \xi _i^2 | \le 2 E| \xi _i A_n | + E (A_n^2) \le 2 \sqrt{ E( \xi _1^2 ) } \sqrt{ E (A_n^2) } + o(1) = o(1), \end{aligned}$$
for \(i = 1, \dots , n\), such that the \(L_1\)-convergence of the sample moment of the squares and of the squared sample moments follows from Lemma A.3. We can, therefore, conclude that
$$\begin{aligned} E \left| \frac{1}{n_1} \sum _{j=1}^{n_1} ( {\widehat{ \xi }}_j - \overline{{\widehat{ \xi }}}_{1:n_1} )^2 - \frac{1}{n_1} \sum _{j=1}^{n_1} ( \xi _j - \overline{\xi }_{1:n_1} )^2 \right| =o(1), \end{aligned}$$
as \(n_1 \rightarrow \infty\). Combining this with the fact that the sample variance \(\frac{1}{n_1} \sum _{j=1}^{n_1} \xi _j^2 - ( \frac{1}{n_1} \sum _{j=1}^{n_1} \xi _j )^2\) of the \(\xi _j\)’s is \(L_1\)-consistent if \(E| \xi _1 |^4 < \infty\), we obtain the \(L_1\)-consistency of the jackknife variance estimator for \(\tau _1^2\):
$$\begin{aligned} {\widehat{ \tau }}^2_{1} = \frac{1}{n_1-1} \sum _{j=1}^{n_1} ( {\widehat{ \xi }}_j - \overline{{\widehat{ \xi }}}_{1:n_1} )^2, \end{aligned}$$
which implies the (weak) consistency in the sense that
$$\begin{aligned} \left| \frac{ {\widehat{ \tau }}^2_{1} }{ \tau _1^2 } - 1 \right| = \frac{ | {\widehat{ \tau }}^2_{1} - \tau _1^2 | }{ \tau _1^2 } {\mathop {\rightarrow }\limits ^{P}} 0, \end{aligned}$$
(11)
as \(n_1 \rightarrow \infty\), and also yields the asymptotic unbiasedness by taking the expectation of the left-hand side of (11).
In the same vein, the jackknife variance estimator \({\widehat{ \tau }}^2_{2}\) for \(\tau _2^2\) is \(L_1\)-consistent, weakly consistent, and asymptotically unbiased. Consequently, we may estimate the asymptotic variance of \(T_n\) by the jackknife variance estimator:
$$\begin{aligned} {\widehat{ \sigma }}^2_n(T) = \frac{n_1}{n} {\widehat{ \tau }}^2_{1} + \frac{n_2}{n} {\widehat{ \tau }}^2_{2}, \end{aligned}$$
and we obtain the following:
$$\begin{aligned} E| {\widehat{ \sigma }}^2_n(T) - \sigma ^2(T) | = o(1), \end{aligned}$$
as \(\min (n_1,n_2) \rightarrow \infty\),
$$\begin{aligned} \left| \frac{ {\widehat{ \sigma }}^2_n(T) }{ \sigma ^2(T) } - 1 \right| \rightarrow 0, \end{aligned}$$
in probability, as \(\min (n_1,n_2) \rightarrow \infty\) and the asymptotic unbiasedness:
$$\begin{aligned} \left| \frac{ E{\widehat{ \sigma }}^2_n(T) }{ \sigma ^2(T) } - 1 \right| \rightarrow 0, \end{aligned}$$
as \(\min (n_1,n_2) \rightarrow \infty\). \(\square\)

2.2 Equal sample sizes \(N = n_1 = n_2\)

For equal sample sizes \(N = n_1 = n_2\), such that \(n = 2N\), one may, of course, apply the jackknife variance estimator discussed above. However, for such balanced two-sample designs, one can propose a simpler jackknife variance estimator when viewing \(T_n\) as a statistic of the N pairs:
$$\begin{aligned} Z_j = ( X_{1j}, X_{2j} ), \qquad j = 1, \dots , N. \end{aligned}$$
Thus, we write
$$\begin{aligned} T_n = T_N( Z_1, \dots , Z_N ) \end{aligned}$$
to make the dependence on the N pairs of random vectors explicit in our notation. Let
$$\begin{aligned} T_{N,-i}&= T_{N-1}( Z_1, \dots , Z_{i-1}, Z_{i+1}, \dots , Z_N ) \\&= T_{2N-2}( X_{11}, \dots , X_{1,i-1}, X_{1,i+1}, \dots , X_{1N}, X_{21}, \dots , X_{2,i-1}, X_{2,i+1}, \dots , X_{2N} ) \end{aligned}$$
denote the leave-one-pair-out statistics and define the leave-one-pair-out pseudo-values:
$$\begin{aligned} {\widehat{ \xi }}_i = N T_N( Z_1, \dots , Z_N ) - (N-1) T_{N,-i}. \end{aligned}$$
The jackknife variance estimator for the asymptotic variance \(\sigma ^2(T)\) is now given by the following:
$$\begin{aligned} {\widehat{ \sigma }}_N^2(T) = \frac{1}{N-1} \sum _{i=1}^N ( {\widehat{ \xi }}_i - \overline{{\widehat{ \xi }}}_N )^2 = (N-1) \sum _{i=1}^N ( T_{N,-i} - \overline{T_{N,-\bullet }} )^2, \end{aligned}$$
(12)
and the corresponding jackknife variance estimator of \({\mathrm{Var\,}}( T_N )\) is as follows:
$$\begin{aligned} {\widehat{ {\mathrm{Var\,}} }}(T_N) = \frac{{\widehat{ \sigma }}_N^2(T)}{ N } = \frac{N-1}{N} \sum _{i=1}^N ( T_{N,-i} - \overline{T_{N,-\bullet }} )^2. \end{aligned}$$
(13)
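A corresponding sketch of the paired version (again with an illustrative function name and the same calling convention as the sketch in Sect. 2.1) reads:

```python
import numpy as np

def jackknife_paired(x1, x2, stat):
    """Leave-one-pair-out jackknife, Eqs. (12)-(13), for equal sample sizes.

    x1, x2 : samples of common length N; stat(sample1, sample2) -> scalar T_N.
    Returns (sigma2_hat, var_hat).
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    N = len(x1)
    if len(x2) != N:
        raise ValueError("the paired jackknife requires n1 = n2")

    # leave-one-pair-out replicates T_{N,-i}
    t_loo = np.array([stat(np.delete(x1, i), np.delete(x2, i)) for i in range(N)])
    ss = np.sum((t_loo - t_loo.mean()) ** 2)
    return (N - 1) * ss, (N - 1) / N * ss   # Eq. (12) and Eq. (13)
```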

Theorem 2.2

Under the assumptions of Theorem 2.1, for equal sample sizes the jackknife estimators (12) and (13) are consistent and asymptotically unbiased.

Proof

The proof goes along the lines of the proof of Theorem 2.1. Therefore, we indicate the main changes and simplifications. Observe that, for equal sample sizes, the representation of \(T_n\) in terms of a linear statistic and a negligible remainder term simplifies to
$$\begin{aligned} T_N = \frac{1}{N} \sum _{j=1}^N \frac{1}{2} [h_1(X_{1j}) + h_2( X_{2j} )] + R_N = \frac{1}{N} \sum _{j=1}^N \xi (Z_j) + R_N, \quad N E(R_N^2) = o(1), \end{aligned}$$
with kernel function
$$\begin{aligned} \xi (z) = \frac{1}{2} [h_1( z_1 ) + h_2(z_2)], \qquad z = (z_1, z_2)' \in {\mathbb {R}}^2, \end{aligned}$$
not depending on n. The asymptotic variance of \(T_N\) is now given by the following:
$$\begin{aligned} \sigma ^2(T) = \lim _{N \rightarrow \infty } {\mathrm{Var\,}}( \sqrt{N} T_N ) = {\mathrm{Var\,}}( \xi ( Z_1 ) ). \end{aligned}$$
It follows that the leave-one-out pseudo-values satisfy the following:
$$\begin{aligned} {\widehat{ \xi }}_i = N T_N - (N-1) T_{N,-i} = \xi (Z_i) + A_N, \end{aligned}$$
where arguments as detailed in the proof of Theorem 2.1 show that the remainder term \(A_N\) satisfies \(N E( A_N^2) = o(1)\) and \(N^2 E( R_N - R_{N-1})^2 = o(1)\), as \(N \rightarrow \infty\). Therefore, by repeating the arguments given there for the general case of unequal sample sizes, we see that a \(L_1\)-consistent, weakly consistent, and asymptotically unbiased estimator of \(\sigma ^2(T)\) is given by the sample variance of the pseudo-values \({\widehat{ \xi }}_1, \dots , {\widehat{ \xi }}_N\), and hence, the consistency of
$$\begin{aligned} {\widehat{ \sigma }}_N^2(T) = \frac{1}{N-1} \sum _{i=1}^N ( {\widehat{ \xi }}_i - \overline{{\widehat{ \xi }}}_N )^2 = (N-1) \sum _{i=1}^N ( T_{N,-i} - \overline{T_{N,-\bullet }} )^2 \end{aligned}$$
follows, which coincides with the proposed jackknife estimator. \(\square\)

Remark 2.2

In (12), the variance estimator leading to unbiased estimation for i.i.d. samples is used. Using the sample variance formula gives instead
$$\begin{aligned} \widetilde{\sigma }_N^2(T) = \frac{ (N-1)^2 }{N} \sum _{i=1}^N ( T_{N,-i} - \overline{T_{N,-\bullet }} )^2 \end{aligned}$$
(14)
and
$$\begin{aligned} \widetilde{{\mathrm{Var\,}}}(T_N) = \frac{ {\widetilde{ \sigma }}_N^2(T) }{ N } = \left( \frac{N-1}{N} \right) ^2 \sum _{i=1}^N ( T_{N,-i} - \overline{T_{N,-\bullet }} )^2. \end{aligned}$$
(15)
Observe that we may put any non-random factor \(f_N\) in front of the sum, \(\sum _{i=1}^N ( T_{N,-i} - \overline{T_{N,-\bullet }} )^2\), in (13) which satisfies \(f_N \rightarrow 1\), as \(N \rightarrow \infty\). The choice \((N-1)/N\) is usually justified by the fact that then the jackknife variance estimator matches the formula used for the arithmetic mean; see (Efron and Tibshirani 1993, p. 142). The same comments apply to the formulas provided for unequal sample sizes.
For equal sample sizes, we may also directly refer to the results of Shao and Wu (1989) to obtain the consistency of the delete-d jackknife variance estimator. Let \(S_{N,r}\) be the collection of subsets of \(\{ 1, \dots , N \}\) which have size \(r = N-d\). For \(s = \{ i_1, \dots , i_r \} \in S_{N,r}\), let \(T_N^{(s)} = T_{r}( Z_{i_1}, \dots , Z_{i_r} )\). The delete-d jackknife variance estimator is then defined by the following:
$$\begin{aligned} {\widehat{ \sigma }}_{J(d)}^2 = \frac{ r }{ d {N \atopwithdelims ()d} } \sum _{s \in S_{N,r}} ( T_N^{(s)} - T_N)^2. \end{aligned}$$
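For small N and d, the estimator can be evaluated by brute-force enumeration of all subsets \(s \in S_{N,r}\), as in the following sketch; the enumeration strategy and function name are our own choices (in practice one often samples subsets at random instead):

```python
import numpy as np
from itertools import combinations
from math import comb

def delete_d_jackknife(x1, x2, stat, d):
    """Delete-d jackknife variance estimator for equal sample sizes.

    Enumerates every retained index set of size r = N - d; by Corollary 2.1,
    N times the returned value consistently estimates sigma^2.
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    N = len(x1)
    r = N - d
    t_full = stat(x1, x2)
    total = 0.0
    for s in combinations(range(N), r):        # retained pair indices
        idx = list(s)
        total += (stat(x1[idx], x2[idx]) - t_full) ** 2
    return r / (d * comb(N, d)) * total
```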

Corollary 2.1

Suppose that \(N E( R_N^2 ) = o(1)\) and \(d = d_N\) satisfies
$$\begin{aligned} d/N \ge \varepsilon _0 \quad \hbox { for some}\ \varepsilon _0 > 0 \end{aligned}$$
and \(r = N-d \rightarrow \infty\). Then, the delete-d jackknife variance estimator \({\widehat{ \sigma }}_{J(d)}^2\) is consistent in the sense that \(N {\widehat{ \sigma }}_{J(d)}^2 - \sigma ^2 = o_P(1)\) and asymptotically unbiased in the sense that \(N E( {\widehat{ \sigma }}_{J(d)}^2 ) - \sigma ^2 = o(1)\), as \(N \rightarrow \infty\).

Proof

We have the decomposition \(T_N = L_N + R_N\), where the linear statistic \(L_N\) can be written as \(L_N = \frac{1}{N} \sum _{i=1}^N \xi _i\) with \(\xi _i = \frac{1}{2} [ h_1(X_{1i}) + h_2(X_{2i}) ]\), \(i = 1, \dots , N\). The proof of Theorem 1 in Shao and Wu (1989) uses the representation \(N {\widehat{ \sigma }}_{J(d)}^2 = \frac{N r}{d {N \atopwithdelims ()d}} \sum _s (L(s) - U_s)^2\), where \(L(s) = \frac{1}{r} \sum _{i \in s} \xi _i - \frac{1}{N} \sum _{i=1}^N \xi _i\) and \(U_s = R_N - R_{N,s}\) with \(R_{N,s}\) the remainder term associated with \(T_N^{(s)}\), and then only refers to the properties of L(s) and \(U_s\) without referring to the original definition of \(T_N\). Since the linear term has the same form as in Shao and Wu (1989), the proof carries over directly, and the stated conditions are sufficient to ensure (Shao and Wu 1989, (3.3)), cf. Corollary 1 therein. \(\square\)

2.3 Two-sample Gâteaux and Fréchet-differentiable functionals

Let us first study the case of statistics induced by sufficiently smooth statistical functionals. Let \({\mathcal {F}}\) be the (convex) set of distribution functions on \({\mathbb {R}}\) and let \({\mathcal {D}}= \{ (G_1,G_2) -(H_1,H_2) : (G_1, G_2), (H_1,H_2) \in {\mathcal {F}}^2 \}\) be the linear space associated with \({\mathcal {F}}^2 = {\mathcal {F}}\times {\mathcal {F}}\). \(\delta _x\), \(x \in {\mathbb {R}}\), denotes the Dirac measure at x with distribution function \(\delta _x(z) = {\mathbf {1}}( x \le z )\), \(z \in {\mathbb {R}}\).

Denote by \({\widehat{ F }}_{n_i}^{(i)}(x) = \frac{1}{n_i} \sum _{j=1}^{n_i} {\mathbf {1}}( X_{ij} \le x )\), \(x \in {\mathbb {R}}\), the empirical distribution function of the ith sample, \(i = 1, 2\). A functional \(T: {\mathcal {F}}\times {\mathcal {F}}\rightarrow {\mathbb {R}}\) induces a two-sample statistic, namely \(T_n = T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )\). A special case of interest is when T is additive with respect to the distribution functions, i.e., \(T_n = T_1( {\widehat{ F }}_{n_1}^{(1)} ) + T_2( {\widehat{ F }}_{n_2}^{(2)} )\) for two one-sample functionals \(T_1, T_2 : {\mathcal {F}}\rightarrow {\mathbb {R}}\), which we shall call components in what follows.

We will first consider statistics induced by a Gâteaux-differentiable statistical functional. Here, additional assumptions are required to obtain consistency of the jackknife variance estimator. Fréchet differentiability is a stronger notion and implies the consistency of the jackknife.

Definition 2.1

A two-sample functional \(T : {\mathcal {F}}\times {\mathcal {F}}\rightarrow {\mathbb {R}}\) is called Gâteaux differentiable at \((G_1, G_2) \in {\mathcal {F}}^2\) with Gâteaux derivative \(L_{(G_1,G_2)} : {\mathcal {D}}\rightarrow {\mathbb {R}}\), if
$$\begin{aligned} \lim _{t \rightarrow 0} \frac{ T( (G_1,G_2) + t (D_1, D_2) ) - T(G_1,G_2) - L_{(G_1,G_2)}( t (D_1, D_2) ) }{ t } = 0 \end{aligned}$$
holds for all \((D_1, D_2) \in {\mathcal {D}}\). T is called continuously Gâteaux differentiable at \((G_1, G_2)\), if, for any sequence \(t_k \rightarrow 0\) and for all sequences \(( G_1^{(k)}, G_2^{(k)} )\), \(k \ge 1\), with \(\max _{i=1,2} \Vert G_i^{(k)} - G_i \Vert _\infty \rightarrow 0\), as \(k \rightarrow \infty\):
$$\begin{aligned} \sup _{x,y \in {\mathbb {R}}} \left| \frac{ T( (G_1^{(k)},G_2^{(k)}) + t_k ( \delta _x - G_1^{(k)}, \delta _y - G_2^{(k)} ) ) - T( G_1^{(k)}, G_2^{(k)} ) }{ t_k } - L_{(G_1,G_2)}( \delta _x - G_1^{(k)}, \delta _y - G_2^{(k)} ) \right| \end{aligned}$$
converges to 0, as \(k \rightarrow \infty\).
If T is additive with (continuously) differentiable components \(T_1\) and \(T_2\), then \(L_{(G_1,G_2)}( H_1, H_2 ) = L^{(1)}_{G_1}(H_1) + L^{(2)}_{G_2}(H_2)\), where \(L^{(i)}_{G_i}\) is the Gâteaux derivative of \(T_i\), \(i = 1, 2\). In general, the linearization of the two-sample statistic \(T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )\) induced by a (continuously) Gâteaux differentiable functional \(T(F_1,F_2)\) is given by the following:
$$\begin{aligned} L_{(F_1,F_2)}( {\widehat{ F }}_{n_1}^{(1)} - F_1, {\widehat{ F }}_{n_2}^{(2)} - F_2 ) = \frac{1}{n_1n_2} \sum _{i=1}^{n_1} \sum _{j=1}^{n_2} L_{(F_1,F_2)}( \delta _{X_{1i}} - F_1, \delta _{X_{2j}} - F_2 ), \end{aligned}$$
whereas, for an additive statistical functional, this formula simplifies to
$$\begin{aligned} L_n^{(T)} = \frac{1}{n_1} \sum _{j=1}^{n_1} \psi _1( X_{1j} ) + \frac{1}{n_2} \sum _{j=1}^{n_2} \psi _2( X_{2j} ), \end{aligned}$$
(16)
where \(\psi _i(x) = L_{F_i}^{(i)}( \delta _{x} - F_i )\), \(i = 1,2\), such that \(E \psi _i(X_{i1} ) = 0\), since, e.g., \(E( L_{F_1}^{(1)}( \delta _{X_{1i}} - F_1 ) ) = L_{F_1}^{(1)}( E( {\mathbf {1}}( X_{1i} \le \cdot ))- F_1( \cdot ) ) = 0\), \(i = 1, \dots , n_1\), by linearity.
In general, the requirement that T is Gâteaux differentiable is too weak to entail a central limit theorem. Therefore, from now on, we assume, as in Shao (1993), where the one-sample setting is studied, that \(T_n = T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )\) is asymptotically normal:
$$\begin{aligned} \sqrt{n}( T_n - T(F_1,F_2) ) {\mathop {\rightarrow }\limits ^{d}} N( 0, \sigma ^2(T) ), \end{aligned}$$
with asymptotic variance \(\sigma ^2(T)\), where
$$\begin{aligned} \sigma ^2(T) = \lim _{n \rightarrow \infty } \frac{n}{n_1} {\mathrm{Var\,}}( \psi _1( X_{11} ) ) + \frac{n}{n_2} {\mathrm{Var\,}}( \psi _2( X_{21} ) ) = \lambda _1^{-1} {\mathrm{Var\,}}( \psi _1( X_{11} ) ) + \lambda _2^{-1} {\mathrm{Var\,}}( \psi _2( X_{21} ) ), \end{aligned}$$
(17)
if \(n \rightarrow \infty\) with \(n_i / n \rightarrow \lambda _i\), \(i = 1, 2\). Consequently, (5) and (17) coincide.

The following lemma shows that, under weak conditions, the linearization \(L_n^{(T)}\) is asymptotically as required in the previous sections and clarifies the relationship between the functions \(\psi _1, \psi _2\) in (16) and the functions \(h_1, h_2\) in (2).

Lemma 2.1

Let \(T_n\) be a two-sample statistic induced by an additive functional. If \(E( \psi _i^2( X_{i1} ) ) < \infty\) and \(n_i/n \rightarrow \lambda _i\), \(i = 1, 2\), then
$$\begin{aligned} L_n^{(T)} = \frac{1}{n} \left\{ \sum _{j=1}^{n_1} \lambda _1^{-1} \psi _1( X_{1j} ) + \sum _{j=1}^{n_2} \lambda _2^{-1} \psi _2( X_{2j} ) \right\} + R_n^{(T)}, \end{aligned}$$
where \(E( \sqrt{n} R_n^{(T)} )^2 = o(1)\), as \(n \rightarrow \infty\). Therefore, one obtains the representation (2) when putting \(h_i(x) = \lambda _i^{-1} \psi _i(x)\), \(x \in {\mathbb {R}}\), \(i = 1, 2\), that is, \(\psi _i = \lambda _i h_i\).

Proof

We have \(L_n^{(T)} = \sum _{i=1,2} \frac{1}{n} \frac{n}{n_i} \sum _{j=1}^{n_i} \psi _i( X_{ij} )\), and therefore:
$$\begin{aligned} L_n^{(T)} = \frac{1}{n} \left[ \sum _{j=1}^{n_1} \lambda _1^{-1} \psi _1( X_{1j} ) + \sum _{j=1}^{n_2} \lambda _2^{-1} \psi _2( X_{2j} ) \right] + R_n^{(T)}, \end{aligned}$$
where \(\sqrt{n} R_n^{(T)} = \sum _{i=1,2} \sqrt{\frac{n_i}{n}} \left( \frac{n}{n_i} - \lambda _i^{-1} \right) \frac{1}{\sqrt{n_i}} \sum _{j=1}^{n_i} \psi _i( X_{ij} )\) satisfies \(E( \sqrt{n} R_n^{(T)} )^2 = o(1)\), since \(E( \frac{1}{\sqrt{n_i}} \sum _{j=1}^{n_i} \psi _i( X_{ij} ))^2 = O(1)\), as \(n_i \rightarrow \infty\), \(i = 1, 2\). \(\square\)

Theorem 2.3

Assume that T is a continuously Gâteaux differentiable two-sample functional with
$$\begin{aligned} E( L_{(F_1,F_2)}( \delta _{X_1} - F_1, 0 ) ) = E (L_{(F_1,F_2)}( 0, \delta _{X_2} - F_2 ) ) = 0 \end{aligned}$$
and
$$\begin{aligned} E( L_{(F_1,F_2)}( \delta _{X_1} - F_1, 0 )^2 ), E (L_{(F_1,F_2)}( 0, \delta _{X_2} - F_2 )^2 ) < \infty . \end{aligned}$$
Then
$$\begin{aligned} {\widehat{ \tau }}_1^2 {\mathop {\rightarrow }\limits ^{a.s.}} \lambda _1^{-2} {\mathrm{Var\,}}( L_{(F_1,F_2)}( \delta _{X_{11}} - F_1, 0 ) ) \quad \text {and} \quad {\widehat{ \tau }}_2^2 {\mathop {\rightarrow }\limits ^{a.s.}} \lambda _2^{-2} {\mathrm{Var\,}}( L_{(F_1,F_2)}( 0, \delta _{X_{21}} - F_2 ) ), \end{aligned}$$
as \(n_i \rightarrow \infty\), \(i = 1, 2\), and therefore:
$$\begin{aligned} {\widehat{ \sigma }}^2_n(T) = \frac{n_1}{n} {\widehat{ \tau }}_1^2 + \frac{n_2}{n} {\widehat{ \tau }}_2^2 {\mathop {\rightarrow }\limits ^{a.s.}} \sigma ^2(T), \end{aligned}$$
as \(n \rightarrow \infty\), provided \(n_i/n \rightarrow \lambda _i\), \(i = 1, 2\).

Proof

The general method of the proof is as in Shao (1993). We show the assertion for \({\widehat{ \tau }}_1^2\). Observe that for \(i = 1, \dots , n_1\):
$$\begin{aligned} {\widehat{ \xi }}_i = n T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - (n-1) T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ). \end{aligned}$$
Hence, since the sample variance is location invariant:
$$\begin{aligned} {\widehat{ \tau }}_1^2&= \frac{1}{n_1-1} \sum _{i=1}^{n_1} ( {\widehat{ \xi }}_{i} - \overline{{\widehat{ \xi }}}_{1:n_1} )^2 \\&= \frac{(n-1)^2}{n_1-1} \sum _{i=1}^{n_1} \left( T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - \frac{1}{n_1} \sum _{k=1}^{n_1} T( {\widehat{ F }}_{n_1,-k}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) \right) ^2 . \end{aligned}$$
Since \((n_1-1) {\widehat{ F }}_{n_1,-i}^{(1)} = n_1 {\widehat{ F }}_{n_1}^{(1)} - \delta _{X_{1i}}\), such that
$$\begin{aligned} {\widehat{ F }}_{n_1,-i}^{(1)} = \frac{n_1}{n_1-1} {\widehat{ F }}_{n_1}^{(1)} - \frac{1}{n_1-1} \delta _{X_{1i}} = {\widehat{ F }}_{n_1}^{(1)} - \frac{1}{n_1-1} ( \delta _{X_{1i}} - {\widehat{ F }}_{n_1}^{(1)} ), \end{aligned}$$
we obtain
$$\begin{aligned} ( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) = ( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - t_n \left( \delta _{ X_{1i} } - {\widehat{ F }}_{n_1}^{(1)}, 0 \right) \end{aligned}$$
with \(t_n = 1/(n_1-1)\). Because \(\max _{i=1,2} \Vert {\widehat{ F }}_{n_i}^{(i)} - F_i \Vert _\infty \rightarrow 0\), \(n \rightarrow \infty\), a.s., and since T is continuously Gâteaux differentiable, we may conclude that
$$\begin{aligned} \max _{1 \le i \le n_1} \left| \frac{ T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - T( {\widehat{ F }}_{n_1}, {\widehat{ F }}_{n_2} ) }{ t_n } - L_{(F_1,F_2)}( \delta _{X_{1i}} - {\widehat{ F }}_{n_1}^{(1)}, 0 ) \right| \rightarrow 0, \end{aligned}$$
(18)
as \(n \rightarrow \infty\), w.p. 1. By linearity:
$$\begin{aligned} L_{(F_1,F_2)}( \delta _{X_{1i}} - {\widehat{ F }}_{n_1}^{(1)}, 0 )&= L_{(F_1,F_2)}( \delta _{X_{1i}}, 0 ) - \frac{1}{n_1} \sum _{j=1}^{n_1} L_{(F_1,F_2)}( \delta _{X_{1j}}, 0 ) = Z_i - \overline{Z}_{n_1}, \end{aligned}$$
if we put \(Z_i = L_{(F_1,F_2)}( \delta _{X_{1i}}, 0 )\), \(i = 1, \dots , n_1\), and \(\overline{Z}_{n_1} = \frac{1}{n_1} \sum _{j=1}^{n_1} Z_j\). Since the \(Z_i\), \(1 \le i \le n_1\), are i.i.d. with finite variance, the strong law of large numbers implies \(\frac{1}{n_1} \sum _{i=1}^{n_1} Z_i^r \rightarrow E Z_1^r\), a.s., \(r = 1, 2\), such that
$$\begin{aligned} \frac{1}{n_1} \sum _{i=1}^{n_1} (Z_i - \overline{Z}_{n_1})^2 - {\mathrm{Var\,}}(Z_1) \rightarrow 0, \end{aligned}$$
(19)
as \(n \rightarrow \infty\), a.s. Relation (18) means that
$$\begin{aligned} \max _{1 \le i \le n_1} \left| (n_1-1) [ T( {\widehat{ F }}_{n_1}, {\widehat{ F }}_{n_2} ) - T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) ] - (Z_i - \overline{Z}_{n_1}) \right| \rightarrow 0, \end{aligned}$$
(20)
as \(n \rightarrow \infty\), a.s.. Combining this with \(| \overline{Z}_n - E(Z_1) | {\mathop {\rightarrow }\limits ^{a.s.}} 0\), as \(n \rightarrow \infty\), we arrive at
$$\begin{aligned} \max _{1 \le i \le n_1} | (n_1-1) [ T( {\widehat{ F }}_{n_1}, {\widehat{ F }}_{n_2} ) - T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) ] - (Z_i - \mu ) | {\mathop {\rightarrow }\limits ^{a.s.}} 0, \end{aligned}$$
as \(n \rightarrow \infty\). It follows that the sample variance of the random variables \(V_{n_1,-i} = (n_1-1) T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )\), \(1 \le i \le n_1\), converges a.s. to \({\mathrm{Var\,}}( Z_1 )\), as well. However, this implies that
$$\begin{aligned} {\widehat{ \tau }}_1^2 = \left( \frac{n-1}{n_1-1} \right) ^2 \frac{1}{n_1-1} \sum _{i=1}^{n_1} \left( V_{n_1,-i} - \frac{1}{n_1} \sum _{k=1}^{n_1} V_{n_1,-k} \right) ^2 {\mathop {\rightarrow }\limits ^{a.s.}} \lambda _1^{-2} {\mathrm{Var\,}}( Z_1 ), \end{aligned}$$
as \(n \rightarrow \infty\). Recalling the representation (17), the a.s. convergence of \({\widehat{ \sigma }}^2_n(T) = \frac{n_1}{n} {\widehat{ \tau }}_1^2 + \frac{n_2}{n} {\widehat{ \tau }}_2^2\) to \(\sigma ^2(T)\), as \(n \rightarrow \infty\), follows. \(\square\)

Corollary 2.2

Under the conditions of Theorem 2.3, the jackknife variance estimator defined in (7) is consistent for \(\sigma ^2(T) = \lambda _1 {\mathrm{Var\,}}( h_1( X_{11} )) + \lambda _2 {\mathrm{Var\,}}( h_2( X_{21} ))\), since
$$\begin{aligned} {\widehat{ \tau }}_i^2 {\mathop {\rightarrow }\limits ^{a.s.}} E( h_i^2(X_{i1} ) ), \end{aligned}$$
as \(n \rightarrow \infty\), for \(i = 1, 2\).

Proof

For an additive functional, \(L_{(F_1,F_2)}(H_1, H_2) = L_{F_1}^{(1)}( H_1 ) + L_{F_2}^{(2)}( H_2 )\), such that \(Z_1 = L_{F_1}( \delta _{X_{11}} - F_1 ) = \psi _1( X_{11} )\), and hence, \({\mathrm{Var\,}}( Z_1 ) = {\mathrm{Var\,}}( \psi _1( X_{11} ) )\). As shown in Lemma 2.1, \(\psi _1 = \lambda _1 h_1\), where \(h_1\) is as in (2), leading to \({\widehat{ \tau }}_1^2 {\mathop {\rightarrow }\limits ^{a.s.}} \lambda _1^{-2} {\mathrm{Var\,}}(Z_1) = {\mathrm{Var\,}}( h_1(X_{11} ) )\). \(\square\)

To summarize, under Gâteaux differentiability, the jackknife variance estimator works, provided that asymptotic normality holds. Let us now study the case of a two-sample statistic induced by a Fréchet-differentiable statistical functional.

Let \(\rho _1\) be a metric defined on \({\mathcal {D}}\) and introduce, on \({\mathcal {D}}\times {\mathcal {D}}\), the metric:
$$\begin{aligned} \rho ( (G_1, G_2), (H_1, H_2) ) = \rho _1( G_1, H_1 ) + \rho _1( G_2, H_2 ), \end{aligned}$$
for \((G_1, G_2), (H_1, H_2) \in {\mathcal {D}}\times {\mathcal {D}}\).

Definition 2.2

A functional \(T: {\mathcal {F}}\times {\mathcal {F}} \rightarrow {\mathbb {R}}\) is called continuously Fréchet-differentiable at \((G_1,G_2) \in {\mathcal {F}}\times {\mathcal {F}}\) with respect to \(\rho _1\), if T is Fréchet-differentiable at \((G_1, G_2)\), i.e., there is a linear functional \(L_{(G_1,G_2)}\) on \({\mathcal {D}}\), such that, for each bounded set \({\mathcal {C}} \subset {\mathcal {D}}\):
$$\begin{aligned} \lim _{t \rightarrow 0} \sup _{D \in {\mathcal {C}}} \left| \frac{ T( (G_1, G_2) + t D ) - T(G_1,G_2) - L_{(G_1,G_2)}( t D ) }{t} \right| = 0, \end{aligned}$$
and \(\rho (G_k,G) \rightarrow 0\) and \(\rho (H_k,H) \rightarrow 0\), as \(k \rightarrow \infty\), entail
$$\begin{aligned} \lim _{k \rightarrow \infty } \frac{ T( H_k ) - T(G_k) - L_G( H_k - G_k )}{ \rho (H_k, G_k) } = 0. \end{aligned}$$

Proper examples for the choice of \(\rho _1\), e.g., to ensure that sample quantiles are continuously Fréchet-differentiable with respect to \(\rho _1\), are discussed in Shao (1993).

Theorem 2.4

Suppose that T is a continuously Fréchet-differentiable statistical functional with respect to a metric \(\rho _1\). Assume that
$$\begin{aligned} \rho _1( {\widehat{ F }}_{n_i}^{(i)}, F_i ) {\mathop {\rightarrow }\limits ^{a.s.}} 0, \end{aligned}$$
(21)
as \(n_i \rightarrow \infty\), \(i = 1, 2\), and
$$\begin{aligned} \sum _{j=1}^{n_i} \rho _1^2( {\widehat{ F }}_{n_i,-j}^{(i)}, {\widehat{ F }}_{n_i}^{(i)} ) = O( n_i^{-1} ), \end{aligned}$$
(22)
a.s., for \(i = 1, 2\). Then, the assertions of Theorem 2.3 and Corollary 2.2 hold true.

Proof

Put
$$\begin{aligned} R_{ni} = T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - L_{(F_1,F_2)}( {\widehat{ F }}_{n_1,-i}^{(1)} - {\widehat{ F }}_{n_1}^{(1)}, 0 ) \end{aligned}$$
for \(i = 1, \dots , n_1\), and \(\overline{R} = \frac{1}{n_1} \sum _{j=1}^{n_1} R_{nj}\). Here
$$\begin{aligned} L_{(F_1,F_2)}( {\widehat{ F }}_{n_1,-i}^{(1)} - {\widehat{ F }}_{n_1}^{(1)}, 0 ) = \frac{1}{n_1-1} \sum _{j=1, j \not =i}^{n_1} Z_j - \overline{Z}, \end{aligned}$$
with \(Z_i = L_{(F_1,F_2)}( \delta _{X_{1i}} - F_1, 0 )\), \(i = 1, \dots , n_1\), and \(\overline{Z} = \frac{1}{n_1} \sum _{i=1}^{n_1} Z_i\). Recall that
$$\begin{aligned} {\widehat{ \tau }}_1^2 = \frac{(n-1)^2}{n_1-1} \sum _{i=1}^{n_1} \left( T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - \overline{T( {\widehat{ F }}_{n_1,-\cdot }^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )} \right) ^2, \end{aligned}$$
where \(\overline{T( {\widehat{ F }}_{n_1,-\cdot }^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )} = \frac{1}{n_1} \sum _{i=1}^{n_1} T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ).\) Using
$$\begin{aligned} T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - \overline{T( {\widehat{ F }}_{n_1,-\cdot }^{(1)}, {\widehat{ F }}_{n_2}^{(2)} )} = R_{ni} - \overline{R} + \frac{1}{n_1-1} \sum _{j=1, j \not =i}^{n_1} Z_j - \overline{Z}, \end{aligned}$$
we obtain
$$\begin{aligned} {\widehat{ \tau }}_1^2&= \frac{(n-1)^2}{(n_1-1)^3} \sum _{i=1}^{n_1} (Z_i - \overline{Z})^2 + \frac{(n-1)^2}{n_1-1} \sum _{i=1}^{n_1} ( R_{ni} - \overline{R} )^2 \\&\qquad + 2 \frac{(n-1)^2}{n_1-1} \sum _{i=1}^{n_1} R_{ni} \left( \frac{1}{n_1-1} \sum _{j=1,j \not =i }^{n_1} Z_j - \overline{Z} \right) . \end{aligned}$$
Since \(\frac{(n-1)^2}{(n_1-1)^3} = O( \frac{1}{n-1} )\) and \(\frac{(n-1)^2}{n_1-1} = O( n-1 )\), the same arguments as in Shao (1993) entail that it suffices to show that
$$\begin{aligned} (n_1-1) \sum _{i=1}^{n_1} R_{ni}^2 {\mathop {\rightarrow }\limits ^{a.s.}} 0, \end{aligned}$$
(23)
as \(n_1 \rightarrow \infty\). First, observe that \(\rho ( ({\widehat{ F }}_{n_1,-i}^{(1)}, 0 ), (F_1,0 ) ) = \rho _1( {\widehat{ F }}_{n_1,-i}^{(1)}, F_1 )\) and
$$\begin{aligned} \max _{1 \le i \le n_1} \rho _1( {\widehat{ F }}_{n_1,-i}^{(1)}, F_1 ) \le \rho _1( {\widehat{ F }}_{n_1}^{(1)}, F_1 ) + \sqrt{ \max _{1 \le i \le n_1} \rho _1^2( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_1}^{(1)} ) } {\mathop {\rightarrow }\limits ^{a.s}} 0, \end{aligned}$$
as \(n_1 \rightarrow \infty\), by (21) and (22). Since T is continuously Fréchet-differentiable:
$$\begin{aligned} \frac{ T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - L_{(F_1,F_2)}( {\widehat{ F }}_{n_1,-i}^{(1)} - {\widehat{ F }}_{n_1}^{(1)}, 0 ) }{ \rho _1( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_1}^{(1)} ) } = o(1), \end{aligned}$$
as \(n_1 \rightarrow \infty\), a.s.. Therefore, for any \(\varepsilon > 0\), there exists \(n_0 \in {\mathbb {N}}\), such that, for \(n \ge n_0\):
$$\begin{aligned} | T( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - T( {\widehat{ F }}_{n_1}^{(1)}, {\widehat{ F }}_{n_2}^{(2)} ) - L_{(F_1,F_2)}( {\widehat{ F }}_{n_1,-i}^{(1)} - {\widehat{ F }}_{n_1}^{(1)}, 0 ) | \le \varepsilon \rho _1( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_1}^{(1)} ). \end{aligned}$$
It follows that \(R_{ni}^2 \le \varepsilon ^2 \rho _1^2( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_1}^{(1)} )\), such that
$$\begin{aligned} (n_1-1) \sum _{i=1}^{n_1} R_{ni}^2 \le \varepsilon ^2 (n_1-1) \sum _{i=1}^{n_1} \rho _1^2( {\widehat{ F }}_{n_1,-i}^{(1)}, {\widehat{ F }}_{n_1}^{(1)} ) = O( \varepsilon ), \end{aligned}$$
as \(n_1 \rightarrow \infty\), a.s., by virtue of (22), which completes the proof. \(\square\)

3 Review of estimation under ordered variances

Let \(X_{ij}, i=1,2, j= 1, \dots , n_i(\ge 2)\) be independent observations from the normal distribution with mean \(\mu _i\) and variance \(\sigma _i^2\), where both \(\mu _i\) and \(\sigma _i^2\) are unknown. Also let
$$\begin{aligned} \overline{X}_i = \frac{1}{n_i} \sum _{j=1}^{n_i} X_{ij}, \qquad S_i^2 = \frac{1}{n_i} \sum _{j=1}^{n_i} (X_{ij}- \overline{X}_i)^2, \qquad {\widetilde{ S }}_i^2 = \frac{n_i}{n_i-1} S_i^2 \end{aligned}$$
be the sample means, sample variances, and the associated unbiased variance estimators, which are frequently used for estimation of \(\mu _i\) and \(\sigma _i^2\), \(i = 1, 2\). When needed, we shall indicate the dependence on the sample sizes and write \(S_{n_i,i}^2\) for \(S_i^2\) and \(i = 1, 2\).
When \(\sigma _i^2 , i=1,2\) are known, to estimate the common mean, \(\mu = \mu _1 = \mu _2\), one may use the unbiased minimum variance estimator:
$$\begin{aligned} {\widehat{ \mu }}^{(0)} = \frac{ n_1 \sigma _2^2 }{ n_1 \sigma _2^2 + n_2 \sigma _1^2 } \overline{X}_1 + \frac{ n_2 \sigma _1^2 }{ n_1 \sigma _2^2 + n_2 \sigma _1^2 } \overline{X}_2. \end{aligned}$$
When the variances are unknown, Graybill and Deal proposed, for the case of no order restriction, the estimator:
$$\begin{aligned} {\widehat{ \mu }}^{(1)} = \frac{ n_1 {\widetilde{ S }}_2^2 }{ n_1 {\widetilde{ S }}_2^2 + n_2 {\widetilde{ S }}_1^2 } \overline{X}_1 + \frac{ n_2 {\widetilde{ S }}_1^2 }{ n_1 {\widetilde{ S }}_2^2 + n_2 {\widetilde{ S }}_1^2 } \overline{X}_2. \end{aligned}$$
Here, the sample averages are weighted with random weights forming a convex combination as follows:
$$\begin{aligned} {\widehat{ \mu }} (\gamma ) = \gamma \overline{X}_1+ ( 1-\gamma ) \overline{X}_2, \end{aligned}$$
where \(\gamma\) is a function of \(S_1^2, S_2^2\) and possibly \((\overline{X}_1-\overline{X}_2)^2\).
Kubokawa (1989) has introduced a broad class of common mean estimators with \(\gamma\), given by
$$\begin{aligned} \gamma _{\psi } = 1 - \frac{ a }{ b R \psi ( {\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, (\overline{X}_1 - \overline{X}_2)^2 ) }, \end{aligned}$$
where \(R = ( b {\widetilde{ S }}_2^2 + c ( \overline{X}_1 - \overline{X}_2 )^2 ) / {\widetilde{ S }}_1^2\), \(\psi\) is a positive function, and \(a, b, c \ge 0\) are constants. For suitably chosen \(\psi , a, b\), and c, Kubokawa has given a sufficient condition on \(n_1\) and \(n_2\), so that \({\widehat{ \mu }}(\gamma _{\psi })\) is closer to \(\mu\) than \(\overline{X}_1\); see also Pitman (1937). Such estimators have also been studied by several authors assuming Gaussian samples (see Brown and Cohen (1974) and Bhattacharya (1980), amongst others).

When there is an order restriction between the two variances, Mehta and Gurland (1969) proposed three convex combination estimators for small samples and compared the efficiencies of the proposed estimators with that of the GD estimator. When order constraints on the variances apply, the question arises whether one can improve upon the above proposals. There is a rich literature on the general problem of estimation for constrained parameter spaces, and we refer the reader to the monograph of van Eeden (2006).

For the common mean estimation problem studied here, the following results have been obtained, again under the assumption of normality: When there is a constraint of ordered variances, \(\sigma _1^2 \le \sigma _2^2\), Nair (1982) and Elfessi and Pal (1992) proposed the estimator
$$\begin{aligned} {\widehat{ \mu }}^{(2)} = {\widehat{ \mu }}^{(1)} {\mathbf {1}}( {\widetilde{ S }}_1^2 \le {\widetilde{ S }}_2^2 ) + \left[ \frac{n_1}{n_1+n_2} \overline{X}_1 + \frac{n_2}{n_1+n_2} \overline{X}_2 \right] {\mathbf {1}}( {\widetilde{ S }}_1^2 > {\widetilde{ S }}_2^2 ), \end{aligned}$$
called Nair's estimator in the sequel, and showed that \({\widehat{ \mu }}^{(2)}\) also stochastically dominates the GD estimator. For the special case of balanced sample sizes, \(n_1 = n_2\), Elfessi and Pal (1992) also proposed the estimator
$$\begin{aligned} {\widehat{ \mu }}^{(3)} = {\widehat{ \mu }}^{(1)} {\mathbf {1}}( {\widetilde{ S }}_1^2 \le {\widetilde{ S }}_2^2 ) + \left[ \frac{{\widetilde{ S }}_1^2}{{\widetilde{ S }}_1^2 + {\widetilde{ S }}_2^2} \overline{X}_1 + \frac{{\widetilde{ S }}_2^2}{ {\widetilde{ S }}_1^2 + {\widetilde{ S }}_2^2} \overline{X}_2 \right] {\mathbf {1}}( {\widetilde{ S }}_1^2 > {\widetilde{ S }}_2^2 ), \end{aligned}$$
which stochastically dominates the GD estimator, as well.
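To connect this review with Sect. 2, the following sketch (function names are ours) implements the GD estimator \({\widehat{ \mu }}^{(1)}\) and Nair's estimator \({\widehat{ \mu }}^{(2)}\); either function can be passed as the statistic to the jackknife routines sketched there.

```python
import numpy as np

def gd_estimator(x1, x2):
    """Graybill-Deal common mean estimator (mu hat 1)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)   # unbiased variance estimates
    w1 = n1 * s2 / (n1 * s2 + n2 * s1)                # random weight on the first mean
    return w1 * x1.mean() + (1.0 - w1) * x2.mean()

def nair_estimator(x1, x2):
    """Nair's estimator (mu hat 2) under the constraint sigma_1^2 <= sigma_2^2."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    if np.var(x1, ddof=1) <= np.var(x2, ddof=1):      # estimates respect the ordering
        return gd_estimator(x1, x2)
    return (n1 * x1.mean() + n2 * x2.mean()) / (n1 + n2)   # pooled mean otherwise
```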
Chang et al. (2012) have shown, see their Theorem 2.1, that for ordered variances, \(\sigma _1^2 \le \sigma _2^2\), any such estimator can be further improved by replacing \(\gamma _n\) by some appropriately chosen \({\widetilde{ \gamma }}\) if \(\gamma _n < n_1/n\). This means that one uses the following:
$$\begin{aligned} \gamma ^+_n = \left\{ \begin{array}{ll} \gamma _n, &\quad{} \gamma _n \ge n_1/n, \\ {\widetilde{ \gamma }}, &\quad {} \gamma _n < n_1/n, \end{array} \right. \end{aligned}$$
instead of \(\gamma _n\). Chang et al. (2012) proved that the estimator \({\widehat{ \mu }}_n(\gamma ^+_n)\) stochastically dominates \({\widehat{ \mu }}_n(\gamma _n)\), if \(\gamma _n < n_1/n\) with positive probability and \({\widetilde{ \gamma }}\) satisfies the constraints:
$$\begin{aligned} n_1/n \le {\widetilde{ \gamma }} \le 2 n_1/n - \gamma _n. \end{aligned}$$
(24)
Similar results are obtained by Chang and Shinozaki (2015) under the Pitman closeness criterion.
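As a small illustration of this improvement step, the following sketch returns \(\gamma ^+_n\); the default choice \({\widetilde{ \gamma }} = n_1/n\) is just one admissible value satisfying (24), picked here for concreteness:

```python
def improved_weight(gamma_n, n1, n, gamma_tilde=None):
    """Weight gamma_n^+: keep gamma_n when gamma_n >= n1/n, otherwise
    replace it by a gamma_tilde with n1/n <= gamma_tilde <= 2*n1/n - gamma_n."""
    lower = n1 / n
    if gamma_n >= lower:
        return gamma_n
    if gamma_tilde is None:
        gamma_tilde = lower                    # admissible default choice
    if not (lower <= gamma_tilde <= 2 * lower - gamma_n):
        raise ValueError("gamma_tilde violates constraint (24)")
    return gamma_tilde
```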
All results discussed above have been obtained assuming Gaussian samples. Motivated by the above findings, we consider, within a purely nonparametric setup, common mean estimators of the form:
$$\begin{aligned} {\widehat{ \mu }}_n(\gamma ) = \gamma _n \overline{X}_1 + (1-\gamma _n) \overline{X}_2, \end{aligned}$$
(25)
where the weight
$$\begin{aligned} \gamma _n = \gamma ( n_1/n, n_2/n, {\widetilde{ S }}_1, {\widetilde{ S }}_2, \overline{X}_1, \overline{X}_2 ) \end{aligned}$$
is a function of the sample fractions \(n_1/n, n_2/n\) and the statistic \(({\widetilde{ S }}_1, {\widetilde{ S }}_2, \overline{X}_1, \overline{X}_2 )\), which is the sufficient statistic under normality. To establish the required theoretical results, we need the following weak assumption on the weights \(\gamma _n\), which is satisfied by the above choices.
Assumption (\(\Gamma\)): \(X_{ij} \sim F_i\), \(1 \le j \le n_i\), \(i = 1,2\), are independent random samples with common means \(\mu = \mu _1 = \mu _2\) and arbitrary variances \(\sigma _1^2\) and \(\sigma _2^2\). The random weights \(\gamma _n = \gamma _n( n_1/n, n_2/n, {\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2 )\) are either of the form:
$$\begin{aligned} \gamma _n = \left\{ \begin{array}{ll} \gamma ^\le , \qquad &{} {\widetilde{ S }}_1^2 \le {\widetilde{ S }}_2^2, \\ \gamma ^>, \qquad &{} {\widetilde{ S }}_1^2 > {\widetilde{ S }}_2^2, \end{array} \right. \end{aligned}$$
for two functions \(\gamma ^\le = \gamma ^\le ( n_1/n, n_2/n, {\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2)\) and \(\gamma ^> = \gamma ^>( n_1/n, n_2/n, {\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2)\), where \(\gamma ^\le ( \cdot )\) and \(\gamma ^>(\cdot )\) are three times continuously differentiable functions with bounded third partial derivatives, or of the form:
$$\begin{aligned} \gamma _n = \left\{ \begin{array}{ll} \gamma ^\le , \qquad &{} S_1^2 \le S_2^2, \\ \gamma ^>, \qquad &{} S_1^2 > S_2^2, \end{array} \right. \end{aligned}$$
for two functions \(\gamma ^\le = \gamma ^\le ( n_1/n, n_2/n, S_1^2, S_2^2, \overline{X}_1, \overline{X}_2)\) and \(\gamma ^> = \gamma ^>( n_1/n, n_2/n, S_1^2, S_2^2, \overline{X}_1, \overline{X}_2)\), where \(\gamma ^\le ( \cdot )\) and \(\gamma ^>(\cdot )\) are three times continuously differentiable functions with bounded third partial derivatives.

Example 3.1

Observe that
$$\begin{aligned} {\widehat{ \mu }}^{(1)} = \frac{ {\widetilde{ S }}_2^2}{ {\widetilde{ S }}_2^2 + \frac{n_2}{n} \frac{n}{n_1} {\widetilde{ S }}_1^2 } \overline{X}_1 + \frac{ {\widetilde{ S }}_1^2 }{ \frac{n_1}{n} \frac{n}{n_2} {\widetilde{ S }}_2^2 + {\widetilde{ S }}_1^2 } \overline{X}_2 = \gamma _4^{\le }( n_1/n, n_2/n,{\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2 ) \end{aligned}$$
and
$$\begin{aligned} \frac{n_1}{n_1+n_2} \overline{X}_1 + \frac{n_2}{n_1+n_2} \overline{X}_2 = \frac{1}{ 1 + \frac{n_2}{n} \frac{n}{n_1} } \overline{X}_1 + \frac{1}{ \frac{n_1}{n} \frac{n}{n_2} + 1} \overline{X}_2 = \gamma _4^>( n_1/n, n_2/n,{\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2 ), \end{aligned}$$
where the functions
$$\begin{aligned} \gamma _4^{\le }( a, b, s, t, \mu , \nu )&= \mu \frac{t}{ t + s (b/ a)} + \nu \frac{s}{t (a/b) + s}, \\ \gamma _4^{>}( a, b, s, t, \mu , \nu )&= \mu \frac{1}{1+b/a} + \nu \frac{1}{a/b + 1}, \end{aligned}$$
are defined for \((a,b,s,t,\mu , \nu ) \in (0,1)^2 \times (0,\infty )^2 \times {\mathbb {R}}^2\). It is easily seen that all the partial derivatives of order three exist and are bounded on compact subsets of the domain. For example:
$$\begin{aligned} \frac{\partial \gamma ^{\le }}{\partial t}&= -a \nu s / (b (a t / b + s) ^ 2) + \mu (-t / (b s / a + t) ^ 2 + 1 / (b s / a + t)), \\ \frac{\partial ^2 \gamma ^{\le }}{\partial t^2}&= 2 a ^ 2 \nu s / (b ^ 2 (a t / b + s) ^ 3) + \mu (2 t / (b s / a + t) ^ 3 - 2 / (b s / a + t) ^ 2), \\ \frac{\partial ^3 \gamma ^{\le }}{\partial t^2 \partial s}&=2 a ^ 2 \nu (-3 s / (a t / b + s) ^ 4 + 1 / (a t / b + s) ^ 3) / b ^ 2 + \mu (4 b / (a (b s / a + t) ^ 3) - 6 b t / (a (b s / a + t) ^ 4)), \end{aligned}$$
where all the expressions appearing in a denominator are positive. Therefore, \({\widehat{ \gamma }}^{(4)}\) attains the representation:
$$\begin{aligned} {\widehat{ \gamma }}^{(4)} = \gamma _4^{\le }( n_1/n, n_2/n,{\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2 ) {\mathbf {1}}( {\widetilde{ S }}_1^2 \le {\widetilde{ S }}_2^2 ) + \gamma _4^>( n_1/n, n_2/n,{\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2 ) {\mathbf {1}}( {\widetilde{ S }}_1^2 > {\widetilde{ S }}_2^2 ) \end{aligned}$$
and satisfies Assumption \((\Gamma )\).
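As a computational companion to Example 3.1, the following Python sketch (ours; function and variable names are not from the paper) evaluates \(\gamma _4^{\le }\) and \(\gamma _4^{>}\) and combines them according to whether the biased variance estimates satisfy \({\widetilde{ S }}_1^2 \le {\widetilde{ S }}_2^2\) or not.

```python
import numpy as np

def gamma4_le(a, b, s, t, xbar1, xbar2):
    # combination used when the biased variance estimates are ordered (GD-type weights)
    return xbar1 * t / (t + s * (b / a)) + xbar2 * s / (t * (a / b) + s)

def gamma4_gt(a, b, s, t, xbar1, xbar2):
    # sample-size weighted mean used otherwise
    return xbar1 / (1.0 + b / a) + xbar2 / (a / b + 1.0)

def common_mean_example(x1, x2):
    """Estimator of Example 3.1 evaluated on two samples (a sketch)."""
    n1, n2 = len(x1), len(x2)
    a, b = n1 / (n1 + n2), n2 / (n1 + n2)
    s, t = np.var(x1), np.var(x2)            # biased variance estimators (ddof = 0)
    xbar1, xbar2 = np.mean(x1), np.mean(x2)
    if s <= t:
        return gamma4_le(a, b, s, t, xbar1, xbar2)
    return gamma4_gt(a, b, s, t, xbar1, xbar2)
```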

Remark 3.1

As discussed above in greater detail, we have in mind the case of ordered variances and our results aim at contributing to the problem of common mean estimation under ordered variances. However, since it turns out that many results also hold for unequal variances without order restriction, we omit the order restriction in Assumption \((\Gamma )\).

Remark 3.2

Additional assumptions on the underlying distributions will be stated where needed.

Remark 3.3

When discussing the case of equal sample sizes \(N = n_1 = n_2\), we shall index all the quantities by N instead of n. Furthermore, we may and will redefine \(\gamma _N\) as well as \(\gamma ^\le\) and \(\gamma ^>\) to be functions of
$$\begin{aligned} {\widehat{ \theta }}_N = ( N/n, {\widetilde{ S }}_1^2, {\widetilde{ S }}_2^2, \overline{X}_1, \overline{X}_2 )', \end{aligned}$$
which converges a.s. under Assumption (\(\Gamma\)) to
$$\begin{aligned} \theta = (1/2, \sigma _1^2, \sigma _2^2, \mu _1, \mu _2 )', \end{aligned}$$
as \(N \rightarrow \infty\).

4 Variance estimation and asymptotic distribution theory for common mean estimators

As already indicated in the introduction, there is a lack of results on estimating the variance of the common mean estimators discussed in the literature. For the case of normal populations, Nair (1980) calculated the variance of the GD estimator for two populations and Voinov (1984) extended the result to the case of several samples. Mehta and Gurland (1969) gave formulas for the variances of their common mean estimators. The issue of unbiased estimation of the variance of the GD estimator has been studied by Voinov (1984) and Sinha (1985). All of those results, however, rely heavily on the assumption of normality.

Therefore, we propose to use the nonparametric jackknife variance estimator studied in the previous section, which is applicable to a wide class of common mean estimators defined as convex combinations of the sample means with random weights. We shall show that the jackknife is weakly consistent and asymptotically unbiased under fairly weak conditions, without requiring normally distributed observations.

To make the dependence on the data explicit, we denote the common mean estimator by the following:
$$\begin{aligned} {\widehat{ \mu }}_n(\gamma ) = {\widehat{ \mu }}_{n_1,n_2}( X_{11}, \dots , X_{1n_1}; X_{21}, \dots , X_{2n_2} ). \end{aligned}$$
The leave-one-out estimates corresponding to the first sample are given by the following:
$$\begin{aligned} {\widehat{ \mu }}_{n,-i}^{(1)} = {\widehat{ \mu }}_{n_1-1,n_2}( X_{11}, \dots , X_{1,i-1}, X_{1,i+1}, \dots , X_{1n_1}; X_{21}, \dots , X_{2n_2} ), \end{aligned}$$
\(i = 1, \dots , n_1\), and those for the second sample are
$$\begin{aligned} {\widehat{ \mu }}_{n,-i}^{(2)} = {\widehat{ \mu }}_{n_1,n_2-1}( X_{11}, \dots , X_{1n_1}; X_{21}, \dots , X_{2,i-1}, X_{2,i+1}, \dots , X_{2n_2} ), \end{aligned}$$
\(i = 1, \dots , n_2\). Now, the jackknife variance estimator of \({\mathrm{Var\,}}( {\widehat{ \mu }}_n(\gamma ) )\) is given by the following:
$$\begin{aligned} {\widehat{ {\mathrm{Var\,}} }}( {\widehat{ \mu }}_n(\gamma ) ) = \frac{n_1}{n} {\widehat{ \tau }}_n^{(1)} + \frac{n_2}{n} {\widehat{ \tau }}_n^{(2)}, \end{aligned}$$
with
$$\begin{aligned} {\widehat{ \tau }}_n^{(i)}&= \frac{1}{n_i} \sum _{j=1}^{n_i} \left( {\widehat{ \mu }}_{n,-j}^{(i)} - \overline{{\widehat{ \mu }}}_{n,\bullet }^{(i)} \right) ^2, \end{aligned}$$
where \(\overline{{\widehat{ \mu }}}_{n,\bullet }^{(i)} = n_i^{-1} \sum _{j=1}^{n_i} {\widehat{ \mu }}_{n,-j}^{(i)}\), \(i = 1, 2\).
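A direct implementation of this leave-one-out scheme might look as follows in Python (a sketch; the helper name and the generic statistic argument are ours, not the authors' code).

```python
import numpy as np

def jackknife_variance_two_sample(x1, x2, stat):
    """Jackknife estimate of Var(stat(x1, x2)) via per-sample leave-one-out deletions.

    Follows the displayed formulas: tau_i is the empirical variance of the
    leave-one-out replicates in sample i, and the two parts are combined with
    the sample fractions n_i / n.
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    loo1 = np.array([stat(np.delete(x1, i), x2) for i in range(n1)])
    loo2 = np.array([stat(x1, np.delete(x2, i)) for i in range(n2)])
    tau1 = np.mean((loo1 - loo1.mean()) ** 2)
    tau2 = np.mean((loo2 - loo2.mean()) ** 2)
    return (n1 / n) * tau1 + (n2 / n) * tau2
```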
For the special case of equal sample sizes, the jackknife variance estimator simplifies. Again, to make the dependence on the data explicit, let us write the following:
$$\begin{aligned} {\widehat{ \mu }}_N( \gamma ) = {\widehat{ \mu }}_N( Z_1, \dots , Z_N ), \end{aligned}$$
where
$$\begin{aligned} Z_i = (X_{1i}, X_{2i})', \qquad i = 1, \dots , N. \end{aligned}$$
Consider the leave-one-out estimates:
$$\begin{aligned} {\widehat{ \mu }}_{N,-i}( \gamma ) = {\widehat{ \mu }}_N( Z_1, \dots , Z_{i-1}, Z_{i+1}, \dots , Z_N ), \end{aligned}$$
for \(i = 1, \dots , N\). Then, the jackknife estimator for the asymptotic variance of \({\widehat{ \mu }}_N(\gamma )\):
$$\begin{aligned} \sigma ^2(\gamma ) = \sigma _N^2( {\widehat{ \mu }}_N(\gamma ) ) = {\mathrm{Var\,}}( \sqrt{N} {\widehat{ \mu }}_N(\gamma ) ) \end{aligned}$$
is defined by the following:
$$\begin{aligned} {\widehat{ \sigma }}_N^2( \gamma ) = (N-1) \sum _{i=1}^N ( {\widehat{ \mu }}_{N,-i} - \overline{{\widehat{ \mu }}}_{N, \bullet } )^2, \end{aligned}$$
(26)
and the jackknife variance estimator of \({\mathrm{Var\,}}( {\widehat{ \mu }}_N(\gamma ) )\) is as follows:
$$\begin{aligned} {\widehat{ {\mathrm{Var\,}} }}( {\widehat{ \mu }}_N(\gamma ) ) = \frac{N-1}{N} \sum _{i=1}^N ( {\widehat{ \mu }}_{N,-i} - \overline{{\widehat{ \mu }}}_{N, \bullet } )^2, \end{aligned}$$
(27)
where \(\overline{{\widehat{ \mu }}}_{N,\bullet } = N^{-1} \sum _{i=1}^N {\widehat{ \mu }}_{N,-i}\).
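For the equal-sample-size case, the quantities (26) and (27) can be computed by deleting the pairs \(Z_i\), as in the following Python sketch (ours; the statistic is any function of the two samples).

```python
import numpy as np

def jackknife_paired(z, stat):
    """Simplified jackknife for equal sample sizes, deleting the pairs Z_i.

    Returns (sigma2_hat, var_hat) following (26) and (27): sigma2_hat estimates
    the asymptotic variance Var(sqrt(N) mu_hat_N), var_hat estimates Var(mu_hat_N).
    z is an (N, 2) array whose rows are Z_i = (X_{1i}, X_{2i}).
    """
    z = np.asarray(z, dtype=float)
    N = z.shape[0]
    loo = np.array([stat(np.delete(z[:, 0], i), np.delete(z[:, 1], i))
                    for i in range(N)])
    centred = loo - loo.mean()
    sigma2_hat = (N - 1) * np.sum(centred ** 2)
    var_hat = (N - 1) / N * np.sum(centred ** 2)
    return sigma2_hat, var_hat
```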

The consistency of the jackknife is now established by invoking the key results of the previous section, namely by proving that the common mean estimator is asymptotically linear with an appropriate remainder term. For simplicity of exposition, we state and prove the result for the case of equal sample sizes. The proof only uses elementary arguments, but it is long and technical. It is, therefore, provided in Appendix B.

Theorem 4.1

Suppose that \(X_{ij} \sim F_i\), \(j = 1, \dots , N\), \(i = 1, 2\), are two i.i.d. samples with distribution functions \(F_1, F_2\) satisfying \(E( X_{11}^{12} ) < \infty\), \(E( X_{21}^{12} ) < \infty\). Then, the following assertions hold.
  (i) We have
    $$\begin{aligned} {\widehat{ \mu }}_N(\gamma ) - \mu&= \left[ (\nabla [ \gamma ^\le (\theta ) \mu ] + \nabla [(1-\gamma ^\le (\theta )) \mu ]) {\mathbf {1}}_{\{ \sigma _1^2 < \sigma _2^2 \}} \right. \\&\left. \quad + (\nabla [\gamma ^>(\theta ) \mu ] + \nabla [(1-\gamma ^>(\theta )) \mu ]) {\mathbf {1}}_{\{ \sigma _1^2 > \sigma _2^2 \}} \right] ( {\widehat{ \theta }}_N - \theta ) + R_N^\gamma , \end{aligned}$$
    for some remainder term \(R_N^\gamma\) with \(N E(R_N^\gamma )^2 = O(1/N)\) and \(N^2 E( R_N^\gamma - R_{N-1}^\gamma )^2 = o(1)\), as \(N \rightarrow \infty\).
  (ii) If the variances are strictly ordered, the following representations hold: for \(\sigma _1^2 < \sigma _2^2\),
    $$\begin{aligned} {\widehat{ \mu }}_N(\gamma ) - \mu = \gamma (\theta ) ( \overline{X}_1 - \mu ) + (1-\gamma (\theta )) ( \overline{X}_2 - \mu ) + R_N, \end{aligned}$$
    and for\(\sigma _1^2 > \sigma _2^2\):
    $$\begin{aligned} {\widehat{ \mu }}_N(\gamma ) - \mu = (1-\gamma (\theta )) ( \overline{X}_1 - \mu ) + \gamma (\theta ) ( \overline{X}_2 - \mu ) + R_N, \end{aligned}$$
    with \(N E( R_N^2 ) = O(1/N)\).
  (iii) The jackknife variance estimator is consistent:
    $$\begin{aligned} \frac{{\widehat{ \sigma }}_N^2( {\widehat{ \mu }}_N )}{\sigma ^2(\gamma )} = 1 + o_P(1), \end{aligned}$$
    as \(N \rightarrow \infty\), and asymptotically unbiased:
    $$\begin{aligned} \frac{E( {\widehat{ \sigma }}_N^2( {\widehat{ \mu }}_N ))}{\sigma ^2(\gamma )} = 1 + o(1), \end{aligned}$$
    as \(N \rightarrow \infty\), and the same applies to \({\widehat{ \sigma }}_N^2(\gamma )\).

Remark 4.1

For unequal sample sizes, the assertions follow under the additional condition:
$$\begin{aligned} \lim _{\min (n_1,n_2) \rightarrow \infty } \frac{n_i}{n} = \lambda _i, \qquad i = 1,2, \end{aligned}$$
(28)
for \(\lambda _1, \lambda _2 \in (0, 1)\). Then, the estimates (37) and (38) hold, since they only require the decomposition (32) and the estimates (33) given in Lemma A.1.
Let us now briefly discuss the asymptotic normality of the class of estimators under investigation, which does not require ordered variances. Put
$$\begin{aligned} \gamma = \gamma ^\le {\mathbf {1}}_{\{ \sigma _1^2 < \sigma _2^2\}} + \gamma ^> {\mathbf {1}}_{\{ \sigma _1^2 > \sigma _2^2 \}}. \end{aligned}$$
Clearly, for deterministic \(\gamma _N\), the CLT holds. More generally, for random weights satisfying Assumption (\(\Gamma\)), i.e., if  \(\gamma _N\) may depend on \({\widehat{ \theta }}_N\), we have the following result.

Theorem 4.2

Under Assumption \((\Gamma )\) and (28), the common mean estimator \({\widehat{ \mu }}_n(\gamma )\) satisfies the central limit theorem:
$$\begin{aligned} \sqrt{n}[ {\widehat{ \mu }}_n( \gamma ) - \mu ] {\mathop {\rightarrow }\limits ^{d}} N( 0, \sigma ^2(\gamma ) ), \end{aligned}$$
as \(n \rightarrow \infty\), where the asymptotic variance is given by
$$\begin{aligned} \sigma ^2( \gamma ) = \gamma ^2 \lambda _1^{-1} \sigma _1^2 + (1-\gamma )^2 \lambda _2^{-1} \sigma _2^2. \end{aligned}$$
(29)

Proof

We consider the case \(\sigma _1^2 < \sigma _2^2\). Then, \(\gamma (\cdot ) = \gamma ^\le (\cdot )\) is continuous, such that
$$\begin{aligned} b_n({\widehat{ \theta }}_n) = ( \gamma ({\widehat{ \theta }}_n), 1-\gamma ({\widehat{ \theta }}_n ) )' \rightarrow b(\theta ) = (\gamma ^\le (\theta ), 1-\gamma ^\le (\theta ) )', \end{aligned}$$
almost surely, as \(n \rightarrow \infty\). Clearly:
$$\begin{aligned} \sqrt{n} (\overline{X}_i - \mu ) {\mathop {\rightarrow }\limits ^{d}} N(0, \lambda _i^{-1} \sigma _i^2 ), \end{aligned}$$
as \(n \rightarrow \infty\), for \(i = 1, 2\). Notice that \(n = n_1 + n_2 \rightarrow \infty\) and \(n_i / n \rightarrow \lambda _i\), as \(n \rightarrow \infty\), imply \(n_i = (n_i/n) n \rightarrow \infty\), as \(n \rightarrow \infty\), for \(i = 1, 2\). Let us show that
$$\begin{aligned} V_n = \sqrt{n} \left( \begin{array}{cc} \overline{X}_1 - \mu \\ \overline{X}_2 - \mu \end{array} \right) {\mathop {\rightarrow }\limits ^{d}} Z = \left( \begin{array}{cc} \lambda _1^{-1/2} \sigma _1 Z_1 \\ \lambda _2^{-1/2} \sigma _2 Z_2 \end{array} \right) , \end{aligned}$$
(30)
as \(n \rightarrow \infty\), where \(Z_1, Z_2\) are independent standard normal random variables. We apply the Cramér–Wold device. For \(\rho _1, \rho _2 \in {\mathbb {R}}\), we have the following:
$$\begin{aligned} T_n(\rho _1, \rho _2)&= \rho _1 \sqrt{n}( \overline{X}_1 - \mu ) + \rho _2 \sqrt{n}( \overline{X}_2 - \mu ) \\&= \rho _1 \sqrt{\frac{ n }{ n_1 } } \frac{1}{\sqrt{n_1}} \sum _{j=1}^{n_1} (X_{1j} - \mu ) + \rho _2 \sqrt{\frac{ n }{ n_2 } } \frac{1}{\sqrt{n_2}} \sum _{j=1}^{n_2} (X_{2j} - \mu ) \\&= \sum _{i=1}^{n} \zeta _{ni} + o_P(1), \end{aligned}$$
where
$$\begin{aligned} \zeta _{ni} = \left\{ \begin{array}{ll} \frac{1}{\sqrt{n}} \rho _1 \lambda _1^{-1} ( X_{1i} - \mu ), &{} \qquad i = 1, \dots , n_1, \\ \frac{1}{\sqrt{n}} \rho _2 \lambda _2^{-1} ( X_{2,i-n_1} - \mu ), &{} \qquad i = n_1+1, \dots , n, \end{array} \right. \end{aligned}$$
for \(i = 1, \dots , n\), since \(n_i^{-1/2} \sum _{j=1}^{n_i} (X_{ij} - \mu ) = O_P(1)\) and \(| \sqrt{n/n_i} - \sqrt{\lambda _i^{-1}} | = o(1)\), as \(n \rightarrow \infty\), such that
$$\begin{aligned} \left| T_n(\rho _1, \rho _2) - \sum _{i=1}^{n} \zeta _{ni} \right|&\le \sum _{i=1,2} | \rho _i | \left| \sqrt{ \frac{n}{n_i} } - \sqrt{ \lambda _i^{-1} } \right| \left| \frac{1}{ \sqrt{n_i} } \sum _{j=1}^{n_i} ( X_{ij} - \mu ) \right| \\&= o_P(1), \end{aligned}$$
as \(n \rightarrow \infty\). Observe that, for each \(n \in {\mathbb {N}}\), the random variables \(\zeta _{n1}, \dots , \zeta _{nn}\) are independent with mean zero and \(\sup _{n \ge 1} \max _{1 \le i \le n} {\mathrm{Var\,}}( \sqrt{n}\zeta _{ni} ) < \infty\). Furthermore:
$$\begin{aligned} \sum _{i=1}^n {\mathrm{Var\,}}( \zeta _{ni} )&= \rho _1^2 \frac{n_1}{n} \lambda _1^{-2} \sigma _1^2 + \rho _2^2 \frac{n_2}{n} \lambda _2^{-2} \sigma _2^2 \\&\rightarrow \rho _1^2 \lambda _1^{-1} \sigma _1^2 + \rho _2^2 \lambda _2^{-1} \sigma _2^2, \end{aligned}$$
as \(n \rightarrow \infty\). Hence, the Lindeberg condition is satisfied, such that the CLT implies
$$\begin{aligned} T_n(\rho _1, \rho _2) {\mathop {\rightarrow }\limits ^{d}} N( 0, \rho _1^2 \lambda _1^{-1} \sigma _1^2 + \rho _2^2 \lambda _2^{-1} \sigma _2^2 ), \end{aligned}$$
as \(n \rightarrow \infty\). This verifies (30). Now, the assertion follows by an application of Slutsky’s lemma:
$$\begin{aligned} \sqrt{n}( {\widehat{ \mu }}_n(\gamma ) - \mu ) = b_n( {\widehat{ \theta }}_n )' V_n {\mathop {\rightarrow }\limits ^{d}} b(\theta )' Z \sim N(0, \gamma ^\le (\theta )^2 \lambda _1^{-1} \sigma _1^2 + (1-\gamma ^\le (\theta ))^2 \lambda _2^{-1} \sigma _2^2 ), \end{aligned}$$
as \(n \rightarrow \infty\), where \(\sigma ^2( \gamma )\) is given in (29). \(\square\)

Remark 4.2

Under the assumptions of Theorem 4.1, the CLT follows directly from the asymptotic representation as a linear statistic shown there.

The CLT suggests the following estimator for the variance of the common mean estimator \({\widehat{ \mu }}_n(\gamma )\):
$$\begin{aligned} {\widehat{ {\mathrm{Var\,}} }}({\widehat{ \mu }}_n(\gamma )) = \frac{1}{n} \left[ {\widehat{ \gamma }}_n^2 \frac{n}{n_1} S_1^2 + (1-{\widehat{ \gamma }}_n )^2 \frac{n}{n_2} S_2^2 \right] , \end{aligned}$$
(31)
where \({\widehat{ \gamma }}_n = \gamma ^{\le }(n_1/n, n_2/n, S_1^2, S_2^2 )\), if \({\widetilde{ S }}_1^2 \le {\widetilde{ S }}_2^2\), and \({\widehat{ \gamma }}_n = \gamma ^{>}(n_1/n, n_2/n, S_1^2, S_2^2 )\) otherwise. The alternative formula
$$\begin{aligned} {\widehat{ {\mathrm{Var\,}} }}({\widehat{ \mu }}_n(\gamma )) = {\widehat{ \gamma }}_n^2 \frac{S_1^2}{n_1} + (1-{\widehat{ \gamma }}_n )^2 \frac{S_2^2}{n_2} \end{aligned}$$
shows how the estimated variances \(S_i^2/n_i\) of the sample means, \(i = 1, 2\), are weighted by the squared convex weights \({\widehat{ \gamma }}_n^2\) and \((1-{\widehat{ \gamma }}_n)^2\).
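In code, the plug-in estimate (31) and the corresponding CLT-based confidence interval can be obtained as follows (a Python sketch, ours; gamma_hat denotes the realized weight of the estimator and 1.96 is the normal 97.5% quantile used for a 95% interval).

```python
import numpy as np

def clt_variance_estimate(x1, x2, gamma_hat):
    """Plug-in variance estimate (31) for the common mean estimator and a 95% CI (a sketch)."""
    n1, n2 = len(x1), len(x2)
    s1sq = np.var(x1, ddof=1)              # unbiased variance estimators S_i^2
    s2sq = np.var(x2, ddof=1)
    mu_hat = gamma_hat * np.mean(x1) + (1.0 - gamma_hat) * np.mean(x2)
    var_hat = gamma_hat ** 2 * s1sq / n1 + (1.0 - gamma_hat) ** 2 * s2sq / n2
    half = 1.96 * np.sqrt(var_hat)
    return mu_hat, var_hat, (mu_hat - half, mu_hat + half)
```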

5 Simulations and data analysis

We investigated the proposed method by simulations and analyzed three data sets to illustrate both the method and its wide applicability. The data come from the fields of natural science (physics), technology (quality engineering), and social information.

5.1 Simulation study

The simulation study aims at comparing the accuracy of the jackknife with that of the bootstrap, taking into account the computational costs. Bootstrapping is a commonly used alternative. Recall that the nonparametric bootstrap draws B random samples of sizes \(n_1\) and \(n_2\) from the given data sets and then estimates the variance of a statistic, the GD estimator in our study, by the sample variance of the B replicates of the statistic. We consider the balanced design where \(N = n_1 = n_2\). In this case, the computational costs of the bootstrap, measured as the number of times the statistic has to be evaluated, are equal to the costs of the jackknife if \(B = N\). In our study, we investigate the cases \(N = 25, 50, 75\) and \(B = 100, 200, \dots , 1000\), such that the computational costs of the bootstrap relative to those of the jackknife range from a factor of 4/3 to 40.

The simulations consider normally distributed data (model 1), the t(5)-distribution (model 2) as a distribution with fat tails, the \(U(-5,5)\)-distribution (model 3) as a short-tailed law, and \(\Gamma (a,b)\)-distributed samples with mean ab and variance \(a b^2\) (models 4 and 5), leading to skewed distributions. For model 4, observations distributed as \(\mu + \sigma _i ( \Gamma (a,\sigma _i) - a \sigma _i)\), \(i = 1, 2\), were simulated with \(a = 1.5\); model 5 uses \(a = 2.5\). The common mean equals \(\mu = 10\). The results are provided in Table 1, which shows the coverage probabilities of the confidence intervals calculated from the corresponding variance estimates for a nominal confidence level of \(95\%\). Each entry is based on \(S = 20,000\) runs.
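To illustrate the design, the following Python sketch (ours, not the authors' simulation code) estimates the coverage of jackknife and bootstrap 95% confidence intervals for the GD estimator under the normal model 1; for speed it uses 2,000 runs instead of 20,000, and the variance parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def gd_estimate(x1, x2):
    # Graybill-Deal type weight based on the unbiased variance estimators
    w1 = len(x1) / np.var(x1, ddof=1)
    w2 = len(x2) / np.var(x2, ddof=1)
    return (w1 * np.mean(x1) + w2 * np.mean(x2)) / (w1 + w2)

def coverage(N=25, B=100, runs=2000, mu=10.0, s1=1.0, s2=2.0):
    """Empirical coverage of jackknife and bootstrap 95% CIs (normal samples)."""
    hit_jack = hit_boot = 0
    for _ in range(runs):
        x1, x2 = rng.normal(mu, s1, N), rng.normal(mu, s2, N)
        est = gd_estimate(x1, x2)
        # paired jackknife variance estimate, cf. (27)
        loo = np.array([gd_estimate(np.delete(x1, i), np.delete(x2, i)) for i in range(N)])
        se_jack = np.sqrt((N - 1) / N * np.sum((loo - loo.mean()) ** 2))
        # nonparametric bootstrap with B resamples drawn with replacement
        boot = np.array([gd_estimate(rng.choice(x1, N), rng.choice(x2, N)) for _ in range(B)])
        se_boot = boot.std(ddof=1)
        hit_jack += abs(est - mu) <= 1.96 * se_jack
        hit_boot += abs(est - mu) <= 1.96 * se_boot
    return hit_jack / runs, hit_boot / runs

print(coverage())   # (jackknife coverage, bootstrap coverage)
```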

It can be seen that, in all cases studied here, the accuracy of the jackknife confidence interval in terms of the coverage probability is substantially higher than for the bootstrap intervals, which suffer from lower than nominal coverage. Although the gap, of course, diminishes for larger values of B, the bootstrap intervals remain inferior even for \(B = 1000\). The results, therefore, demonstrate that jackknifing is a highly efficient tool when it comes to calculating confidence intervals on a large scale where the computational costs matter, as is the case when analyzing big data.
Table 1

Accuracy of the jackknife and the bootstrap variance estimators in terms of the coverage probability for the confidence level 0.95. The column Jack refers to the jackknife; the remaining columns give the bootstrap coverage for B = 100, ..., 1000 replicates

Model  N   Jack   B=100  B=200  B=300  B=400  B=500  B=600  B=700  B=800  B=900  B=1000
1      25  0.946  0.935  0.938  0.932  0.937  0.933  0.938  0.936  0.938  0.940  0.937
1      50  0.951  0.941  0.942  0.943  0.944  0.943  0.946  0.946  0.945  0.944  0.941
1      75  0.951  0.942  0.943  0.945  0.947  0.945  0.946  0.945  0.946  0.945  0.944
2      25  0.949  0.934  0.936  0.934  0.935  0.937  0.938  0.937  0.935  0.939  0.935
2      50  0.949  0.941  0.942  0.941  0.940  0.943  0.943  0.942  0.944  0.942  0.941
2      75  0.949  0.941  0.945  0.943  0.946  0.943  0.944  0.946  0.944  0.943  0.946
3      25  0.946  0.940  0.940  0.940  0.941  0.937  0.937  0.940  0.941  0.943  0.942
3      50  0.948  0.941  0.944  0.946  0.947  0.944  0.945  0.943  0.946  0.945  0.944
3      75  0.948  0.946  0.947  0.944  0.947  0.948  0.949  0.950  0.949  0.948  0.946
4      25  0.921  0.908  0.915  0.912  0.911  0.910  0.913  0.916  0.912  0.913  0.912
4      50  0.933  0.928  0.926  0.930  0.924  0.931  0.931  0.931  0.926  0.928  0.928
4      75  0.940  0.932  0.934  0.933  0.937  0.937  0.935  0.937  0.933  0.934  0.934
5      25  0.930  0.918  0.919  0.920  0.918  0.919  0.917  0.922  0.923  0.920  0.920
5      50  0.938  0.932  0.929  0.933  0.936  0.931  0.934  0.935  0.932  0.932  0.935
5      75  0.942  0.935  0.936  0.938  0.940  0.938  0.939  0.936  0.937  0.939  0.939

5.2 Physics: acceleration due to gravity

In physics, the common mean model frequently applies when observing or measuring a time-invariant physical phenomenon. As an example, we analyze the Heyl and Cook measurements of the acceleration due to gravity; see Heyl and Cook (1936). Two of those data sets, taken from Cressie (1997), are given by \(x_1 = (78, 78, 78, 86, 87, 81, 73, 67, 75, 82, 83)'\) and \(x_2 = (84, 86, 85, 82, 77, 76, 80, 83, 81, 78, 78, 78)'\).

These measurements are deviations, in units of \(10^{-3}\) cm/s\(^2\), from the value \(g = 980{,}060 \times 10^{-3}\) cm/s\(^2\). The classical F test for homogeneity of variances does not reject the null hypothesis of equal variances. The common mean assumption is obviously satisfied, since the same physical (invariant) phenomenon is measured; systematic errors can be excluded, given the great amount of care taken in the experiments. The Graybill–Deal estimator for these data is \({\widehat{ \mu }}^{(GD)} = 80.26123\). To estimate its variance and to construct confidence intervals that remain valid under non-normal underlying distributions, we used the asymptotic approach based on the CLT, i.e., (31), and the jackknife for unequal sample sizes. Table 2 provides those estimates and the resulting confidence intervals for the common mean at a confidence level of \(95\%\).
Table 2

Common mean estimation using the GD estimator: standard errors and confidence intervals using asymptotics and the jackknife, respectively

Method       Est. sd of \({\widehat{ \mu }}^{(GD)}\)   Confidence interval    Width
Asymptotics  0.8455307                                  [78.60399, 81.91847]   3.31448
Jackknife    0.8492987                                  [78.5966, 81.92585]    3.329251

It can be seen that the jackknife and the asymptotic formula lead to quite similar results for the GD estimator. Table 3 shows the results for Nair’s estimator:
Table 3

Common mean estimation using Nair’s estimator: standard errors and confidence intervals using asymptotics and the jackknife, respectively

Method       Est. sd of \({\widehat{ \mu }}^{(N)}\)   Confidence interval    Width
Asymptotics  1.279913                                  [77.31746, 81.91847]   5.017259
Jackknife    0.9752919                                 [77.91451, 81.92585]   3.823144

Since the variance estimates \(s_1^2 = 34.09091\) and \(s_2^2 = 11.15152\) are not ordered, Nair’s estimator uses the weights \(n_1/n = 11/23\) and \(n_2/n = 12/23\), which differ substantially from the random weights used by the GD estimator. The jackknife variance estimate provides a tighter confidence interval than the asymptotic approach in this case.
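A Python sketch (ours) of the GD estimate and its two-sample jackknife standard error for the Heyl–Cook data is given below; whether it reproduces the tabulated values exactly depends on implementation details (e.g., the variance estimators entering the weights) that are not restated here.

```python
import numpy as np

x1 = np.array([78, 78, 78, 86, 87, 81, 73, 67, 75, 82, 83], dtype=float)
x2 = np.array([84, 86, 85, 82, 77, 76, 80, 83, 81, 78, 78, 78], dtype=float)

def gd_estimate(x1, x2):
    # Graybill-Deal type weight based on the unbiased variance estimators
    w1 = len(x1) / np.var(x1, ddof=1)
    w2 = len(x2) / np.var(x2, ddof=1)
    return (w1 * np.mean(x1) + w2 * np.mean(x2)) / (w1 + w2)

# two-sample jackknife variance estimate, cf. Section 4
n1, n2 = len(x1), len(x2)
n = n1 + n2
loo1 = np.array([gd_estimate(np.delete(x1, i), x2) for i in range(n1)])
loo2 = np.array([gd_estimate(x1, np.delete(x2, i)) for i in range(n2)])
var_jack = (n1 / n) * np.mean((loo1 - loo1.mean()) ** 2) \
    + (n2 / n) * np.mean((loo2 - loo2.mean()) ** 2)

est, se = gd_estimate(x1, x2), np.sqrt(var_jack)
print(est, se, (est - 1.96 * se, est + 1.96 * se))
```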

5.3 Technology: chip production data

Chips as used in electronic circuits are cut using a cutting machine. The critical quantity is the width of the cut-out chips. Figure 1 depicts kernel density estimates (with cross-validated bandwidth selection) and boxplots of two series of \(n = m = 240\) width measurements obtained with different cutting saws. Both data sets are non-normal, as confirmed by Shapiro–Wilk tests with p values \(< 10^{-3}\), and some outliers are present, which are, however, not treated specially in the present analysis. The sample means are \(\overline{x} = 6.293254\) and \(\overline{y} = 6.292667\) with estimated standard deviations \(s_x = 0.003785844\) and \(s_y = 0.004962341\). Table 4 provides the results when assuming \(\sigma _1^2 < \sigma _2^2\).
Fig. 1

Density estimates and boxplots of width measurements of sliced chips using two cutting saws

Table 4

Nair’s estimator for the chip data: standard errors and confidence intervals using asymptotics and the jackknife, respectively

Method       Est. sd of \({\widehat{ \mu }}^{(N)}\)   Confidence interval     Width
Asymptotics  0.0001942891                              [6.292657, 6.293419]    0.0007616134
Jackknife    0.0001941049                              [6.292658, 6.293418]    0.000760891

When interchanging the samples, such that the variance estimates are not ordered, the results for Nair’s estimator differ; see Table 5. Now, the jackknife results in a substantially smaller variance estimate and a tighter confidence interval compared to the approach based on the asymptotics.
Table 5

Results for Nair’s estimator for the chip data when switching the samples

Method       Est. sd of \({\widehat{ \mu }}^{(N)}\)   Confidence interval     Width
Asymptotics  0.0002921816                              [6.292388, 6.293419]    0.001145352
Jackknife    0.0001984456                              [6.292571, 6.293418]    0.0007779067

5.4 Social information: Japanese child data

In Japan, the strength of 8-year-old boys and girls was investigated in the six prefectures (Aomori, Iwate, Miyagi, Akita, Yamagata, and Fukushima) of the Touhoku region on Honshu, the largest island of Japan, and in the prefecture of Hokkaido in the north of Japan. The observations for the boys are 52.55, 54.08, 54.25, 52.92, 56.31, 53.63, 52.52, and those for the girls are 52.95, 55.72, 56.14, 54.24, 58.19, 55.32, 54.45. For these data sets, the Graybill–Deal estimator is given by 54.34878.

Table 6 provides the results when assigning the girls’ data to \(x_1\) and the boys’ data to \(x_2\). The results for the GD estimator are as follows.
Table 6

Results for the GD common mean estimator for the Japanese child data: standard errors and confidence intervals using asymptotics and the jackknife

Method       Est. sd of \({\widehat{ \mu }}^{(GD)}\)   Confidence interval     Width
Asymptotics  0.3921168                                  [53.58023, 55.11733]    1.537098
Jackknife    0.6874476                                  [53.00139, 55.69618]    2.694794

For Nair’s estimator, we obtain different results, since the variance estimates are not ordered; see Table 7. As for the chip data above, the jackknife leads to a smaller variance estimate and a tighter confidence interval.
Table 7

The Nair estimator for Japanese child data: standard errors and confidence intervals using asymptotics and the jackknife

Method       Est. sd of \({\widehat{ \mu }}^{(N)}\)   Confidence interval     Width
Asymptotics  0.5919936                                 [53.35898, 55.11733]    2.320615
Jackknife    0.5593932                                 [53.42288, 55.69618]    2.192821


Acknowledgements

This work was supported by JSPS Kakenhi grants #JP26330047 and #JP8K11196. Parts of this paper were written during research visits of the first author at Mejiro University, Tokyo; he thanks the university for its warm hospitality. Both authors thank Hideo Suzuki, Keio University at Yokohama, for invitations to his research seminar, and Nobuo Shinozaki, Takahisa Iida, Shun Matsuura, and the participants for comments and discussion. The authors gratefully acknowledge the support of Prof. Takenori Takahashi, Mejiro University and Keio University Graduate School, and Akira Ogawa, Mejiro University, for providing and discussing the chip manufacturing data. They would also like to thank the anonymous referees for their helpful comments.

References

  1. Bhattacharya, C. (1980). Estimation of a common mean and recovery of interblock information. The Annals of Statistics, 8, 205–211.
  2. Brown, L. D., & Cohen, A. (1974). Point and confidence interval estimation of a common mean and recovery of interblock information. The Annals of Statistics, 2, 963–976.
  3. Cemer, I. (2011). Noise measurement. Sensors online. https://www.sensorsmag.com/embedded/noise-measurement.
  4. Chang, Y.-T., Oono, Y., & Shinozaki, N. (2012). Improved estimators for the common mean and ordered means of two normal distributions with ordered variances. Journal of Statistical Planning and Inference, 142(9), 2619–2628.
  5. Chang, Y.-T., & Shinozaki, N. (2008). Estimation of linear functions of ordered scale parameters of two gamma distributions under entropy loss. Journal of the Japan Statistical Society, 38(2), 335–347.
  6. Chang, Y.-T., & Shinozaki, N. (2015). Estimation of two ordered normal means under modified Pitman nearness criterion. Annals of the Institute of Statistical Mathematics, 67, 863–883. https://doi.org/10.1007/s10463-014-0479-4.
  7. Cochran, W. (1937). Problems arising in the analysis of a series of similar experiments. JASA, 4, 172–175.
  8. Cressie, N. (1997). Jackknifing in the presence of inhomogeneity. Technometrics, 39(1), 45–51.
  9. Degerli, Y. (2000). Analysis and reduction of signal readout circuitry temporal noise in CMOS image sensors for low light levels. IEEE Transactions on Electron Devices, 47(5), 949–962.
  10. Efron, B. (1982). The jackknife, the bootstrap and other resampling plans (Vol. 38). CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).
  11. Efron, B., & Hastie, T. (2016). Computer age statistical inference: Algorithms, evidence, and data science (Vol. 5). Institute of Mathematical Statistics (IMS) Monographs. New York: Cambridge University Press.
  12. Efron, B., & Stein, C. (1981). The jackknife estimate of variance. The Annals of Statistics, 9(3), 586–596.
  13. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap (Vol. 57). Monographs on Statistics and Applied Probability. New York: Chapman and Hall.
  14. Elfessi, A., & Pal, N. (1992). A note on the common mean of two normal populations with order restrictions in location-scale families. Communications in Statistics—Theory and Methods, 21(11), 3177–3184.
  15. Fisher, R. (1932). Statistical methods for research workers (4th ed.). London: Oliver and Boyd.
  16. Heyl, P., & Cook, G. (1936). The value of gravity in Washington. Journal of Research of the U.S. Bureau of Standards, 17, 805–839.
  17. Keller, T., & Olkin, I. (2004). Combining correlated unbiased estimators of the mean of a normal distribution. A Festschrift for Herman Rubin, 45, 218–227.
  18. Kubokawa, T. (1989). Closer estimation of a common mean in the sense of Pitman. Annals of the Institute of Statistical Mathematics, 41(3), 477–484.
  19. Lee, Y. (1991). Jackknife variance estimators in the one-way random effects model. Annals of the Institute of Statistical Mathematics, 43(4), 707–714.
  20. Lin, D. (2010). Quantified temperature effect in a CMOS image sensor. IEEE Transactions on Electron Devices, 57(2), 422–428.
  21. Mehta, J., & Gurland, J. (1969). Combinations of unbiased estimators of the mean which consider inequality of unknown variances. JASA, 64(327), 1042–1055.
  22. Miller, R. G. (1974). The jackknife—A review. Biometrika, 61, 1–15.
  23. Nair, K. (1980). Variance and distribution of the Graybill–Deal estimator of the common mean of two normal populations. The Annals of Statistics, 8(1), 212–216.
  24. Nair, K. (1982). An estimator of the common mean of two normal populations. Journal of Statistical Planning and Inference, 6, 119–122.
  25. Pitman, E. (1937). The closest estimates of statistical parameters. Proceedings of the Cambridge Philosophical Society, 33, 212–222.
  26. Quenouille, M. (1949). Approximate tests of correlation in time series. Mathematical Proceedings of the Cambridge Philosophical Society, 11, 68–84.
  27. Shao, J. (1993). Differentiability of statistical functionals and consistency of the jackknife. The Annals of Statistics, 21(1), 61–75.
  28. Shao, J., & Wu, C. F. (1989). A general theory for jackknife variance estimation. The Annals of Statistics, 17, 1176–1197.
  29. Sinha, B. (1985). Unbiased estimation of the variance of the GD estimator of the common mean of several normal populations. Canadian Journal of Statistics, 13(3), 243–247.
  30. Steland, A. (2015). Vertically weighted averages in Hilbert spaces and applications to imaging: Fixed sample asymptotics and efficient sequential two-stage estimation. Sequential Analysis, 34(3), 295–323.
  31. Steland, A. (2017). Fusing photovoltaic data for improved confidence intervals. AIMS Energy, 5, 113–136.
  32. Tippett, L. (1931). The method of statistics. London: Williams and Norgate.
  33. Tukey, J. W. (1958). Bias and confidence in not quite large samples (abstract). Annals of Mathematical Statistics, 29, 614.
  34. van Eeden, C. (2006). Restricted parameter space estimation problems. Lecture Notes in Statistics. Berlin: Springer.
  35. Voinov, V. (1984). Variance and its unbiased estimator for the common mean of several normal populations. Sankhya: The Indian Journal of Statistics, Series B, 46, 291–300.

Copyright information

© Japanese Federation of Statistical Science Associations 2019

Authors and Affiliations

  1. Institute of Statistics, RWTH Aachen University, Aachen, Germany
  2. Department of Social Information, Mejiro University, Tokyo, Japan
