# Bounds for Tail Probabilities of the Sample Variance


## Abstract

We provide bounds for tail probabilities of the sample variance. The bounds are expressed in terms of Hoeffding functions and are the sharpest known. They are designed with applications in auditing and in the processing of environmental data in mind.

### Keywords

Convex function, central limit theorem, sample variance, elementary calculation, point distribution

## 1. Introduction and Results

Let $X, X_1, \dots, X_n$ be a random sample of independent identically distributed observations. Throughout we write
$$\mu = \mathbf{E}X, \qquad \sigma^2 = \mathbf{E}(X-\mu)^2, \qquad \mu_4 = \mathbf{E}(X-\mu)^4$$
(1.1)
for the mean, variance, and the fourth central moment of $X$, and assume that $\sigma^2 > 0$. Some of our results hold only for bounded random variables. In such cases, without loss of generality, we assume that $0 \le X \le 1$. Note that $0 \le X \le 1$ is a natural condition in audit applications.

The sample variance of the sample $X_1, \dots, X_n$ is defined as
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2,$$
(1.2)
where $\bar{X}$ is the sample mean, $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$. We can rewrite (1.2) as
$$s^2 = \frac{1}{n(n-1)} \sum_{1 \le i < j \le n} (X_i - X_j)^2.$$
(1.3)
We are interested in deviations of the statistic $s^2$ from its mean $\mathbf{E}\,s^2 = \sigma^2$, that is, in bounds for the tail probabilities of the statistic $s^2$,
$$\mathbf{P}\{s^2 \ge \sigma^2 + t\}, \qquad \mathbf{P}\{s^2 \le \sigma^2 - t\}, \qquad t \ge 0.$$
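As a quick numerical companion to these definitions, the following sketch (our own illustration; the variable names are hypothetical, not the paper's) computes $s^2$ both from the definition (1.2) and from the pairwise rewriting (1.3), confirming that the two forms agree.

```python
def sample_variance(xs):
    """Sample variance via the definition: (1/(n-1)) * sum (x_i - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_u(xs):
    """Equivalent pairwise form: (1/(n(n-1))) * sum_{i<j} (x_i - x_j)^2."""
    n = len(xs)
    pair_sum = sum((xs[i] - xs[j]) ** 2
                   for i in range(n) for j in range(i + 1, n))
    return pair_sum / (n * (n - 1))

xs = [0.1, 0.9, 0.4, 0.7, 0.2]
assert abs(sample_variance(xs) - sample_variance_u(xs)) < 1e-12
```

The agreement of the two forms is exactly the identity behind the U-statistic representation exploited in Section 3.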

The paper is organized as follows. In the introduction we give a description of bounds, some comments, and references. In Section 2 we obtain sharp upper bounds for the fourth moment. In Section 3 we give proofs of all facts and results from the introduction.

If , then the range of interest in (1.5) is , where

The restriction on the range of in (1.4) (resp., in (1.5) in cases where the condition is fulfilled) is natural. Indeed, for , due to the obvious inequality . Furthermore, in the case of we have for since (see Proposition 2.3 for a proof of the latter inequality).

The asymptotic (as $n \to \infty$) properties of $s^2$ (see Section 3 for proofs of (1.7) and (1.8)) can be used to test the quality of bounds for tail probabilities. Under the condition the statistic $s^2$ is asymptotically normal provided that $X$ is not a Bernoulli random variable symmetric around its mean. Namely, if , then
If (which happens if and only if $X$ is a Bernoulli random variable symmetric around its mean), then $s^2$ asymptotically has a $\chi^2$-type distribution, that is,

where is a standard normal random variable, and is the standard normal distribution function.
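The normal limit can be checked by simulation. The sketch below is our own illustration (not part of the paper); it uses the standard fact that $\sqrt{n}\,(s^2-\sigma^2)$ has limiting variance $\mu_4-\sigma^4$, which for the uniform distribution on $[0,1]$ equals $1/80 - 1/144 = 1/180$.

```python
import random
import statistics

def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def normalized_deviations(n, reps, rng):
    # sqrt(n) * (s^2 - sigma^2) for uniform(0,1) samples; sigma^2 = 1/12
    sigma2 = 1.0 / 12.0
    return [n ** 0.5 * (sample_variance([rng.random() for _ in range(n)]) - sigma2)
            for _ in range(reps)]

rng = random.Random(0)
devs = normalized_deviations(n=200, reps=2000, rng=rng)
# Empirical variance should be close to mu_4 - sigma^4 = 1/180 (about 0.00556)
var_emp = statistics.pvariance(devs)
assert 0.0045 < var_emp < 0.0070
```

The tolerance is deliberately loose: with 2000 replications the empirical variance fluctuates by a few percent around the limiting value.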

Let us recall the already known bounds for the tail probabilities of the sample variance (see (1.19)–(1.21)). We need notation related to certain functions going back to Hoeffding [1]. Let and . Write
For we define . For we set . Note that our notation for the function is slightly different from the traditional one. Let . Introduce as well the function
(1.10)
and for . One can check that
(1.11)

All our bounds are expressed in terms of the function . Using (1.11), it is easy to replace them by bounds expressed in terms of the function , and we omit related formulations.

Let and . Assume that
(1.12)
Let be a Bernoulli random variable such that and . Then and . The function is related to the generating function (the Laplace transform) of binomial distributions since
(1.13)
(1.14)
where are independent copies of . Note that (1.14) is an obvious corollary of (1.13). We omit elementary calculations leading to (1.13). In a similar way
(1.15)

where is a Poisson random variable with parameter .

The functions and satisfy a kind of Central Limit Theorem. Namely, for given and we have
(1.16)
(we omit elementary calculations leading to (1.16)). Furthermore, we have [1]
(1.17)
and we also have [2]
(1.18)
Using the introduced notation, we can recall the known results (see [2, Lemma ]). Let be the integer part of . Assume that . If is known, then
(1.19)
The right-hand side of (1.19) is an increasing function of (see Section 3 for a short proof of (1.19) as a corollary of Theorem 1.1). If is unknown but is known, then
(1.20)
Using the obvious estimate , the bound (1.20) is implied by (1.19). In cases where both and are not known, we have
(1.21)

as it follows from (1.19) using the obvious bound .

Let us note that the known bounds (1.19)–(1.21) are the best possible within an approach based on analysis of the variance, the use of exponential functions, and an inequality of Hoeffding (see (3.3)), which allows one to reduce the problem to the estimation of tail probabilities for sums of independent random variables. Our improvement is due to a careful analysis of the fourth moment, which turns out to be rather involved; see Section 2. Briefly, the results of this paper are the following: we prove a general bound involving , , and the fourth moment ; this general bound implies all other bounds, in particular a new precise bound involving and ; we provide as well bounds for lower tails ; we compare the bounds analytically, mostly when $n$ is sufficiently large.

From the mathematical point of view the sample variance is one of the simplest nonlinear statistics. Known bounds for tail probabilities are designed having in mind linear statistics, possibly also for dependent observations. See the seminal paper of Hoeffding [1] published in JASA. For further development see Talagrand [3], Pinelis [4, 5], Bentkus [6, 7], Bentkus et al. [8, 9], and so forth. Our intention is to develop tools useful in the setting of nonlinear statistics, using the sample variance as a test case.

Theorem 1.1 extends and improves the known bounds (1.19)–(1.21). We can derive (1.19)–(1.21) from this theorem since we can estimate the fourth moment via various combinations of and using the boundedness assumption .

Theorem 1.1.

If and , then
(1.22)
with
(1.23)
If and , then
(1.24)
with
(1.25)

Both bounds and are increasing functions of , and .

Remark 1.2.

In order to derive upper confidence bounds we need only estimates of the upper tail (see [2]). To estimate the upper tail the condition is sufficient. The lower tail has a different type of behavior since to estimate it we indeed need the assumption that is a bounded random variable.

For , Theorem 1.1 implies the known bounds (1.19)–(1.21) for the upper tail of . It implies as well the bounds (1.26)–(1.29) for the lower tail. The lower tail has a somewhat more complicated structure (cf. (1.26)–(1.29) with their counterparts (1.19)–(1.21) for the upper tail).

If is known, then
(1.26)
One can show (we omit the details) that the bound is not an increasing function of . A slightly rougher inequality
(1.27)
has the monotonicity property since is an increasing function of . If is known, then using the obvious inequality , the bound (1.27) yields
(1.28)
If we have no information about and , then using , the bound (1.27) implies
(1.29)

The bounds above do not cover the situation where both and are known. To formulate a related result we need additional notation. In case of we use the notation

(1.30)
In view of the well-known upper bound for the variance of , we can partition the set
(1.31)
of possible values of and into a union of three subsets
(1.32)
and ; see Figure 1.

Theorem 1.3.

Write . Assume that .

The upper tail of the statistic satisfies
(1.33)
with , where
(1.34)
and where one can write
(1.35)
The lower tail of satisfies
(1.36)

with , where , and is defined by (1.34).

The bounds above are obtained using the classical transform ,
(1.37)

of survival functions (cf. definitions (1.13) and (1.14) of the related Hoeffding functions). The bounds expressed in terms of Hoeffding functions have a simple analytical structure and are easily numerically computable.
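A minimal numerical sketch of this transform (our own illustration): for a binomial sum $S$, the transform $\inf_{h>0} e^{-hx}\,\mathbf{E}e^{hS}$ can be computed by one-dimensional minimization and, by Chebyshev's inequality with exponential functions, always dominates the survival function $\mathbf{P}\{S \ge x\}$.

```python
import math

def binom_survival(n, p, x):
    # Exact P(S >= x) for S ~ Binomial(n, p)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(math.ceil(x), n + 1))

def chernoff_transform(n, p, x, hi=30.0, iters=300):
    # inf_{h >= 0} exp(-h x) * (1 - p + p e^h)^n, via ternary search on a convex function
    def f(h):
        return math.exp(-h * x) * (1.0 - p + p * math.exp(h)) ** n
    lo = 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)

n, p, x = 20, 0.3, 10
tail = binom_survival(n, p, x)
bound = chernoff_transform(n, p, x)
assert tail <= bound <= 1.0      # the transform dominates the survival function
```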

All our upper and lower bounds satisfy a kind of Central Limit Theorem. Namely, if we consider an upper bound, say (resp., a lower bound ), as a function of , then there exist limits
(1.38)
with some positive and . The values of and can be used to compare the bounds: the larger these constants, the better the bound. To prove (1.38) it suffices to note that with
(1.39)
The Central Limit Theorem in the form of (1.7) restricts the ranges of possible values of and . Namely, using (1.7) it is easy to see that and have to satisfy
(1.40)

We provide the values of these constants for all our bounds and give their numerical values in the following two cases.

(i) is a random variable uniformly distributed in the interval . The moments of this random variable satisfy
(1.41)

For defined by (1.41), we give the constants and as .

(ii) is uniformly distributed in , and in this case
(1.42)

For defined by (1.42), we give the constants and as .

We have
(1.43)
(1.44)
(1.45)
(1.46)

When calculating the constants in (1.44) and (1.46) we choose . The quantity in (1.43) and (1.45) is defined by (1.34).

### Conclusions

Our new bounds provide a substantial improvement of the known bounds. However, from the asymptotic point of view these bounds still seem rather crude. To improve them further, new methods and approaches are needed. Some preliminary computer simulations show that in applications where $n$ is finite and the random variables have small means and variances (as in auditing, where a typical value of is ), the asymptotic behavior has little bearing on the behavior for small $n$. Therefore bounds specially designed to cover the case of finite $n$ have to be developed.

## 2. Sharp Upper Bounds for the Fourth Moment

Recall that we consider bounded random variables such that , and that we write and . In Lemma 2.1 we provide an optimal upper bound for the fourth moment of given a shift , a mean , and a variance . The maximizers of the fourth moment are either Bernoulli or trinomial random variables. It turns out that their distributions, say , are of the following three types (i)–(iii):

(i) a two-point distribution such that
(ii) a family of three-point distributions depending on such that
where we write

notice that (2.4) supplies a three-point probability distribution only in cases where the inequalities and hold;

(iii) a two-point distribution such that

Note that the point in (2.2)–(2.7) satisfies and that the probability distribution has mean and variance .

Introduce the set
Using the well-known bound valid for , it is easy to see that
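The well-known bound referred to here is $\sigma^2 \le \mu(1-\mu) \le 1/4$ for random variables with values in $[0,1]$, with equality for Bernoulli laws. A quick brute-force check (our own illustration, over random discrete distributions supported in $[0,1]$):

```python
import random

def mean_and_variance(values, probs):
    m = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - m) ** 2 for v, p in zip(values, probs))
    return m, var

rng = random.Random(1)
for _ in range(1000):
    k = rng.randint(2, 5)
    values = [rng.random() for _ in range(k)]        # support inside [0, 1]
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    probs = [w / total for w in weights]
    m, var = mean_and_variance(values, probs)
    assert var <= m * (1.0 - m) + 1e-12              # sigma^2 <= mu(1 - mu) <= 1/4

# Equality for a Bernoulli law: Var = mu(1 - mu)
m, var = mean_and_variance([0.0, 1.0], [0.3, 0.7])
assert abs(var - 0.21) < 1e-12
```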
Let . We represent the set as a union of three subsets setting
(2.10)

and , where and are given in (2.5). Let us mention the following properties of the regions.

(a) If , then since for such obviously for all . The set is a one-point set. The set is empty.

(b) If , then since for such clearly for all . The set is a one-point set. The set is empty.

In all other cases the three regions , , are nonempty sets. The sets and have only one common point , that is, .

Lemma 2.1.

Let . Assume that a random variable satisfies
(2.11)
Then
(2.12)

with a random variable satisfying (2.11) and defined as follows:

(i) if , then is a Bernoulli random variable with distribution (2.2);

(ii) if , then is a trinomial random variable with distribution (2.4);

(iii) if , then is a Bernoulli random variable with distribution (2.7).

Proof.

Writing , we have to prove that if
(2.13)
then
(2.14)

with . Henceforth we write , so that can assume only the values , , with probabilities , , defined in (2.2)–(2.7), respectively. The distribution is related to the distribution as for all .

Formally in our proof we do not need the description (2.17) of measures satisfying (2.15). However, the description helps to understand the idea of the proof. Let and . Assume that a signed measure of subsets of is such that the total variation measure is a discrete measure concentrated in a three-point set and
(2.15)
Then is a uniquely defined measure such that
(2.16)
satisfy
(2.17)

We omit the elementary calculations leading to (2.17). The calculations are related to solving systems of linear equations.

Let . Consider the polynomial
(2.18)
It is easy to check that
(2.19)
The proofs of (i)–(iii) differ only in technical details. In all cases we find , , and (depending on , and ) such that the polynomial defined by (2.18) satisfies for , and such that the coefficient in (2.18) vanishes, . Using , the inequality is equivalent to , which obviously leads to . We note that the random variable assumes the values from the set
(2.20)
Therefore we have
(2.21)

which proves the lemma.

(i) Now . We choose and . In order to ensure (cf. (2.19)) we have to take
(2.22)
If , then for all . The inequality is equivalent to
(2.23)

To complete the proof we note that the random variable with defined by (2.2) assumes its values in the set . To find the distribution of we use (2.17). Setting in (2.17) we obtain and , as in (2.2).

(ii) Now or, equivalently, and . Moreover, we can assume that since the region is nonempty only for such values. We choose and . Then for all . In order to ensure (cf. (2.19)) we have to take
(2.24)

By our construction . To find a distribution of supported by the set we use (2.17). It follows that has the distribution defined in (2.4).

(iii) We choose and . In order to ensure (cf. (2.19)) we have to take
(2.25)
If , then for all . The inequality is equivalent to
(2.26)

To conclude the proof we notice that the random variable with given by (2.7) assumes values from the set .

To prove Theorems 1.1 and 1.3 we apply Lemma 2.1 with . We provide the bounds of interest as Corollary 2.2. To prove the corollary it suffices to plug in Lemma 2.1 and, using (2.2)–(2.7), to calculate explicitly. We omit the related elementary but cumbersome calculations. The regions , , and are defined in (1.32).

Corollary 2.2.

Let a random variable have mean and variance . Then
(2.27)

Proposition 2.3.

Let . Then, with probability , the sample variance satisfies with given by (1.6).

Proof.

Using the representation (1.3) of the sample variance as an -statistic, it suffices to show that the function ,
(2.28)
in the domain
(2.29)

satisfies . The function is convex. To see this, it suffices to check that restricted to straight lines is convex. Any straight line can be represented as with some . The convexity of on is equivalent to the convexity of the function of the real variable . It is clear that the second derivative is nonnegative since . Thus both and are convex.

Since both and are convex, the function attains its maximal value on the boundary of . Moreover, the maximal value of is attained on the set of extremal points of . In our case the set of extremal points is just the set of vertices of the cube . In other words, the maximal value of is attained when each of is either or . Since is a symmetric function, we can assume that the maximal value of is attained when and with some . Using (2.28), the corresponding value of is . Maximizing with respect to we get , if is even, and , if is odd, which we can rewrite as the desired inequality .
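The vertex argument can be verified by brute force for small $n$ (our own check; it assumes, as in the proof, zero-one observations). With $k$ ones among $n$ zero-one observations, $s^2 = k(n-k)/(n(n-1))$, which is maximized at $k = \lfloor n/2 \rfloor$.

```python
from itertools import product

def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def max_s2_on_cube(n):
    # Brute-force maximum of s^2 over the vertices {0,1}^n of the cube
    return max(sample_variance(v) for v in product((0.0, 1.0), repeat=n))

def predicted_max(n):
    # k(n-k)/(n(n-1)) at k = floor(n/2): n/(4(n-1)) for even n, (n+1)/(4n) for odd n
    k = n // 2
    return k * (n - k) / (n * (n - 1))

for n in range(2, 9):
    assert abs(max_s2_on_cube(n) - predicted_max(n)) < 1e-12
```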

## 3. Proofs

We use the following observation, which in the case of an exponential function goes back to Hoeffding [1, Section ]. Assume that we can represent a random variable, say , as a weighted mixture of other random variables, say , so that
where are nonrandom numbers. Let be a convex function. Then, using Jensen's inequality , we obtain
Moreover, if random variables are identically distributed, then
One can specialize (3.3) for -statistics of the second order. Let be a symmetric function of its arguments. For an i.i.d. sample consider the -statistic
Write
Then (3.3) yields
for any convex function . To see that (3.6) holds, let be a permutation of . Define as in (3.5), replacing the sample by its permutation . Then (see [1, Section ])

which means that admits a representation of type (3.1) with and all identically distributed, due to our symmetry and i.i.d. assumptions. Thus, (3.3) implies (3.6).
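This mixture argument can be checked exhaustively in a tiny case (our own illustration, with assumed specifics: kernel $g(x,y) = (x-y)^2/2$, so that the U-statistic is the sample variance, $n = 4$, fair zero-one observations, and the blocked statistic averaging the kernel over $\lfloor n/2 \rfloor$ disjoint pairs):

```python
import math
from itertools import product

def u_stat(xs):
    # U-statistic with kernel g(x, y) = (x - y)^2 / 2; equals the sample variance
    n = len(xs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum((xs[i] - xs[j]) ** 2 / 2 for i, j in pairs) / len(pairs)

def block_stat(xs):
    # Average of the kernel over floor(n/2) disjoint pairs: a sum of i.i.d. terms
    blocks = [(xs[2 * i], xs[2 * i + 1]) for i in range(len(xs) // 2)]
    return sum((a - b) ** 2 / 2 for a, b in blocks) / len(blocks)

# Exhaustive expectations over a fair-coin sample of size 4
outcomes = list(product((0.0, 1.0), repeat=4))
f = math.exp                                      # a convex test function
e_f_u = sum(f(u_stat(x)) for x in outcomes) / len(outcomes)
e_f_w = sum(f(block_stat(x)) for x in outcomes) / len(outcomes)
assert e_f_u <= e_f_w                             # the Jensen-type domination (3.6)
```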

Using (1.3) we can write
with . By an application of (3.6) we derive
for any convex function , where is a sum of i.i.d. random variables such that
(3.10)
Consider the following three families of functions depending on parameters :
(3.11)
(3.12)
(3.13)
Any of the functions given by (3.11) dominates the indicator function of the interval . Therefore . Combining this inequality with (3.9), we get
(3.14)

with being a sum of i.i.d. random variables specified in (3.10). Depending on the choice of the family of functions given by (3.11), the in (3.14) is taken over or , respectively.

Proposition 3.1.

One has
(3.15)

Proof.

Let us prove (3.15). Using the i.i.d. assumption, we have
(3.16)
Let us prove that . If , then . Using (3.15) we have
(3.17)

which yields the desired bound for .

Proposition 3.2.

Let be a bounded random variable such that with some nonrandom . Then for any convex function one has
(3.18)

where is a Bernoulli random variable such that and .

If for some , and , , then (3.18) holds with
(3.19)
and a Bernoulli random variable such that , ,
(3.20)

Proof.

See [2, Lemmas and ].
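The Bernoulli domination in Proposition 3.2 can be illustrated numerically (our own sketch, with a hypothetical discrete example). For a mean-zero $X$ with $X \le b$ and variance $\sigma^2$, the dominating Bernoulli variable takes the values $-\sigma^2/b$ and $b$ with the probabilities that match the mean and the variance; we test the convex functions $x \mapsto e^{hx}$.

```python
import math

def expect(values, probs, f):
    return sum(p * f(v) for v, p in zip(values, probs))

# A hypothetical mean-zero example: X uniform on {-0.3, 0, 0.3}, so b = 0.3
xs, px = [-0.3, 0.0, 0.3], [1 / 3, 1 / 3, 1 / 3]
b = 0.3
sigma2 = expect(xs, px, lambda v: v * v)          # variance of X (its mean is zero)

# Dominating Bernoulli: values {-sigma2/b, b}, matching mean 0 and variance sigma2
eps_vals = [-sigma2 / b, b]
p_b = sigma2 / (sigma2 + b * b)                   # P(eps = b)
eps_probs = [1.0 - p_b, p_b]
assert abs(expect(eps_vals, eps_probs, lambda v: v)) < 1e-12
assert abs(expect(eps_vals, eps_probs, lambda v: v * v) - sigma2) < 1e-12

for h in (0.5, 1.0, 2.0, 5.0):
    f = lambda x, h=h: math.exp(h * x)
    assert expect(xs, px, f) <= expect(eps_vals, eps_probs, f) + 1e-12
```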

Proof of Theorem 1.1.

The proof is based on a combination of Hoeffding's observation (3.6) using the representation (3.8) of as a -statistic, of Chebyshev's inequality involving exponential functions, and of Proposition 3.2. Let us provide more details. We have to prove (1.22) and (1.24).

Let us prove (1.22). We apply (3.14) with the family (3.13) of exponential functions . We get
(3.21)
By (3.10), the sum is a sum of copies of a random variable, say , such that
(3.22)
We note that
(3.23)
Indeed, the first two relations in (3.23) are obvious; the third one is implied by ,
(3.24)

and ; see Proposition 3.1.

Let stand for the class of random variables satisfying (3.23). Taking into account (3.21), to prove (1.22) it suffices to check that
(3.25)
where is a sum of independent copies of . It is clear that the left-hand side of (3.25) is an increasing function of . To prove (3.25), we apply Proposition 3.2. Conditioning times on all random variables except one, we can replace all random variables by Bernoulli ones. To find the distribution of the Bernoulli random variables we use (3.23). We get
(3.26)

where is a sum of independent copies of a Bernoulli random variable, say , such that and with as in (1.23), that is, . Note that in (3.26) we have the equality since .

Using (3.26) we have
(3.27)

To see that the third equality in (3.27) holds, it suffices to change the variable by . The fourth equality holds by definition (1.13) of the Hoeffding function since is a Bernoulli random variable with mean zero and such that . The relation (3.27) proves (3.25) and (1.22).

A proof of (1.24) repeats the proof of (1.22), replacing everywhere and by and , respectively. The inequality in (3.23) has to be replaced by , which holds due to our assumption . Accordingly, the probability is now given by (1.25).

Proof of (1.19).

The bound is an obvious corollary of Theorem 1.1 since by Proposition 3.1 we have , and therefore we can choose . Substituting this value into (1.22), we obtain (1.19).

Proof of (1.26)–(1.29).

To prove (1.26), we set in (1.24). This choice of is justified in the proof of (1.19).

To prove (1.27) we use (1.26). We have to prove that
(3.28)
and that the right-hand side of (3.28) is an increasing function of . By the definition of the Hoeffding function we have
(3.29)
where is a Bernoulli random variable such that and . It is easy to check that also assumes the value with probability . Hence . Therefore , and we can write
(3.30)
where is the class of random variables such that and . Combining (3.29) and (3.30) we obtain
(3.31)
The definition of the latter in (3.31) shows that the right-hand side of (3.31) is an increasing function of . To conclude the proof of (1.27) we have to check that the right-hand sides of (3.28) and (3.31) are equal. Using (3.18) of Proposition 3.2, we get , where is a mean zero Bernoulli random variable assuming the values and with positive probabilities such that . Since , we have
(3.32)

Using the definition of the Hoeffding function we see that the right-hand sides of (3.28) and (3.31) are equal.

Proof of Theorem 1.3.

We use Theorem 1.1. In the bounds of this theorem we substitute for the right-hand side of (2.27), where a bound of the type is given. We omit the related elementary analytical manipulations.

Proof of (1.7) and (1.8).

To describe the limiting behavior of we use Hoeffding's decomposition. We can write
(3.33)
with kernels and such that
(3.34)
To derive (3.33), use the representation (3.8) of as a -statistic. The kernel functions and are degenerate, that is, and for all . Therefore
(3.35)
with
(3.36)
It follows that in cases where the statistic is asymptotically normal:
(3.37)
where is a standard normal random variable. It is easy to see that if and only if is a Bernoulli random variable symmetric around its mean. In this special case we have , and (3.33) turns into
(3.38)
where are i.i.d. Rademacher random variables. It follows that
(3.39)

which completes the proof of (1.7) and (1.8).

## Acknowledgment

Figure 1 was produced by N. Kalosha. The authors thank him for the help. The research was supported by the Lithuanian State Science and Studies Foundation, Grant no. T-15/07.

## References

1. Hoeffding W: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 1963, 58: 13–30. 10.2307/2282952
2. Bentkus V, van Zuijlen M: On conservative confidence intervals. Lithuanian Mathematical Journal 2003, 43(2): 141–160. 10.1023/A:1024210921597
3. Talagrand M: The missing factor in Hoeffding's inequalities. Annales de l'Institut Henri Poincaré B 1995, 31(4): 689–702.
4. Pinelis I: Optimal tail comparison based on comparison of moments. In High Dimensional Probability (Oberwolfach, 1996), Progress in Probability. Volume 43. Birkhäuser, Basel, Switzerland; 1998: 297–314.
5. Pinelis I: Fractional sums and integrals of -concave tails and applications to comparison probability inequalities. In Advances in Stochastic Inequalities (Atlanta, Ga, 1997), Contemporary Mathematics. Volume 234. American Mathematical Society, Providence, RI, USA; 1999: 149–168.
6. Bentkus V: A remark on the inequalities of Bernstein, Prokhorov, Bennett, Hoeffding, and Talagrand. Lithuanian Mathematical Journal 2002, 42(3): 262–269. 10.1023/A:1020221925664
7. Bentkus V: On Hoeffding's inequalities. The Annals of Probability 2004, 32(2): 1650–1673. 10.1214/009117904000000360
8. Bentkus V, Geuze GDC, van Zuijlen M: Trinomial laws dominating conditionally symmetric martingales. Department of Mathematics, Radboud University Nijmegen; 2005.
9. Bentkus V, Kalosha N, van Zuijlen M: On domination of tail probabilities of (super)martingales: explicit bounds. Lithuanian Mathematical Journal 2006, 46(1): 3–54.