1 Introduction

Testing whether a sample of data has been generated from a hypothesized distribution is one of the fundamental problems in statistics and econometrics. Traditionally such tests have been constructed from the empirical distribution function (edf). Even under the simplest of sampling schemes such tests are known not to be asymptotically pivotal; see, e.g., Stephens (1976), Conover (1999) and Babu and Rao (2004). Moreover, under more sophisticated sampling schemes such tests can become prohibitively complex; see Bai (2003) and Corradi and Swanson (2006).

Instead, this paper provides tests based on a generalization of the consistent series density estimator of Crain (1974) and Barron and Sheu (1991). Consistency is maintained when nuisance parameters are estimated as a preliminary step. This, when applied to the infinite-dimensional likelihood ratio test of Portnoy (1988), generalizes the tests of Claeskens and Hjort (2004) and Marsh (2007) to test for specification.

The proposed procedure offers three advantages over those tests based on the edf. First, they are asymptotically pivotal; numerical experiments are designed and reported in support of this. This also implies automatic validity, including second-order validity as in Beran (1988), of bootstrap critical values. Valid bootstrap critical values for the non-pivotal edf-based tests, e.g. as in Kojadinovic and Yan (2012), do not enjoy this benefit. Second, they are generally more powerful than the most commonly used edf-based tests; again, numerical evidence is presented in support. Lastly, because they are built on a consistent density estimator, in the event of rejection the density estimator itself can be used, for instance, to consistently estimate the quantiles of the underlying variable.

The plan for the paper is as follows. The next section presents the density estimator and demonstrates that it converges in relative entropy to the population density. A corollary provides consistent quantile estimation, with accuracy demonstrated in numerical experiments. Section 3 provides the nonparametric test and establishes that it is asymptotically pivotal and consistent against fixed alternatives. A corollary establishes validity of bootstrap critical values. Numerical experiments are presented in support of these results, as well as demonstrating some superiority over edf-based tests. Section 4 concludes, while two appendices present the proofs of the two theorems and tables containing the outcomes of the experiments, respectively.

2 Consistent nonparametric estimation of possibly misspecified densities

2.1 Theoretical results

Suppose that our sample \(\underline{y}=\left\{ Y_{i}\right\} _{i=1}^{n}\) consists of independent copies of a random variable Y having distribution, \(G\left( y\right) =\Pr [Y\le y]\) and density \(g\left( y\right) =dG\left( y\right) /dy.\) For this sample we fit the parametric likelihood, \( L=\prod _{i=1}^{n}f\left( Y_{i};\beta \right) \) for some chosen density function \(f\left( y;\beta \right) ,\) where \(\beta \) is an unknown \(k\times 1\) parameter. Denote the (quasi) maximum likelihood estimator for \(\beta \) by \( {\hat{\beta }}_{n}.\)

In this context the hypothesis to be tested is:

$$\begin{aligned} H_{0}:G(y)=F\left( y;\beta _{0}\right) , \end{aligned}$$
(1)

where \(F\left( y;\beta \right) =\int _{-\infty }^{y}f\left( z;\beta \right) dz \) and \(\beta _{0}\) is some (unknown) parameter value. Tests for \(H_{0}\) will be detailed in the next section. First, however, we assume the following, whether or not \(H_{0}\) holds:

Assumption 1

  1. (i)

    The density \(f\left( y;\beta \right) \) is measurable in y for every \(\beta \in {\varvec{B}}\), a compact subset of \(k\)-dimensional Euclidean space, and is continuous in \( \beta \) for every y.

  2. (ii)

    \(G\left( y\right) \) is an absolutely continuous distribution function, \(E\left[ \log g\left( y\right) \right] \) exists and \( \left| \log f\left( y;\beta \right) \right| <v\left( y\right) \) for all \(\beta \), where \(v\left( \cdot \right) \) is integrable with respect to \(G\left( \cdot \right) .\)

  3. (iii)

    Let

    $$\begin{aligned} I\left( \beta \right) =E\left[ \ln \left[ \frac{g\left( y\right) }{f\left( y,\beta \right) }\right] \right] =\int _{y}\ln \left[ \frac{g\left( y\right) }{f\left( y,\beta \right) }\right] g\left( y\right) dy, \end{aligned}$$

    then \(I\left( \beta \right) \) has a unique minimum at some \(\beta _{*}\in {\varvec{B}}.\)

  4. (iv)

    \(F\left( Y;\beta \right) \) is continuously differentiable with respect to \(\beta \), such that \(H\left( \beta \right) =\partial F\left( Y_{i},\beta \right) /\partial \beta \) is finite, for all \(\beta \) in a closed ball of radius \(\epsilon >0\) around \(\beta _{*}.\)

  5. (v)

    Both \(\log \left[ g\left( y\right) \right] \) and \(\log \left[ f\left( y;\beta \right) \right] \) have \(r\ge 2\) derivatives in y which are absolutely continuous and square integrable.

Immediate from White (1982, Theorems 2.1, 2.2 and 3.2) is that under Assumption 1(i–iii) \({\hat{\beta }}_{n}\) exists and

$$\begin{aligned} {\hat{\beta }}_{n}=\beta _{*}+O_{p}(n^{-1/2}). \end{aligned}$$

That is, \({\hat{\beta }}_{n}\) is a \(\sqrt{n}\)-consistent quasi-maximum likelihood estimator for the pseudo-true value \(\beta _{*}.\) Note that under \(H_{0}\) we have \(\beta _{*}=\beta _{0}.\) To proceed, denote \({\hat{X}} _{i}=F\left( Y_{i},{\hat{\beta }}_{n}\right) ,\) which has the mean value expansion,

$$\begin{aligned} {\hat{X}}_{i}=F\left( Y_{i},\beta _{*}\right) +\left( {\hat{\beta }} _{n}-\beta _{*}\right) ^{\prime }H\left( \beta ^{+}\right) , \end{aligned}$$

where \(\beta ^{+}\) lies on a line segment joining \({\hat{\beta }}_{n}\) and \( \beta _{*}\). As a consequence we can write

$$\begin{aligned} {\hat{X}}_{i}={\bar{X}}_{i}+e_{i}, \end{aligned}$$
(2)

where \({\bar{X}}_{i}=F\left( Y_{i},\beta _{*}\right) \) and by construction and as a consequence of Assumption 1 (iv),

$$ \begin{aligned} e_{i}\in (-1,1)\quad \& \quad e_{i}=O_{p}\left( n^{-1/2}\right) , \end{aligned}$$
(3)

that is, \(e_{i}\) is both bounded and asymptotically degenerate.
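To fix ideas, both the pseudo-true value and the degeneracy of \(e_{i}\) can be illustrated numerically. The sketch below is our own (the choice of \(t_{(4)}\) data and a Gaussian fit mirrors the experiments of Section 2.2, but is not prescribed here): the Gaussian QMLE converges to \(\beta _{*}=(0,2)\), the mean and variance of \(t_{(4)}\), which minimize \(I(\beta )\) in Assumption 1(iii).

```python
import numpy as np
from math import erf

def gauss_cdf(z):
    # Gaussian CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return np.array([0.5 * (1.0 + erf(v / np.sqrt(2.0))) for v in z])

# t(4) data fitted by a Gaussian N(mu, sigma^2) likelihood: the QMLE
# (sample mean, sample variance) converges to the pseudo-true value
# beta_* = (0, 2), i.e. the mean and variance of t(4).
rng = np.random.default_rng(42)
y = rng.standard_t(df=4, size=20000)
mu_hat, var_hat = y.mean(), y.var()          # closed-form Gaussian (Q)MLE

x_bar = gauss_cdf(y / np.sqrt(2.0))                 # F(Y_i, beta_*)
x_hat = gauss_cdf((y - mu_hat) / np.sqrt(var_hat))  # F(Y_i, beta-hat_n)
e = x_hat - x_bar                  # bounded in (-1, 1) and O_p(n^{-1/2}), as in (3)
```

Since the Gaussian CDF is bounded and smooth, the discrepancies \(e_{i}\) inherit the \(O_{p}(n^{-1/2})\) rate of \({\hat{\beta }}_{n}-\beta _{*}\).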

Since the \({\bar{X}}_{i}\) are IID, denote their common distribution and density functions by \(U\left( x\right) =\Pr \left[ {\bar{X}}<x\right] \) and \(u\left( x\right) =dU\left( x\right) /dx,\) respectively. Here we will apply the series density estimator of Crain (1974) and Barron and Sheu (1991) to consistently estimate \(u\left( x\right) \) and thus the quantiles of \(U\left( x\right) \), from which the quantiles of \(G\left( y\right) \) can be consistently recovered. Application of the density estimator requires a choice of approximating basis; here we choose the simplest polynomial basis, similar to Marsh (2007).

We will approximate \(u\left( x\right) \) via the exponential family,

$$\begin{aligned} p_{x}(\theta )=\exp \left\{ \sum _{k=1}^{m}\theta _{k}x^{k}-\psi _{m}\right\} ,\quad \psi _{m}\left( \theta \right) =\ln \int _{0}^{1}\exp \left\{ \sum _{k=1}^{m}\theta _{k}x^{k}\right\} dx, \end{aligned}$$
(4)

where \(\psi _{m}\left( \theta \right) \) is the cumulant function, defined so that \(\int _{0}^{1}p_{x}(\theta )dx=1.\)
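The normalization in (4) can be checked numerically; the following is a minimal sketch (the quadrature grid and helper names are our own choices, not the paper's):

```python
import numpy as np

def _trap(y, x):
    # trapezoidal rule on a fixed grid
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def psi_m(theta, x):
    # cumulant function in (4): psi_m(theta) = log of the integral over (0,1)
    # of exp{sum_k theta_k x^k}
    s = sum(t * x ** (k + 1) for k, t in enumerate(theta))
    return np.log(_trap(np.exp(s), x))

def p_x(x, theta):
    # exponential-family density (4) on (0,1); integrates to one by construction
    s = sum(t * x ** (k + 1) for k, t in enumerate(theta))
    return np.exp(s - psi_m(theta, x))

x = np.linspace(0.0, 1.0, 2001)
theta = np.array([0.5, -0.25, 0.1])
mass = _trap(p_x(x, theta), x)    # equals 1 for any theta, by construction
flat = p_x(x, np.zeros(3))        # theta = 0 recovers the U(0,1) density
```

The case \(\theta =0_{(m)}\) reducing to the uniform density is what underlies the null hypothesis reformulation (10) in Section 3.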

From Assumption 1, \(\log \left[ u\left( x\right) \right] \) has at least \(r-1\) absolutely continuous derivatives and its \(r\)th derivative is square integrable. According to Barron and Sheu (1991) there exists a unique \(\theta _{\left( m\right) }=\left( \theta _{1},\ldots ,\theta _{m}\right) ^{\prime }\) satisfying

$$\begin{aligned} \int _{0}^{1}x^{k}p_{x}\left( \theta _{\left( m\right) }\right) dx=\mu _{k}=\int _{0}^{1}x^{k}u\left( x\right) dx\quad \text {for }k=1,2,\ldots ,m, \end{aligned}$$
(5)

and, as \(m\rightarrow \infty ,\) \(p_{x}\left( \theta _{\left( m\right) }\right) \) converges, in relative entropy, to \(u\left( x\right) \) at rate \( m^{-2r},\) meaning that

$$\begin{aligned} E_{U}\left[ \ln \left( \frac{u\left( x\right) }{p_{x}\left( \theta _{\left( m\right) }\right) }\right) \right] =\int _{0}^{1}\ln \left( \frac{u\left( x\right) }{p_{x}\left( \theta _{\left( m\right) }\right) }\right) u\left( x\right) dx=O\left( m^{-2r}\right) , \end{aligned}$$

as \(m\rightarrow \infty .\) Moreover, if a sample \(\left\{ {\bar{X}} _{i}\right\} _{1}^{n}\) were available, then provided \(m^{3}/n\rightarrow 0\) and letting \({\bar{\theta }}_{\left( m\right) }\) be the unique solution to

$$\begin{aligned} \int _{0}^{1}x^{k}p_{x}\left( {\bar{\theta }}_{\left( m\right) }\right) dx=\frac{ \sum _{i=1}^{n}{\bar{X}}_{i}^{k}}{n}\quad \text {for }k=1,2,\ldots ,m, \end{aligned}$$
(6)

then \(p_{x}\left( {\bar{\theta }}_{\left( m\right) }\right) \) converges in relative entropy to \(u\left( x\right) ,\)

$$\begin{aligned} E_{U}\left[ \ln \left( \frac{u\left( x\right) }{p_{x}\left( {\bar{\theta }} _{\left( m\right) }\right) }\right) \right] =\int _{0}^{1}\ln \left( \frac{ u\left( x\right) }{p_{x}\left( {\bar{\theta }}_{\left( m\right) }\right) } \right) u\left( x\right) dx=O_{p}\left( \frac{m}{n}+m^{-2r}\right) , \end{aligned}$$

see Theorem 1 of Barron and Sheu (1991).

Here, however, the sample \(\left\{ {\bar{X}}_{i}\right\} _{1}^{n}\) is not available; instead we only observe \(\left\{ {\hat{X}}_{i}\right\} _{1}^{n}\) and consequently have \({\hat{\theta }}_{\left( m\right) }\) as the unique solution to

$$\begin{aligned} \int _{0}^{1}x^{k}p_{x}\left( {\hat{\theta }}_{\left( m\right) }\right) dx=\frac{ \sum _{i=1}^{n}{\hat{X}}_{i}^{k}}{n}\quad \text {for }k=1,2,\ldots ,m. \end{aligned}$$
(7)
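Numerically, solving (7) amounts to minimizing the strictly convex dual objective \(\psi _{m}\left( \theta \right) -\theta ^{\prime }{\hat{X}}_{\left( m\right) }\), whose gradient is the difference between the model and sample moments and whose Hessian is the covariance of \((x,\ldots ,x^{m})\) under \(p_{x}\left( \theta \right) \). A minimal sketch (function names, the quadrature grid and the Beta-distributed stand-in sample are our own assumptions, not the paper's):

```python
import numpy as np

def grid_weights(n=2001):
    # uniform grid on [0,1] with exact trapezoid quadrature weights
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    w = np.full(n, h)
    w[0] = w[-1] = h / 2.0
    return x, w

def model_moments(theta, x, w):
    # first m moments of p_x(theta), by quadrature
    V = np.vstack([x ** k for k in range(1, theta.size + 1)])
    p = np.exp(V.T @ theta)
    p /= np.sum(w * p)                       # normalised density on the grid
    return V @ (w * p), V, p

def fit_theta(xhat, m):
    # Solve the moment equations (7) by damped Newton on the convex dual:
    # gradient = mu(theta) - mhat, Hessian = covariance of (x, ..., x^m).
    x, w = grid_weights()
    mhat = np.array([np.mean(xhat ** k) for k in range(1, m + 1)])
    obj = lambda t: np.log(np.sum(w * np.exp(
        sum(tk * x ** (k + 1) for k, tk in enumerate(t))))) - t @ mhat
    theta = np.zeros(m)
    for _ in range(100):
        mu, V, p = model_moments(theta, x, w)
        if np.max(np.abs(mhat - mu)) < 1e-9:
            break
        S = (V * (w * p)) @ V.T - np.outer(mu, mu)
        step = np.linalg.solve(S, mhat - mu)
        lam = 1.0
        while obj(theta + lam * step) > obj(theta) and lam > 1e-6:
            lam /= 2.0                       # backtracking to guarantee descent
        theta = theta + lam * step
    return theta

rng = np.random.default_rng(0)
xhat = rng.beta(2.0, 2.0, size=500)   # stand-in for {F(Y_i, beta-hat_n)}
theta_hat = fit_theta(xhat, m=3)
```

The damping step is a safeguard; since the dual is strictly convex on \(\varTheta _{\left( m\right) }\), the undamped Newton iteration typically converges quadratically from \(\theta =0_{(m)}\).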

Note that Eqs. (5), (6) and (7) define one-to-one mappings between the sample space \(\varOmega _{\left( m\right) }\subset {\mathbb {R}}^{m}\) and the parameter space \(\varTheta _{\left( m\right) }\subset {\mathbb {R}}^{m}\) of the exponential family, see Barndorff-Nielsen (1978). We can therefore define three pairs of m-dimensional parameters and statistics, respectively, as \(\left\{ \theta _{\left( m\right) }:\mu _{\left( m\right) }\right\} ,\) \(\left\{ {\bar{\theta }} _{\left( m\right) }:{\bar{X}}_{\left( m\right) }\right\} \) and \(\left\{ {\hat{\theta }}_{\left( m\right) }:{\hat{X}}_{\left( m\right) }\right\} ,\) where \(\mu _{\left( m\right) }=\left\{ \mu _{k}\right\} _{k=1}^{m}\), \({\bar{X}}_{\left( m\right) }=\left\{ n^{-1}\sum _{i=1}^{n}{\bar{X}}_{i}^{k}\right\} _{k=1}^{m}\) and \({\hat{X}}_{\left( m\right) }=\left\{ n^{-1}\sum _{i=1}^{n}{\hat{X}} _{i}^{k}\right\} _{k=1}^{m}.\) Generically these mappings can be expressed via

$$\begin{aligned} \left\{ \theta _{(m)}^{*}:\left\{ \mu _{k}^{*}\right\} _{1}^{m}\right\} \quad \text {where}\quad \int _{0}^{1}x^{k}p_{x}\left( \theta _{(m)}^{*}\right) dx=\mu _{k}^{*},\;k=1,\ldots ,m. \end{aligned}$$
(8)

The uniqueness of these mappings is exploited in the following theorem, proved in Appendix A, to show that the density estimator \(p_{x}\left( {\hat{\theta }}_{\left( m\right) }\right) \) converges in relative entropy at the same rate as \(p_{x}\left( {\bar{\theta }}_{\left( m\right) }\right) .\)

Theorem 1

Let \({\hat{\theta }}_{(m)}\) denote the estimated exponential parameter determined by (7). Then under Assumption 1 and for \(m,n\rightarrow \infty \) with \(m^{3}/n\rightarrow 0,\)

$$\begin{aligned} E_{U}\left[ \ln \left( \frac{u\left( x\right) }{p_{x}\left( {\hat{\theta }} _{\left( m\right) }\right) }\right) \right] =\int _{0}^{1}\ln \left( \frac{ u\left( x\right) }{p_{x}\left( {\hat{\theta }}_{\left( m\right) }\right) } \right) u\left( x\right) dx=O_{p}\left( \frac{m}{n}+m^{-2r}\right) . \end{aligned}$$

According to Theorem 1, in terms of the density estimator at least, the effect of observing \(\left\{ {\hat{X}}_{1},\ldots ,{\hat{X}}_{n}\right\} \) rather than \(\left\{ {\bar{X}}_{1},\ldots ,{\bar{X}}_{n}\right\} \) is asymptotically negligible under Assumption 1 and for either choice of basis. Moreover, if the goal were only nonparametric estimation of the density, then the optimal choice of the dimension m is the same as when no parameters are estimated, i.e. \( m_{opt}\propto n^{\frac{1}{1+2r}}\) (with a mini-max rate of \(m_{n}^{*}=O\left( n^{1/5}\right) ,\) since \(r\ge 2\) by assumption). The rate of convergence of the estimator remains of order \(O_{p}\left( n^{-\frac{2r}{1+2r}}\right) .\) It should not be surprising that this rate is unaffected when parameters are replaced by \(\sqrt{n}\)-consistent estimators. Theorem 1 thus generalizes the results of Crain (1974) and Barron and Sheu (1991), as summarized in Lemma 1 of Marsh (2007), by permitting estimation of nuisance parameters as a preliminary step.

Additionally, we may recover the quantiles of Y from those implied by the approximating series density estimator. This is captured in the following corollary, which follows immediately since convergence in relative entropy implies convergence in law.

Corollary 1

Let \({\hat{T}}_{n,m}\in \left( 0,1\right) \) be a random variable having density function \(p_{t}\left( {\hat{\theta }} _{\left( m\right) }\right) \) where \({\hat{\theta }}_{\left( m\right) } \) is defined by (7), then

$$\begin{aligned} {\hat{T}}_{n,m}\underset{\mathcal {L}}{\rightarrow }{\bar{X}}, \end{aligned}$$

as \(n,m\rightarrow \infty \) with \(m^{3}/n\rightarrow 0;\) that is, \({\hat{T}}_{n,m}\) converges in law to the random variable \({\bar{X}}. \)

2.2 Numerical application of a quantile estimator

The consequence of Corollary 1 is that the quantiles associated with \( {\hat{T}}_{n,m} \) converge to those of Y, i.e. letting \(q_{A}\left( \pi \right) ,\) for \(0<\pi <1,\) denote the quantile function of the random variable A, we have

$$\begin{aligned} q_{F^{-1}\left( {\hat{T}}_{n,m};{\hat{\beta }}_{n}\right) }\left( \pi \right) =q_{Y}\left( \pi \right) +o_{p}\left( 1\right) . \end{aligned}$$
(9)
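Numerically, the quantiles of \({\hat{T}}_{n,m}\) can be obtained by integrating the fitted density on \((0,1)\) and inverting the resulting CDF; a minimal sketch (helper names and grid size are our own choices):

```python
import numpy as np

def quantiles_from_theta(theta, probs, grid=4001):
    # Quantile function of the fitted density p_t(theta): build its CDF by
    # cumulative trapezoidal integration on (0,1), then invert by linear
    # interpolation.  Normalising by cdf[-1] absorbs the cumulant psi_m.
    t = np.linspace(0.0, 1.0, grid)
    s = sum(th * t ** (k + 1) for k, th in enumerate(theta))
    p = np.exp(s)
    cdf = np.concatenate([[0.0],
                          np.cumsum((p[1:] + p[:-1]) * np.diff(t) / 2.0)])
    cdf /= cdf[-1]
    return np.interp(probs, cdf, t)

# Sanity check: theta = 0 is the U(0,1) density, so quantiles equal probs.
probs = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
q_unif = quantiles_from_theta(np.zeros(3), probs)
```

Mapping these quantiles back through \(F^{-1}\left( \cdot ;{\hat{\beta }}_{n}\right) \) yields the estimator on the left-hand side of (9).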

The following set of experiments compares the Mean Square Errors (MSE) of estimators for the quantiles of Y based on those of \({\hat{T}}_{n,m}\) for \( m=3,9\) and for quantiles calculated at the probabilities \(\pi =0.05,\,0.25,\,0.50,\,0.75,\,0.95\). We also compare the accuracy of estimated quantiles when unknown parameters are estimated against cases where they are not.

First suppose that \(Y_{i}\sim IID\,Y:=t_{\left( 4\right) }\) but we estimate the Gaussian likelihood implied by \(N\left( \mu ,\sigma ^{2}\right) \). Define

$$ \begin{aligned} X_{i}^{*}=\frac{1}{2}\left[ 1+\hbox {erf}\left( \frac{Y_{i}}{\sqrt{2}}\right) \right] \quad \& \quad {\hat{X}}_{i}=\frac{1}{2}\left[ 1+\hbox {erf}\left( \frac{Y_{i}- {\bar{y}}}{\sqrt{2}{\hat{\sigma }}}\right) \right] ,\quad i=1,\ldots ,n, \end{aligned}$$

the first obtained from the (misspecified) Gaussian model imposing zero mean and unit variance, and the second from the Gaussian model with estimated mean and variance.

Following the development above, as well as that of Barron and Sheu (1991), let \(\theta _{\left( m\right) }^{*}\) and \({\hat{\theta }}_{\left( m\right) } \) denote the estimated parameters of the exponential series density estimators for the samples \(\left\{ X_{i}^{*}\right\} _{1}^{n}\) and \( \left\{ {\hat{X}}_{i}\right\} _{1}^{n}\), respectively. Let \(T_{n,m}^{*}\) have density \(p_{t}\left( \theta _{\left( m\right) }^{*}\right) \) (note that this is just a straightforward application of the original set-up of Barron and Sheu 1991) and let \({\hat{T}}_{n,m}\) have density \(p_{t}\left( {\hat{\theta }}_{\left( m\right) }\right) ,\) as in Corollary 1. The pairs of estimated quantiles for Y are then constructed as in

$$\begin{aligned} q_{Y}^{*}\left( \pi \right) =\sqrt{2}\hbox {erf}{}^{-1}\left( 2q_{T_{n,m}^{*}}\left( \pi \right) -1\right) \quad \text {and}\quad {\hat{q}}_{Y}\left( \pi \right) ={\bar{y}}+{\hat{\sigma }}\sqrt{2}\hbox {erf}{}^{-1}\left( 2q_{{\hat{T}}_{n,m}}\left( \pi \right) -1\right) . \end{aligned}$$

The MSE of these quantiles, for each probability \(\pi ,\) are presented in Appendix B, Tables 1a for \(m=3\) and 1b for \(m=9.\)

Next suppose that \(Y_{i}\sim IID\,Y:=\varGamma \left( 1.2,1\right) \) and define

$$ \begin{aligned} X_{i}^{*}=1-e^{-Y_{i}}\quad \& \quad {\hat{X}}_{i}=1-e^{-Y_{i}/{\bar{y}} },\quad i=1,\ldots ,n. \end{aligned}$$

Analogous to above let \(T_{n,m}^{*}\) and \({\hat{T}}_{n,m}\) have densities \( p_{t}\left( \theta _{\left( m\right) }^{*}\right) \) and \(p_{t}\left( {\hat{\theta }}_{\left( m\right) }\right) \) and so pairs of estimated quantiles for Y are constructed via,

$$\begin{aligned} q_{Y}^{*}\left( \pi \right) =-\ln \left( 1-q_{T_{n,m}^{*}}\left( \pi \right) \right) \quad \text {and}\quad {\hat{q}}_{Y}\left( \pi \right) =-{\bar{y}} _{n}\ln \left( 1-q_{{\hat{T}}_{n,m}}\left( \pi \right) \right) . \end{aligned}$$

The MSE of these quantiles, for each probability \(\pi ,\) are presented in Table 1c \((m=3)\) and 1d \((m=9).\)

The consistency of the quantiles obtained from, in particular, \({\hat{T}} _{n,m} \) is illustrated clearly in Table 1. More relevant, however, is that estimating the parameters of the fitted model as a preliminary step produces quantile estimators that can be superior, as the sample size becomes large, to those obtained by simply imposing parameter values, as can be seen by comparing the right and left panels of Table 1. Note also that although the larger value of m yields more accurate quantile estimates in these cases, this comes at some computational cost and, in other cases, potential numerical instability. This latter possibility is greatly mitigated here, however, since the \(\left\{ {\hat{X}}_{i}\right\} _{i=1}^{n}\) are bounded.

3 Consistent, asymptotically pivotal tests for goodness of fit

3.1 Main results

Here we provide a test of the null hypothesis that the fitted likelihood is correctly specified, as in (1). The previous section generalized the Barron and Sheu (1991) series density estimator; the resulting nonparametric likelihood ratio test then generalizes the test of Marsh (2007).

To proceed, note that when \(H_{0}\) is true, then in Assumption 1 \(\beta _{*}=\beta _{0}\) and in (2) \({\bar{X}}_{i}=F\left( Y_{i},\beta _{0}\right) \sim IID\,U\left[ 0,1\right] .\) Direct generalization of the principle in Marsh (2007) means that (1) can be tested via,

$$\begin{aligned} H_{0}:\lim _{m\rightarrow \infty }\theta _{\left( m\right) }=0_{(m)}, \end{aligned}$$
(10)

in the exponential family (4), where \(\theta _{\left( m\right) }\) is the solution to (5) and \(0_{(m)}\) is an \(m\times 1\) vector of zeros.

The likelihood ratio test of Portnoy (1988), applied via the density estimator of Crain (1974) and Barron and Sheu (1991) obtained from the sample \(\left\{ {\hat{X}}_{1},\ldots ,{\hat{X}}_{n}\right\} ,\) is

$$\begin{aligned} {\hat{\lambda }}_{m}=2\sum _{i=1}^{n}\log \left[ \frac{p_{{\hat{X}}_{i}}\left( {\hat{\theta }}_{(m)}\right) }{p_{{\hat{X}}_{i}}\left( 0_{(m)}\right) }\right] =2n \left[ {\hat{\theta }}_{(m)}^{\prime }{\hat{X}}_{(m)}-\psi _{m}\left( {\hat{\theta }} _{\left( m\right) }\right) \right] . \end{aligned}$$

The null hypothesis is rejected for large values of \({\hat{\lambda }}_{m}.\)
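For illustration, when \(m=1\) the moment equation (7) is scalar and monotone, so \({\hat{\lambda }}_{1}\) can be computed by simple bisection. The sketch below uses this simplification (our own, for exposition; the paper works with larger m) and exploits the closed forms \(\psi _{1}\left( \theta \right) =\ln \left[ (e^{\theta }-1)/\theta \right] \) and \(\psi _{1}^{\prime }\left( \theta \right) \):

```python
import numpy as np

def mean_of_theta(th):
    # E[x] under p_x(theta) for m = 1, i.e. d psi_1 / d theta; the theta -> 0
    # limit is 1/2 (the uniform case), handled by a series expansion
    if abs(th) < 1e-6:
        return 0.5 + th / 12.0
    return (th * np.exp(th) - np.exp(th) + 1.0) / (th * (np.exp(th) - 1.0))

def psi_1(th):
    # cumulant function for m = 1: log[(e^theta - 1)/theta]
    if abs(th) < 1e-6:
        return th / 2.0 + th ** 2 / 24.0
    return np.log((np.exp(th) - 1.0) / th)

def lr_statistic(xhat):
    # lambda-hat_1 = 2n [theta-hat xbar - psi_1(theta-hat)]; the moment
    # equation mean_of_theta(theta) = xbar is solved by bisection since
    # mean_of_theta is strictly increasing
    xbar = float(np.mean(xhat))
    lo, hi = -60.0, 60.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if mean_of_theta(mid) < xbar else (lo, mid)
    th = (lo + hi) / 2.0
    lam = 2.0 * xhat.size * (th * xbar - psi_1(th))
    return lam, (lam - 1.0) / np.sqrt(2.0)   # (lambda-hat_1, standardised)

rng = np.random.default_rng(1)
lam, Lam = lr_statistic(rng.uniform(size=200))   # data satisfying H0
```

By construction \({\hat{\lambda }}_{m}\ge 0\), since \({\hat{\theta }}_{(m)}\) maximizes the exponential-family log-likelihood and \(\theta =0_{(m)}\) gives the value zero.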

Under any fixed alternative \(H_{1}:G\left( y\right) \ne F\left( y;\beta _{0}\right) \) the distribution of \({\bar{X}}_{i}=F\left( Y_{i};\beta _{*}\right) \) will not be uniform, i.e. \(\theta _{\left( m\right) }\ne 0_{(m)}.\) For every fixed alternative distribution for Y there is a unique alternative distribution for \({\bar{X}}\) on (0, 1), and associated with that distribution will be another consistent density estimator, given by say \( p_{x}(\theta _{\left( m\right) }^{1}).\) In practice, of course, \(\theta _{\left( m\right) }^{1}\) will be neither specified nor known. The following theorem, again proved in Appendix A, gives the asymptotic distribution of the likelihood ratio test statistic under the null hypothesis (10) and also demonstrates consistency against any such fixed alternative.

Theorem 2

Suppose that Assumption 1 holds, we construct \(\left\{ {\hat{X}}_{i}\right\} _{i=1}^{n}\) as described in (2), and we let \(m,n\rightarrow \infty \) with \(m^{3}/n\rightarrow 0,\) then:

  1. (i)

    Under the null hypothesis, \(H_{0}:G\left( y\right) =F\left( y;\beta _{0}\right) ,\)

    $$\begin{aligned} \,{\hat{\varLambda }}_{m}=\,\frac{{\hat{\lambda }}_{m}-m}{\sqrt{2m}}\rightarrow _{d}N(0,1). \end{aligned}$$
  2. (ii)

    Under any fixed alternative \(H_{1}:G\left( y\right) \ne F\left( y;\beta \right) ,\) for any \(\beta ,\) and for any finite \(\kappa ,\)

    $$\begin{aligned} \,\Pr \left[ {\hat{\varLambda }}_{m}\ge \kappa \right] \rightarrow 1.\quad \end{aligned}$$

Theorem 2 generalizes the test of Marsh (2007), establishing asymptotic normality and consistency against fixed alternatives when \(\beta \) has to be estimated. Via Claeskens and Hjort (2004) it is demonstrated that as \( n\rightarrow \infty \) with \(m^{3}/n\rightarrow 0\), the test \({\bar{\varLambda }}_{m}\) (i.e. the, here, unfeasible test based on the notional sample \(\left\{ {\bar{X}}_{i}\right\} _{1}^{n}\)) has power against local alternatives parametrized by \(\theta _{\left( m\right) }-0_{(m)}=c\sqrt{\frac{\sqrt{m}}{n} }\) with \(c^{\prime }c=1\). Heuristically, and implicit in the proof of Theorem 2, the properties of the test follow from \({\hat{\varLambda }}_{m}-{\bar{\varLambda }} _{m}=O_{p}\left( \sqrt{\frac{m}{n}}\right) ,\) and so \({\hat{\varLambda }}_{m}\) has power against the same rate of local alternatives.

3.2 Testing for normality or exponentiality

The likelihood ratio test \({\hat{\varLambda }}_{m}\) is asymptotically pivotal, specifically standard normal. Competitor tests, such as KS, CM and AD (these tests are mathematically detailed in Stephens 1976 or Conover 1999), are not pivotal, although asymptotic critical values are readily available for all cases of testing for exponentiality and normality.

First we will demonstrate that asymptotic critical values for the nonparametric likelihood tests do indeed have close to nominal size for large values of n and m. We are interested in testing the null hypotheses

$$ \begin{aligned} H_{0}^{E}:Y\sim Exp\left( 1\right) \quad \& \quad H_{0}^{N}:Y\sim N\left( 0,1\right) , \end{aligned}$$

with nominal significance levels 10, 5 and \(1\%\) and based on sample sizes \(n=25,50,100\) and 200. Letting \({\bar{y}}_{n}\) and \({\hat{\sigma }} _{n}^{2}\) be the estimated mean and variance (i.e. \({\hat{\beta }}_{n}={\bar{y}} _{n}\) for \(H_{0}^{E}\) and \({\hat{\beta }}_{n}=\left( {\bar{y}}_{n},{\hat{\sigma }} _{n}^{2}\right) ^{\prime }\) for \(H_{0}^{N}\)), the tests are constructed from the mappings to \(\left( 0,1\right) :\)

$$\begin{aligned} {\hat{X}}_{i}=1-e^{-Y_{i}/{\bar{y}}_{n}}, \end{aligned}$$
(11)

to test \(H_{0}^{E},\) and

$$\begin{aligned} {\hat{X}}_{i}=\frac{1}{2}\left[ 1+\hbox {erf}\left( \frac{Y_{i}-{\bar{y}}_{n}}{ \sqrt{2}{\hat{\sigma }}_{n}}\right) \right] , \end{aligned}$$
(12)

to test \(H_{0}^{N}.\)
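The two mappings (11) and (12) are straightforward to implement; a minimal sketch, using the identity \(\varPhi (z)=\left[ 1+\hbox {erf}(z/\sqrt{2})\right] /2\) for the Gaussian CDF (sample sizes and seeds below are our own choices):

```python
import numpy as np
from math import erf

def x_hat_exponential(y):
    # mapping (11): X-hat_i = 1 - exp(-Y_i / y-bar_n), for H0: Y ~ Exp
    return 1.0 - np.exp(-y / y.mean())

def x_hat_normal(y):
    # mapping (12): the estimated Gaussian CDF Phi((Y_i - y-bar_n)/sigma-hat_n)
    z = (y - y.mean()) / y.std()
    return np.array([0.5 * (1.0 + erf(v / np.sqrt(2.0))) for v in z])

rng = np.random.default_rng(7)
xe = x_hat_exponential(rng.exponential(scale=1.0, size=1000))
xn = x_hat_normal(rng.normal(size=1000))
# under the respective nulls both samples are approximately U(0,1)
```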

Table 2 in Appendix B provides rejection frequencies for the tests constructed for values of \(m=3,5,7,9,11,17.\) The left-hand panel of numbers corresponds to testing \(H_{0}^{E}\) and the right to \(H_{0}^{N};\) critical values at the 1, 5 and 10% significance levels from the standard normal distribution are used throughout.

The purpose of these experiments is only to demonstrate that the finite sample performance of the tests clearly improves as both n and m increase, as predicted by Theorem 2(i). Note the use of three significance levels to better illustrate convergence for large values of both m and n.

Since competitor tests are not asymptotically pivotal, no comparisons under the null are made; instead Table 3 compares the \(5\%\) size-corrected powers of two variants of the tests, with \(m=3\) and \(m=9\), with the three direct competitors for a single sample size of \(n=100\). Table 3a and b present rejection frequencies for these tests and the KS, CM and AD tests for testing \(H_{0}^{N}\) under alternatives that the data is instead drawn from,

$$\begin{aligned} H_{1}^{a}:Y\sim t_{\left( v\right) },\quad H_{1}^{b}:Y\sim \chi _{\left( v\right) }^{2}-v. \end{aligned}$$

Tables 3c, d and e consider alternatives where the moments of the data are not correctly specified, i.e.

$$\begin{aligned} H_{1}^{c}&:&Y_{i}|Y_{i-1}\sim N\left( vY_{i-1},1\right) , \\ H_{1}^{d}&:&Y_{i}|Y_{i-1}\sim N\left( 0,1+vY_{i-1}^{2}\right) , \\ H_{1}^{e}&:&Y_{i}\sim N\left( v\times 1\left( i>\left\lfloor n/2\right\rfloor \right) ,1\right) , \end{aligned}$$

where \(1\left( \cdot \right) \) denotes the indicator function. These latter three alternatives represent simplistic variants of common types of misspecification in econometric or financial data, i.e. misspecification of a conditional mean or variance, or the possibility of a break in the mean (here halfway through the sample). Note that these models imply that the \({\hat{X}}_{i}\) in (2) will not be IID on \(\left( 0,1\right) \), but ergodicity implies the sample moments will still converge. Finally, Table 3f considers instead testing \( H_{0}^{E}\) against the alternative

$$\begin{aligned} H_{1}^{f}:Y\sim \varGamma \left( 1,v\right) . \end{aligned}$$

In each table the left-hand panel corresponds to the case where we construct the test imposing the parameter values specified in the null rather than estimating them (i.e. using the, here unfeasible, test of Marsh 2007). The right-hand panel reports the rejection frequencies for tests based on estimated values, i.e. using (11) and (12), respectively.

The outcomes in Table 3 imply the following broad conclusions. The nonparametric likelihood test based on \({\hat{\varLambda }}_{3}\) is almost uniformly the most powerful, across all alternatives and whether parameters are estimated or not. The observed lack of power of the most commonly used test, KS, is particularly evident; it is consistently the poorest performing test. The other edf-based tests and \({\hat{\varLambda }}_{9}\) are broadly comparable in terms of their rejection frequencies, although AD is perhaps on average slightly more powerful and CM less powerful.

3.3 Bootstrap critical values

The proposed tests require a choice of dimension, m. The results presented in Tables 2 and 3 suggest an inevitable compromise: larger values of m imply tests having size closer to nominal, while smaller values of m imply tests having greater power. In order to overcome this compromise we can consider the properties of these tests when bootstrap critical values are employed instead.

For these tests the bootstrap procedure is as follows: On obtaining the MLE \( {\hat{\beta }}_{n}\) and calculating \({\hat{\varLambda }}_{3},\) as described above;

  1. 1.

    Generate bootstrap samples \(Y_{i}^{b}\sim IID\,F\left( y;{\hat{\beta }} _{n}\right) \) for \(i=1,\ldots ,n.\)

  2. 2.

    Estimate, via ML, \({\hat{\beta }}_{n}^{b}\) and construct \({\hat{X}} _{i}^{b}=F\left( Y_{i}^{b};{\hat{\beta }}_{n}^{b}\right) \) for \(i=1,\ldots ,n.\)

  3. 3.

    Repeat 1 and 2 B times, obtaining bootstrap versions of the test \(\hat{ \varLambda }_{3}^{b}.\)

  4. 4.

    Order the \({\hat{\varLambda }}_{3}^{b}\) so that the bootstrap critical value at nominal level \(\alpha \) is the \(\left\lfloor (1-\alpha )B\right\rfloor \)th order statistic, \(\kappa _{B}={\hat{\varLambda }}_{3}^{\left( \left\lfloor (1-\alpha )B\right\rfloor \right) }.\)

  5. 5.

    Denote the indicator function \({\hat{I}}_{B}^{\varLambda }=\left\{ \begin{array}{c} 1\quad \text { if }\quad {\hat{\varLambda }}_{3}>\kappa _{B} \\ 0\quad \text { if }\quad {\hat{\varLambda }}_{3}\le \kappa _{B} \end{array} \right\} .\)
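Steps 1-5 above can be sketched as follows for the exponential null \(H_{0}^{E}\). For brevity the sketch uses the \(m=1\) variant of the statistic and a small B; both are our own simplifications for exposition, not the paper's choices:

```python
import numpy as np

def lam_hat(xhat):
    # m = 1 nonparametric likelihood ratio statistic (Section 3.1), with the
    # scalar moment equation solved by bisection (its mean is increasing)
    def mean_th(th):
        if abs(th) < 1e-6:
            return 0.5 + th / 12.0
        return (th * np.exp(th) - np.exp(th) + 1.0) / (th * (np.exp(th) - 1.0))
    def psi(th):
        if abs(th) < 1e-6:
            return th / 2.0 + th ** 2 / 24.0
        return np.log((np.exp(th) - 1.0) / th)
    xbar = float(np.mean(xhat))
    lo, hi = -60.0, 60.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if mean_th(mid) < xbar else (lo, mid)
    th = (lo + hi) / 2.0
    return 2.0 * xhat.size * (th * xbar - psi(th))

def bootstrap_test(y, B=99, alpha=0.05, seed=3):
    # parametric bootstrap for H0: Y ~ Exp(beta); beta-hat is re-estimated on
    # every bootstrap sample, exactly as in steps 1-5
    rng = np.random.default_rng(seed)
    stat = lam_hat(1.0 - np.exp(-y / y.mean()))
    boot = np.empty(B)
    for b in range(B):
        yb = rng.exponential(scale=y.mean(), size=y.size)   # step 1
        boot[b] = lam_hat(1.0 - np.exp(-yb / yb.mean()))    # step 2
    kappa = np.sort(boot)[int(np.floor((1.0 - alpha) * B))] # steps 3 and 4
    return int(stat > kappa)                                # step 5

rng = np.random.default_rng(11)
reject = bootstrap_test(rng.exponential(size=100))
```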

We then reject \(H_{0}\) if \({\hat{I}}_{B}^{\varLambda }=1.\) The required asymptotic justification for the bootstrap is automatic given that \( {\hat{\varLambda }}_{m}\rightarrow _{d}N\left( 0,1\right) ,\) giving the following corollary to Theorem 2.

Corollary 2

Under Assumption 1 and if \(n,m\rightarrow \infty \) with \(m^{3}/n\rightarrow 0,\) then

$$\begin{aligned} \text {{i)}}\Pr \left[ {\hat{I}}_{B}^{\varLambda }=1\,|\,H_{0}\right]\rightarrow & {} \alpha , \\ \text {{ii)}}\quad \Pr \left[ {\hat{I}}_{B}^{\varLambda }=1\,|\,H_{1}\right]\rightarrow & {} 1. \end{aligned}$$

Here we will compare the performance of bootstrap critical values for \({\hat{\varLambda }}_{3}\) with those of CM and AD by repeating many of the experiments of Kojadinovic and Yan (2012). All experiments described in this sub-section are performed on the basis of \(B=200\) bootstrap replications. All nuisance parameters were estimated via maximum likelihood using Mathematica 8's own numerical optimization algorithm.

The first set of experiments mimics those presented in Kojadinovic and Yan (2012, Table 1). Specifically we define the following Normal, Logistic, Gamma and Weibull distributions:

$$ \begin{aligned} N^{*}\sim & {} N\left( 10,1\right) ,\quad L^{*}\sim L\left( 10,0.572\right) \,, \nonumber \\ \quad \varGamma ^{*}\sim & {} \varGamma \left( 98.671,1/9.866\right) \quad \& \quad W^{*}\sim W\left( 10.618,10.452\right) . \end{aligned}$$
(13)

The specific parameter values for \(L^{*},\varGamma ^{*}\) and \(W^{*}\) are chosen to minimize the relative entropy (\(I\left( \beta \right) \) in Assumption 1(iii)) between each family and the distribution of \(N^{*}.\) Sample sizes of \( n=25,50,100,200\) are used in the experiments described below.

Table 4a contains the finite sample size of each test. It is clear that, under \(H_{0},\) the parametric bootstrap provides highly accurate critical values for all of the tests. On size alone there is nothing to choose between them. It is, however, worth reporting the computational time of each bootstrap critical value. For the \({\hat{\varLambda }}_{3}\) test, critical values were obtained after 2.0 and 3.2 seconds for sample sizes \(n=100\) and 200, respectively. The times for the other tests were similar to each other, taking around 0.9 and 2.9 seconds, respectively.

Table 4b and c contain the finite sample rejection frequencies under various alternative hypotheses, covering all pairwise permutations of the distributions in (13). As with the finite sample sizes it is not possible to pick a clear winner; moreover, where they overlap, the results are in line with those of Kojadinovic and Yan (2012). There is, of course, no uniformly most powerful test of goodness-of-fit, so it is not surprising that the power of \({\hat{\varLambda }}_{3}\) is not always the largest. However, its performance over this range of nulls and alternatives is far less volatile, and in no circumstance is the test dominated by either of the other two.

4 Conclusions

This paper has generalized the series density estimator of Barron and Sheu (1991) to cover the case where parameters are estimated in the context of possibly misspecified models. The nonparametric likelihood ratio tests of Marsh (2007) can thus be extended to cover the case of estimated parameters. The general aim has been to provide a testing procedure which overcomes the three main criticisms of edf-based tests, i.e. that they are not pivotal, have low power, and offer no direction in the event of rejection.

Instead, the tests of this paper are shown to be asymptotically standard normal, and they have power advantages over edf tests whether critical values are size corrected or obtained by a consistent bootstrap. This suggests the proposed tests will be much simpler to generalize to the settings of Bai (2003) or Corradi and Swanson (2006). Finally, in the event of rejection, the series density estimator upon which the tests are built may be employed to consistently estimate the quantiles of the distribution from which the sample is taken.