Testing in the Presence of Nuisance Parameters: Some Comments on Tests Post-Model-Selection and Random Critical Values

Leeb, Hannes; Pötscher, Benedikt M.

doi:10.1007/978-3-319-41573-4_4


Part of the book series: Contributions to Statistics (CONTRIB.STAT.)


Abstract

We point out that the ideas underlying some test procedures recently proposed for testing post-model-selection (and for some other test problems) in the econometrics literature have been around for quite some time in the statistics literature. We also sharpen some of these results in the statistics literature. Furthermore, we show that some intuitively appealing testing procedures, that have found their way into the econometrics literature, lead to tests that do not have desirable size properties, not even asymptotically.


Notes

1.
This framework obviously allows for “one-sided” as well as for “two-sided” alternatives (when these concepts make sense) by a proper definition of the test-statistic.
2.
While Andrews and Guggenberger [1] do not consider a finite-sample framework but rather a “moving-parameter” asymptotic framework, the underlying idea is nevertheless exactly the same.
3.
Loh [8] actually considers the random critical value $c_{n,\eta _{n},\mathrm{Loh}^{{\ast}}}(\delta )$ given by $\sup _{\beta \in I_{n}}c_{n,\beta }(\delta )$, which typically does not lead to a level δ test in finite samples in view of Proposition 1 (since $c_{n,\eta _{n},\mathrm{Loh}^{{\ast}}}(\delta ) \leq c_{n,\sup }(\delta )$). However, Loh [8] focuses on the case where $\eta_{n} \rightarrow 0$ and shows that then the size of the test converges to δ; that is, the test is asymptotically level δ if $\eta_{n} \rightarrow 0$. See also Remark 4.
4.
This construction is no longer suggested in [11].
5.
The corresponding calculation in previous versions of this paper had erroneously omitted the term $\rho \left (1 -\rho ^{2}\right )^{-1/2}\gamma ^{\max }$ from the expression on the far right-hand side of the subsequent display. This is corrected here by accounting for this term. Alternatively, one could drop the probability involving $\left \vert \hat{\gamma }(U)\right \vert \leq c$ altogether from the proof and work with the resulting lower bound.

References

1. Andrews, D.W.K., Guggenberger, P.: Hybrid and size-corrected subsampling methods. Econometrica 77, 721–762 (2009)
2. Berger, R.L., Boos, D.D.: P values maximized over a confidence set for the nuisance parameter. J. Am. Stat. Assoc. 89, 1012–1016 (1994)
3. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, Oakland (1977)
4. DiTraglia, F.J.: Using invalid instruments on purpose: focused moment selection and averaging for GMM. Working Paper, Version November 9, 2011 (2011)
5. Kabaila, P., Leeb, H.: On the large-sample minimal coverage probability of confidence intervals after model selection. J. Am. Stat. Assoc. 101, 619–629 (2006)
6. Leeb, H., Pötscher, B.M.: The finite-sample distribution of post-model-selection estimators and uniform versus non-uniform approximations. Economet. Theor. 19, 100–142 (2003)
7. Leeb, H., Pötscher, B.M.: Model selection and inference: facts and fiction. Economet. Theor. 21, 29–59 (2005)
8. Loh, W.-Y.: A new method for testing separate families of hypotheses. J. Am. Stat. Assoc. 80, 362–368 (1985)
9. Liu, C.-A.: A plug-in averaging estimator for regressions with heteroskedastic errors. Working Paper, Version October 29, 2011 (2011)
10. McCloskey, A.: Powerful procedures with correct size for test statistics with limit distributions that are discontinuous in some parameters. Working Paper, Version October 2011 (2011)
11. McCloskey, A.: Bonferroni-based size correction for nonstandard testing problems. Working Paper, Brown University (2012)
12. Romano, J.P., Shaikh, A., Wolf, M.: A practical two-step method for testing moment inequalities. Econometrica 82, 1979–2002 (2014)
13. Silvapulle, M.J.: A test in the presence of nuisance parameters. J. Am. Stat. Assoc. 91, 1690–1693 (1996) (Correction, ibidem 92 (1997) 801)
14. Williams, D.A.: Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures. Biometrics 26, 23–32 (1970)


Author information

Authors and Affiliations

Department of Statistics, University of Vienna, Vienna, Austria
Hannes Leeb & Benedikt M. Pötscher


Corresponding author

Correspondence to Hannes Leeb.

Editor information

Editors and Affiliations

Department of Mathematics & Statistics, Brock University, St. Catharines, Ontario, Canada
S. Ejaz Ahmed

Appendix

Lemma 5

Suppose a random variable $\hat{c}_{n}$ satisfies $\Pr \left (\hat{c}_{n} \leq c^{{\ast}}\right ) = 1$ for some real number $c^{\ast}$ as well as $\Pr \left (\hat{c}_{n} < c^{{\ast}}\right ) > 0$. Let $S$ be a real-valued random variable. If for every non-empty interval $J$ in the real line

$$\displaystyle{ \Pr \left (S \in J\mid \hat{c}_{n}\right ) > 0 }\tag{16}$$

holds almost surely, then

$$\displaystyle{ \Pr \left (\hat{c}_{n} < S \leq c^{{\ast}}\right ) > 0. }$$

The same conclusion holds if in (16) the conditioning variable $\hat{c}_{n}$ is replaced by some variable $w_{n}$, say, provided that $\hat{c}_{n}$ is a measurable function of $w_{n}$.

Proof

Clearly

$$\displaystyle{ \Pr \left (\hat{c}_{n} < S \leq c^{{\ast}}\right ) = E\left [\Pr \left (S \in (\hat{c}_{ n},c^{{\ast}}]\mid \hat{c}_{ n}\right )\right ] = E\left [\Pr \left (S \in (\hat{c}_{n},c^{{\ast}}]\mid \hat{c}_{ n}\right )\boldsymbol{1}\left (\hat{c}_{n} < c^{{\ast}}\right )\right ], }$$

the last equality being true, since the first term in the product is zero on the event $\hat{c}_{n} = c^{{\ast}}$. Now note that the first factor in the expectation on the far right-hand side of the above equality is positive almost surely by (16) on the event $\left \{\hat{c}_{n} < c^{{\ast}}\right \}$, and that the event $\left \{\hat{c}_{n} < c^{{\ast}}\right \}$ has positive probability by assumption. ■
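As a small sanity check of Lemma 5, the probability in its conclusion can be computed in closed form for a toy example. The setup below is purely illustrative and not taken from the chapter: the critical value is assumed to take only two values, `c_low` with probability `p_low` and `c_star` otherwise, and `S` is assumed to be standard normal and independent of it, so that condition (16) holds.

```python
from statistics import NormalDist

def lemma5_probability(c_star=1.0, c_low=0.0, p_low=0.5):
    """Pr(c_hat < S <= c_star) in a hypothetical two-point example.

    Assumptions (illustrative only): c_hat equals c_low < c_star with
    probability p_low and equals c_star otherwise; S ~ N(0, 1) is
    independent of c_hat.
    """
    phi = NormalDist().cdf
    # On {c_hat = c_star} the interval (c_hat, c_star] is empty and
    # contributes nothing; only the event {c_hat < c_star} matters.
    return p_low * (phi(c_star) - phi(c_low))

prob = lemma5_probability()
print(f"Pr(c_hat < S <= c*) = {prob:.6f}")
```

The result is strictly positive whenever `p_low > 0` and `c_low < c_star`, mirroring the two steps of the proof: the indicator restricts attention to the event where the critical value is below its upper bound, and on that event the conditional probability of the interval is positive.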

Recall that $\bar{c}_{\gamma }(v)$ has been defined in the proof of Theorem 2.

Lemma 6

Assume $\rho_{n} \equiv \rho \neq 0$. Suppose $0 < v < 1$. Then the map $\gamma \rightarrow \bar{ c}_{\gamma }(v)$ is continuous on $\mathbb{R}$. Furthermore, $\lim _{\gamma \rightarrow \infty }\bar{c}_{\gamma }(v) =\lim _{\gamma \rightarrow -\infty }\bar{c}_{\gamma }(v) = \Phi ^{-1}(1 - v)$.

Proof

If $\gamma_{l} \rightarrow \gamma$, then $\bar{h}_{\gamma _{l}}$ converges to $\bar{h}_{\gamma }$ pointwise on $\mathbb{R}$. By Scheffé’s Lemma, $\bar{H}_{\gamma _{l}}$ then converges to $\bar{H}_{\gamma }$ in total variation distance. Since $\bar{H}_{\gamma }$ is strictly increasing on $\mathbb{R}$, convergence of the quantiles $\bar{c}_{\gamma _{l}}(v)$ to $\bar{c}_{\gamma }(v)$ follows. The second claim follows by the same argument observing that $\bar{h}_{\gamma }$ converges pointwise to a standard normal density for $\gamma \rightarrow \pm \infty$. ■

Lemma 7

Assume $\rho_{n} \equiv \rho \neq 0$.

(i)
Suppose 0 < v ≤ 1∕2. Then for some $\gamma \in \mathbb{R}$ we have that $\bar{c}_{\gamma }(v)$ is larger than $\Phi ^{-1}(1 - v)$.
(ii)
Suppose 1∕2 ≤ v < 1. Then for some $\gamma \in \mathbb{R}$ we have that $\bar{c}_{\gamma }(v)$ is smaller than $\Phi ^{-1}(1 - v)$.

Proof

Standard regression theory gives

$$\displaystyle{ \hat{\alpha }_{n}(U) =\hat{\alpha } _{n}(R) +\rho \sigma _{\alpha,n}\hat{\beta }_{n}(U)/\sigma _{\beta,n}, }$$

with $\hat{\alpha }_{n}(R)$ and $\hat{\beta }_{n}(U)$ being independent; for the latter cf., e.g., [6], Lemma A.1. Consequently, it is easy to see that the distribution of $T_{n}(\alpha_{0})$ under $P_{n,\alpha _{0},\beta }$ is the same as the distribution of

$$\displaystyle\begin{array}{rcl} T^{{\prime}}& =& T^{{\prime}}(\rho,\gamma ) = \left (\sqrt{1 -\rho ^{2}}W +\rho Z\right )\boldsymbol{1}\left \{\left \vert Z+\gamma \right \vert > c\right \} {}\\ & & +\left (W -\rho \frac{\gamma } {\sqrt{1 -\rho ^{2}}}\right )\boldsymbol{1}\left \{\left \vert Z+\gamma \right \vert \leq c\right \}, {}\\ \end{array}$$

where, as before, $\gamma = n^{1/2}\beta /\sigma _{\beta,n}$, and where $W$ and $Z$ are independent standard normal random variables.
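The representation of T′ in terms of W and Z lends itself to simulation. The following is a minimal Monte Carlo sketch, not part of the original argument; the function names, the seed, and the numerical values of ρ, γ and the threshold c (here 1.96) are illustrative assumptions.

```python
import random
from statistics import NormalDist

def draw_t_prime(rho, gamma, c, rng):
    """Draw one realisation of T'(rho, gamma) for a threshold c."""
    w = rng.gauss(0.0, 1.0)  # W ~ N(0, 1)
    z = rng.gauss(0.0, 1.0)  # Z ~ N(0, 1), independent of W
    s = (1.0 - rho * rho) ** 0.5
    if abs(z + gamma) > c:
        # Branch with indicator 1{|Z + gamma| > c}.
        return s * w + rho * z
    # Branch with indicator 1{|Z + gamma| <= c}.
    return w - rho * gamma / s

def estimate_cdf(q, rho, gamma, c, n_draws=100_000, seed=12345):
    """Monte Carlo estimate of Pr(T' <= q)."""
    rng = random.Random(seed)
    hits = sum(draw_t_prime(rho, gamma, c, rng) <= q for _ in range(n_draws))
    return hits / n_draws

# Hypothetical illustration: compare with Phi(q) = 0.95.
q = NormalDist().inv_cdf(0.95)
for gamma in (-2.0, 0.0, 25.0):
    est = estimate_cdf(q, rho=0.5, gamma=gamma, c=1.96)
    print(f"gamma={gamma:+5.1f}  Pr(T' <= q) ~ {est:.3f}")
```

For very large |γ| the second branch is essentially never taken and the estimate should be close to 0.95, whereas for moderate γ it can deviate noticeably from that value.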

We now prove (i): Let q be shorthand for $\Phi ^{-1}(1 - v)$ and note that q ≥ 0 holds by the assumption on v. It suffices to show that $\Pr \left (T^{{\prime}}\leq q\right ) < \Phi (q)$ for some γ. We can now write

$$\displaystyle\begin{array}{rcl} \Pr \left (T^{{\prime}}\leq q\right )& =& \Pr \left (\sqrt{1 -\rho ^{2}}W +\rho Z \leq q\right ) -\Pr \left (\left \vert Z+\gamma \right \vert \leq c,W \leq \frac{q -\rho Z} {\sqrt{1 -\rho ^{2}}}\right ) {}\\ & & +\Pr \left (\left \vert Z+\gamma \right \vert \leq c,W \leq q + \frac{\rho \gamma } {\sqrt{1 -\rho ^{2}}}\right ) {}\\ & =& \Phi (q) -\Pr (A) +\Pr (B). {}\\ \end{array}$$

Here, A and B are the events given in terms of W and Z. Picturing these two events as subsets of the plane (with the horizontal axis corresponding to Z and the vertical axis corresponding to W), we see that A corresponds to the vertical band where $\left \vert Z+\gamma \right \vert \leq c$, truncated above the line where $W = (q -\rho Z)/\sqrt{1 -\rho ^{2}}$; similarly, B corresponds to the same vertical band $\left \vert Z+\gamma \right \vert \leq c$, truncated now above the horizontal line where $W = q +\rho \gamma /\sqrt{1 -\rho ^{2}}$.

We first consider the case where ρ > 0 and distinguish two cases:

Case 1: $\rho c \leq \left (1 -\sqrt{1 -\rho ^{2}}\right )q$.

In this case the set B is contained in A for every value of γ, with A∖B being a set of positive Lebesgue measure. Consequently, $\Pr (A) >\Pr (B)$ holds for every γ, proving the claim.
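The containment in Case 1 can be verified directly. For ρ > 0 the upper boundary of A is decreasing in Z, so within the band it is lowest at the right edge Z = −γ + c. Hence B is contained in A exactly when the horizontal boundary of B lies on or below that lowest point, which can be worked out as follows:

```latex
\begin{aligned}
q + \frac{\rho \gamma}{\sqrt{1-\rho^{2}}} \leq \frac{q - \rho(-\gamma + c)}{\sqrt{1-\rho^{2}}}
&\iff q\sqrt{1-\rho^{2}} + \rho \gamma \leq q + \rho \gamma - \rho c \\
&\iff \rho c \leq \left(1 - \sqrt{1-\rho^{2}}\right) q .
\end{aligned}
```

The final condition does not involve γ, which is why the containment holds for every value of γ in this case.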

Case 2: $\rho c > \left (1 -\sqrt{1 -\rho ^{2}}\right )q$.

In this case choose γ so that −γ − c ≥ 0, and, in addition, such that also $(q -\rho (-\gamma - c))/\sqrt{1 -\rho ^{2}} < 0$, which is clearly possible. Recalling that ρ > 0, note that the point where the line $W = (q -\rho Z)/\sqrt{1 -\rho ^{2}}$ intersects the horizontal line $W = q +\rho \gamma /\sqrt{1 -\rho ^{2}}$ has as its first coordinate $Z = -\gamma + (q/\rho )(1 -\sqrt{1 -\rho ^{2}})$, implying that the intersection occurs in the right half of the band where $\left \vert Z+\gamma \right \vert \leq c$. As a consequence, $\Pr (B) -\Pr (A)$ can be written as follows:

$$\displaystyle{ \Pr (B) -\Pr (A) =\Pr (B\setminus A) -\Pr (A\setminus B) }$$

where

$$\displaystyle\begin{array}{rcl} B\setminus A& =& \left \{-\gamma + (q/\rho )(1 -\sqrt{1 -\rho ^{2}}) \leq Z \leq -\gamma + c,\right. {}\\ & & \left.(q -\rho Z)/\sqrt{1 -\rho ^{2}} < W \leq q +\rho \gamma /\sqrt{1 -\rho ^{2}}\right \} {}\\ \end{array}$$

and

$$\displaystyle\begin{array}{rcl} A\setminus B& =& \left \{-\gamma - c \leq Z \leq -\gamma + (q/\rho )(1 -\sqrt{1 -\rho ^{2}}),\right. {}\\ & & \left.q +\rho \gamma /\sqrt{1 -\rho ^{2}} < W \leq (q -\rho Z)/\sqrt{1 -\rho ^{2}}\right \}. {}\\ \end{array}$$

Picturing A∖B and B∖A as subsets of the plane as in the preceding paragraph, we see that these events correspond to two triangles, where the triangle corresponding to A∖B is larger than or equal (in Lebesgue measure) to that corresponding to B∖A. Since γ was chosen to satisfy −γ − c ≥ 0 and $(q -\rho (-\gamma - c))/\sqrt{1 -\rho ^{2}} < 0$, we see that each point in the triangle corresponding to A∖B is closer to the origin than any point in the triangle corresponding to B∖A. Because the joint Lebesgue density of (Z, W), i.e., the bivariate standard Gaussian density, is spherically symmetric and radially monotone, it follows that $\Pr (B\setminus A) -\Pr (A\setminus B) < 0$, as required.

The case ρ < 0 follows because $T^{\prime}(\rho,\gamma )$ has the same distribution as $T^{\prime}(-\rho,-\gamma )$.

Part (ii) follows, since $T^{\prime}(\rho,\gamma )$ has the same distribution as $-T^{\prime}(-\rho,\gamma )$. ■
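Lemmas 6 and 7 can be illustrated numerically. The sketch below assumes that $\bar{c}_{\gamma }(v)$ denotes the (1 − v)-quantile of the distribution of T′, which is how the proof above uses it (the definition itself is given in the proof of Theorem 2). It computes the distribution function of T′ by a midpoint rule and inverts it by bisection. The function names, the grid of γ values, the bisection bracket, and the choices ρ = 0.5, c = 1.96, v = 0.05 are all illustrative assumptions.

```python
from statistics import NormalDist

_N = NormalDist()

def cdf_t_prime(t, rho, gamma, c, n_grid=1000):
    """Pr(T' <= t) via Phi(t) - Pr(A) + Pr(B), with Pr(A) by a midpoint rule."""
    s = (1.0 - rho * rho) ** 0.5
    lo, hi = -gamma - c, -gamma + c
    h = (hi - lo) / n_grid
    pr_a = 0.0
    for i in range(n_grid):
        z = lo + (i + 0.5) * h
        pr_a += _N.pdf(z) * _N.cdf((t - rho * z) / s)
    pr_a *= h
    pr_b = (_N.cdf(hi) - _N.cdf(lo)) * _N.cdf(t + rho * gamma / s)
    return _N.cdf(t) - pr_a + pr_b

def upper_quantile(v, rho, gamma, c, lo=-20.0, hi=20.0, n_iter=50):
    """Approximate (1 - v)-quantile of T' by bisection on [lo, hi]."""
    target = 1.0 - v
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if cdf_t_prime(mid, rho, gamma, c) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical illustration of Lemma 6 (limits) and Lemma 7(i).
v, rho, c = 0.05, 0.5, 1.96
q = _N.inv_cdf(1.0 - v)
quantiles = {g: upper_quantile(v, rho, g, c) for g in (-10.0, -2.0, 0.0, 2.0, 10.0)}
for g, val in quantiles.items():
    print(f"gamma={g:+5.1f}  quantile={val:.4f}  normal quantile={q:.4f}")
```

In this configuration the quantile at γ = −2 should lie clearly above the standard normal quantile, in line with Lemma 7(i), while the values at γ = ±10 should be practically indistinguishable from it, in line with the limits in Lemma 6.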

Remark 8

If $\rho_{n} \equiv \rho \neq 0$ and $v = 1/2$, then $\bar{c}_{0}(1/2) = \Phi ^{-1}(1/2) = 0$, since $\bar{h}_{0}$ is symmetric about zero.

Remark 9

If $\rho_{n} \equiv \rho = 0$, then $T_{n}(\alpha_{0})$ is standard normally distributed for every value of $\beta$, and hence $\bar{c}_{\gamma }(v) = \Phi ^{-1}(1 - v)$ holds for every $\gamma$ and $v$.

About this chapter

Cite this chapter

Leeb, H., Pötscher, B.M. (2017). Testing in the Presence of Nuisance Parameters: Some Comments on Tests Post-Model-Selection and Random Critical Values. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_4


DOI: https://doi.org/10.1007/978-3-319-41573-4_4
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41572-7
Online ISBN: 978-3-319-41573-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
