
Heavy-Tail and Plug-In Robust Consistent Conditional Moment Tests of Functional Form


Abstract

We present asymptotic power-one tests of regression model functional form for heavy-tailed time series. Under the null hypothesis of correct specification the model errors must have a finite mean, and otherwise only need to have a fractional moment. If the errors have an infinite variance then in principle any consistent plug-in is allowed, depending on the model, including those with non-Gaussian limits and/or a sub-\(\sqrt{n}\)-convergence rate. One test statistic exploits an orthogonalized test equation that promotes plug-in robustness irrespective of tails. We derive chi-squared weak limits of the statistics, we characterize an empirical process method for smoothing over a trimming parameter, and we study the finite sample properties of the test statistics.



Notes

  1.

    We use the term “revealing” in the sense of “generically totally revealing” in Stinchcombe and White (1998, p. 299). A member \(h\) of a function space \( \mathcal H \) reveals misspecification \(E[y|x] \ne f\) when \(E[(y - f)h] \ne 0\). A space \(\mathcal H \) is generically totally revealing if all but a negligible number of \(h \in \mathcal H \) have this property. In the index function case \(h(x) = F(\gamma ^{\prime }\psi (x))\), where the weight \(h\) aligns with \(\gamma \) and the class \(\mathcal H \) with \( \Gamma \), this is equivalent to saying all \(\gamma \in \Gamma /S\) reveal misspecification, where \(S\) has Lebesgue measure zero.

  2.

    Slow variation implies \(\lim _{n\rightarrow \infty }L(\lambda n)/L(n) = 1\) for any \(\lambda > 0\) (e.g. a constant, or \((\ln (n))^{a}\) for finite \( a > 0\): see Resnick 1987). In this chapter we always assume \(L(n) \rightarrow \infty \).

  3.

    Consider if \(\epsilon _{t}\) is iid and asymmetric under \(H_{0}\), but symmetrically and non-negligibly trimmed with \(k_{1,\epsilon ,n} = k_{2,\epsilon ,n} \sim \lambda n\) where \(\lambda \in (0,1)\). Then \(\hat{T}_{n}(\gamma ) \overset{p}{\rightarrow } \infty \) under \(H_{0}\) is easily verified. The test statistic reveals misspecification due entirely to trimming itself.

  4.

    Under the alternative \(\beta ^{0}\) is the unique probability limit of \(\hat{ \beta }_{n}\), a “quasi-true” point that optimizes a discrepancy function, for example, a likelihood function, method of moments criterion or the Kullback–Leibler Information Criterion. See White (1982) amongst many others.

  5.

    The rate of convergence for some minimum discrepancy estimators may be below \(n^{1/2}\), even for thin-tailed data, in contexts involving weak identification, kernel smoothing, and in-fill asymptotics. We implicitly ignore such cases here.

  6.

    Other over-identifying restrictions can easily be included, but the GMTTM rate may differ from what we cite in the proof of Lemma 4.2 if they are not lags of \(y_{t}\). See Hill and Renault (2010).

  7.

    If \(n = 800\) then \(k_{n} = [0.025\times 800/\ln (800)] = 2\) for each \(\{\epsilon _{t},y_{t-1},...,y_{t-5}\}\). Hence at most \(2\times 6 = 12\) observations are trimmed, which is \(1.5\,\%\) of \(800\).

  8.

    See Hong and White (1995, Theorem 3.2) for defense of a slowly varying series length \(\ln (n).\)

  9.

    See Hoffmann-Jørgensen (1991), cf. Dudley (1978).

  10.

    LTTS and GMTTM require trimming fractiles for estimation: GMTTM requires fractiles \(\tilde{k}_{i,n}\) for each estimating equation \(\tilde{m}_{i,n,t}\) , and LTTS requires fractiles \(\tilde{k}_{\epsilon ,n}\) and \(\tilde{k}_{y,n}\) for \(\epsilon _{t}\) and \(y_{t-i}\). The given rates of convergence apply if for GMTTM \(\tilde{k}_{i,n} \sim \lambda \ln (n)\) (Hill and Renault 2010), and for LTTS \(\tilde{k}_{\epsilon ,n} \sim \lambda n/\ln (n)\) and \(\tilde{k}_{y,n} \sim \lambda \ln (n)\) (Hill 2011b), where \(\lambda > 0\) is chosen by the analyst and may be different in different places.

References

  • An H.Z., Huang F.C. (1996) The Geometrical Ergodicity of Nonlinear Autoregressive Models, Stat. Sin. 6, 943–956.
  • Arcones M., Giné E. (1989) The Bootstrap of the Mean with Arbitrary Bootstrap Sample Size, Ann. I. H. P. 25, 457–481.
  • Bai J. (2003) Testing Parametric Conditional Distributions of Dynamic Models, Rev. Econ. Stat. 85, 531–549.
  • Bierens H.J. (1982) Consistent Model Specification Tests, J. Econometrics 20, 105–134.
  • Bierens H.J. (1990) A Consistent Conditional Moment Test of Functional Form, Econometrica 58, 1443–1458.
  • Bierens H.J., Ploberger W. (1997) Asymptotic Theory of Integrated Conditional Moment Tests, Econometrica 65, 1129–1151.
  • Brock W.A., Dechert W.D., Scheinkman J.A., LeBaron B. (1996) A Test for Independence Based on the Correlation Dimension, Econometric Rev. 15, 197–235.
  • Brockwell P.J., Cline D.B.H. (1985) Linear Prediction of ARMA Processes with Infinite Variance, Stoch. Proc. Appl. 19, 281–296.
  • Carrasco M., Chen X. (2002) Mixing and Moment Properties of Various GARCH and Stochastic Volatility Models, Econometric Theory 18, 17–39.
  • Chan K.S. (1990) Testing for Threshold Autoregression, Ann. Stat. 18, 1886–1894.
  • Chen X., Fan Y. (1999) Consistent Hypothesis Testing in Semiparametric and Nonparametric Models for Econometric Time Series, J. Econometrics 91, 373–401.
  • Corradi V., Swanson N.R. (2002) A Consistent Test for Nonlinear Out-of-Sample Predictive Accuracy, J. Econometrics 110, 353–381.
  • Csörgo S., Horváth L., Mason D. (1986) What Portion of the Sample Makes a Partial Sum Asymptotically Stable or Normal? Prob. Theory Rel. Fields 72, 1–16.
  • Davidson R., MacKinnon J., White H. (1983) Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results, J. Econometrics 21, 53–70.
  • Davies R.B. (1977) Hypothesis Testing when a Nuisance Parameter is Present Only under the Alternative, Biometrika 64, 247–254.
  • Davis R.A., Knight K., Liu J. (1992) M-Estimation for Autoregressions with Infinite Variance, Stoch. Proc. Appl. 40, 145–180.
  • de Jong R.M. (1996) The Bierens Test under Data Dependence, J. Econometrics 72, 1–32.
  • de Jong R.M., Davidson J. (2000) Consistency of Kernel Estimators of Heteroscedastic and Autocorrelated Covariance Matrices, Econometrica 68, 407–423.
  • de Lima P.J.F. (1997) On the Robustness of Nonlinearity Tests to Moment Condition Failure, J. Econometrics 76, 251–280.
  • Dehling H., Denker M., Philipp W. (1986) Central Limit Theorems for Mixing Sequences of Random Variables under Minimal Conditions, Ann. Prob. 14, 1359–1370.
  • Dette H. (1999) A Consistent Test for the Functional Form of a Regression Based on a Difference of Variance Estimators, Ann. Stat. 27, 1012–1040.
  • Dufour J.M., Farhat A., Hallin M. (2006) Distribution-Free Bounds for Serial Correlation Coefficients in Heteroscedastic Symmetric Time Series, J. Econometrics 130, 123–142.
  • Doukhan P., Massart P., Rio E. (1995) Invariance Principles for Absolutely Regular Empirical Processes, Ann. I. H. P. 31, 393–427.
  • Dudley R.M. (1978) Central Limit Theorem for Empirical Processes, Ann. Prob. 6, 899–929.
  • Embrechts P., Goldie C.M. (1980) On Closure and Factorization Properties of Subexponential Distributions, J. Aus. Math. Soc. A 29, 243–256.
  • Embrechts P., Klüppelberg C., Mikosch T. (1997) Modelling Extremal Events for Insurance and Finance. Springer-Verlag: Frankfurt.
  • Eubank R., Spiegelman S. (1990) Testing the Goodness of Fit of a Linear Model via Nonparametric Regression Techniques, J. Amer. Stat. Assoc. 85, 387–392.
  • Fan Y., Li Q. (1996) Consistent Model Specification Tests: Omitted Variables and Semiparametric Functional Forms, Econometrica 64, 865–890.
  • Fan Y., Li Q. (2000) Consistent Model Specification Tests: Kernel-Based Tests Versus Bierens’ ICM Tests, Econometric Theory 16, 1016–1041.
  • Finkenstadt B., Rootzén H. (2003) Extreme Values in Finance, Telecommunications and the Environment. Chapman and Hall: New York.
  • Gabaix X. (2008) Power Laws, in The New Palgrave Dictionary of Economics, 2nd Edition, S.N. Durlauf and L.E. Blume (eds.), MacMillan.
  • Gallant A.R. (1981) Unbiased Determination of Production Technologies, J. Econometrics 20, 285–323.
  • Gallant A.R., White H. (1989) There Exists a Neural Network That Does Not Make Avoidable Mistakes, Proceedings of the Second Annual IEEE Conference on Neural Net., I:657–664.
  • Hall P., Yao Q. (2003) Inference in ARCH and GARCH Models with Heavy-Tailed Errors, Econometrica 71, 285–317.
  • Hahn M.G., Weiner D.C., Mason D.M. (1991) Sums, Trimmed Sums and Extremes, Birkhäuser: Berlin.
  • Hansen B.E. (1996) Inference When a Nuisance Parameter Is Not Identified Under the Null Hypothesis, Econometrica 64, 413–430.
  • Härdle W., Mammen E. (1993) Comparing Nonparametric Versus Parametric Regression Fits, Ann. Stat. 21, 1926–1947.
  • Hausman J.A. (1978) Specification Testing in Econometrics, Econometrica 46, 1251–1271.
  • Hill J.B. (2008a) Consistent and Non-Degenerate Model Specification Tests Against Smooth Transition and Neural Network Alternatives, Ann. D’Econ. Statist. 90, 145–179.
  • Hill J.B. (2008b) Consistent GMM Residuals-Based Tests of Functional Form, Econometric Rev.: forthcoming.
  • Hill J.B. (2011a) Tail and Non-Tail Memory with Applications to Extreme Value and Robust Statistics, Econometric Theory 27, 844–884.
  • Hill J.B. (2011b) Robust M-Estimation for Heavy Tailed Nonlinear AR-GARCH, Working Paper, University of North Carolina - Chapel Hill.
  • Hill J.B. (2011c) Supplemental Appendix for Heavy-Tail and Plug-In Robust Consistent Conditional Moments Tests of Functional Form, www.unc.edu/jbhill/ap_cmtrim.pdf.
  • Hill J.B. (2012) Stochastically Weighted Average Conditional Moment Tests of Functional Form, Stud. Nonlin. Dyn. Econometrics 16: forthcoming.
  • Hill J.B., Aguilar M. (2011) Moment Condition Tests for Heavy Tailed Time Series, J. Econometrics: Annals Issue on Extreme Value Theory: forthcoming.
  • Hill J.B., Renault E. (2010) Generalized Method of Moments with Tail Trimming, submitted; Dept. of Economics, University of North Carolina - Chapel Hill.
  • Hoffmann-Jørgensen J. (1991) Convergence of Stochastic Processes on Polish Spaces, Various Publication Series Vol. 39, Matematisk Institute, Aarhus University.
  • Hong Y., White H. (1995) Consistent Specification Testing Via Nonparametric Series Regression, Econometrica 63, 1133–1159.
  • Hong Y., Lee Y-J. (2005) Generalized Spectral Tests for Conditional Mean Models in Time Series with Conditional Heteroscedasticity of Unknown Form, Rev. Econ. Stud. 72, 499–541.
  • Hornik K., Stinchcombe M., White H. (1989) Multilayer Feedforward Networks are Universal Approximators, Neural Net. 2, 359–366.
  • Hornik K., Stinchcombe M., White H. (1990) Universal Approximation of an Unknown Mapping and Its Derivatives Using Multilayer Feedforward Networks, Neural Net. 3, 551–560.
  • Ibragimov R., Müller U.K. (2010) t-Statistic based Correlation and Heterogeneity Robust Inference, J. Bus. Econ. Stat. 28, 453–468.
  • Lahiri S.N. (1995) On the Asymptotic Behaviour of the Moving Block Bootstrap for Normalized Sums of Heavy-Tailed Random Variables, Ann. Stat. 23, 1331–1349.
  • Leadbetter M.R., Lindgren G., Rootzén H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag: New York.
  • Lee T., White H., Granger C.W.J. (1993) Testing for Neglected Nonlinearity in Time-Series Models: A Comparison of Neural Network Methods and Alternative Tests, J. Econometrics 56, 269–290.
  • Ling S. (2005) Self-Weighted LAD Estimation for Infinite Variance Autoregressive Models, J. R. Stat. Soc. B 67, 381–393.
  • McLeod A.I., Li W.K. (1983) Diagnostic Checking ARMA Time Series Models Using Squared Residual Autocorrelations, J. Time Ser. Anal. 4, 269–273.
  • Meitz M., Saikkonen P. (2008) Stability of Nonlinear AR-GARCH Models, J. Time Ser. Anal. 29, 453–475.
  • Newey W.K. (1985) Maximum Likelihood Specification Testing and Conditional Moment Tests, Econometrica 53, 1047–1070.
  • Peng L., Yao Q. (2003) Least Absolute Deviation Estimation for ARCH and GARCH Models, Biometrika 90, 967–975.
  • Pham T., Tran L. (1985) Some Mixing Properties of Time Series Models, Stoch. Proc. Appl. 19, 297–303.
  • Pollard D. (1984) Convergence of Stochastic Processes. Springer-Verlag: New York.
  • Ramsey J.B. (1969) Tests for Specification Errors in Classical Linear Least-Squares Regression, J. R. Stat. Soc. B 31, 350–371.
  • Resnick S.I. (1987) Extreme Values, Regular Variation and Point Processes. Springer-Verlag: New York.
  • Stinchcombe M., White H. (1989) Universal Approximation Using Feedforward Networks with Non-Sigmoid Hidden Layer Activation Functions, Proceedings of the International Joint Conference on Neural Net., I, 612–617.
  • Stinchcombe M.B., White H. (1992) Some Measurability Results for Extrema of Random Functions Over Random Sets, Rev. Economic Stud. 59, 495–514.
  • Stinchcombe M.B., White H. (1998) Consistent Specification Testing with Nuisance Parameters Present Only Under the Alternative, Econometric Theory 14, 295–325.
  • Tsay R. (1986) Nonlinearity Tests for Time Series, Biometrika 73, 461–466.
  • White H. (1981) Consequences and Detection of Misspecified Nonlinear Regression Models, J. Amer. Stat. Assoc. 76, 419–433.
  • White H. (1982) Maximum Likelihood Estimation of Misspecified Models, Econometrica 50, 1–25.
  • White H. (1987) Specification Testing in Dynamic Models, in Truman Bewley, ed., Advances in Econometrics. Cambridge University Press: New York.
  • White H. (1989a) A Consistent Model Selection Procedure Based on m-Testing, in C.W.J. Granger, ed., Modelling Economic Series: Readings in Econometric Methodology, p. 369–403. Oxford University Press: Oxford.
  • White H. (1989b) Some Asymptotic Results for Learning in Single Hidden Layer Feedforward Network Models, J. Amer. Stat. Assoc. 84, 1003–1013.
  • White H. (1990) Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings, Neural Net. 3, 535–549.
  • Wooldridge J.M. (1990) A Unified Approach to Robust, Regression-Based Specification Tests, Econometric Theory 6, 17–43.
  • Yatchew A.J. (1992) Nonparametric Regression Tests Based on Least Squares, Econometric Theory 8, 435–451.
  • Zheng J.X. (1996) A Consistent Test of Functional Form via Nonparametric Estimation Techniques, J. Econometrics 75, 263–289.


Acknowledgments

The author thanks an anonymous referee and Co-Editor Xiaohong Chen for constructive remarks.


Appendices


Appendix A: Assumptions

We ignore for notational economy measurability issues that arise when taking a supremum over an index set. Assume all functions in this chapter satisfy Pollard (1984) permissibility criteria, the measure space that governs all random variables is complete, and therefore all majorants are measurable. Probability statements are therefore with respect to outer probability, and expectations over majorants are outer expectations. Cf. Dudley (1978) and Stinchcombe and White (1992).

Write thresholds and fractiles compactly \(c_{z,n}(\cdot ) = \max \{l_{z,n}(\cdot ),u_{z,n}(\cdot )\}\) and \(k_{j,n} = \max \left\{ k_{j,\epsilon ,n},k_{j,1,n},...,k_{j,q,n}\right\} \), define \(\sigma _{n}^{2}(\beta ,\gamma ) := E\left[ m_{n,t}^{*2}\left( \beta ,\gamma \right) \right] \) and

$$\begin{aligned}&J_{t}(\beta ,\gamma ):=-g_{t}\left( \beta \right) F\left( \gamma ^{\prime }\psi _{t}\right) ,\qquad J_{n,t}^{*}(\beta ,\gamma ):=J_{t}(\beta ,\gamma )I_{n,t}(\beta ), \\&\hat{J}_{n,t}^{*}(\beta ,\gamma ):=J_{t}(\beta ,\gamma )\hat{I}_{n,t}(\beta ),\qquad J_{n}^{*}(\beta ,\gamma ):=\frac{1}{n}\sum _{t=1}^{n}J_{n,t}^{*}(\beta ,\gamma ), \\&\hat{J}_{n}^{*}(\beta ,\gamma ):=\frac{1}{n}\sum _{t=1}^{n}\hat{J}_{n,t}^{*}(\beta ,\gamma ). \end{aligned}$$

Drop \(\beta ^{0}\), define \(\mathfrak I _{t} = \sigma (x_{\tau +1},y_{\tau } :\) \(\tau \le t)\), and let \(\Gamma \) be any compact subset of \( \mathbb R ^{p}\) with positive Lebesgue measure. Six sets of assumptions are employed. First, the test weight is revealing.

W1 (weight). a. \(F : \mathbb R \rightarrow \mathbb R \) is Borel measurable, analytic, and nonpolynomial on some open interval \(R_{0} \subseteq \mathbb R \) containing \(0\). b. \(\text{sup}_{u\in U}|F(u)| \le K\) and \(\text{inf}_{u\in U}|F(u)| > 0\) on any compact subset \(U \subset S_{F}\), with \(S_{F}\) the support of \(F\).

Remark

The W1.b upper bound allows us to exclude \(F(\gamma ^{\prime }\psi _{t})\) from the trimming indicators which greatly simplifies proving test consistency under trimming, and is mild since it applies to repeatedly cited weights (exponential, logistic, sine, cosine). The lower bound in W1.b helps to establish a required stochastic equicontinuity condition for weak convergence when \(\epsilon _{t}\) may be heavy tailed, and is easily guaranteed by centering \(F(\gamma ^{\prime }\psi _{t})\) if necessary.
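As a numerical illustration of W1, here is a minimal sketch in Python. The exponential weight, the \(\arctan \) transform \(\psi \), and the simulated heavy-tailed regressors are illustrative choices consistent with W1, not the chapter's prescription.

```python
import numpy as np

# Illustrative revealing weight h(x) = F(gamma' psi(x)) with F = exp (analytic,
# non-polynomial) and psi = arctan (bounded, one-to-one).  All names here are
# hypothetical choices satisfying W1, not taken from the chapter.
def weight(x, gamma):
    psi = np.arctan(x)            # bounded transform of the regressors
    return np.exp(psi @ gamma)    # exp(gamma' psi(x))

rng = np.random.default_rng(0)
x = rng.standard_t(df=1.5, size=(500, 2))   # heavy-tailed regressors (illustrative)
gamma = np.array([0.7, -0.3])               # one point in a compact Gamma
h = weight(x, gamma)
print(h.min(), h.max())   # bounded away from 0 and bounded above, as in W1.b
```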

Second, the plug-in \(\hat{\beta }_{n}\) is consistent. Let \(\tilde{m}_{n,t}\) be \(\mathfrak I _{t}\)-measurable mappings from \(\mathcal B \subset \mathcal R^{q}\) to \(\mathcal R^{r}\), \(r \ge q\), and \(\{\tilde{V}_{n}\}\) a sequence of non-random matrices \(\tilde{V}_{n} \in \mathbb R ^{q\times q}\) where \(\tilde{V}_{i,i,n}\;\rightarrow \; \infty \). Stack equations \(\mathcal M _{n,t}^{*}(\beta ,\gamma ) := [m_{n,t}^{*}\left( \beta ,\gamma \right) ,\tilde{m}_{n,t}^{\prime }(\beta )]^{\prime } \in \mathcal R^{r+1}\), and define the covariances \(\tilde{S}_{n}\left( \beta \right) := \text{sum} _{s,t=1}^{n}E[\{\tilde{m}_{n,s}(\beta ) - E[\tilde{m} _{n,s}(\beta )]\} \times \{\tilde{m}_{n,t}(\beta ) - E[\tilde{m} _{n,t}(\beta )]\}^{\prime }]\) and \(\mathfrak S _{n}^{*}(\beta ,\gamma )\) \(:= \text{sum} _{s,t=1}^{n}E[\{\mathcal M _{n,s}^{*}(\beta ,\gamma ) - E[ \mathcal M _{n,s}^{*}(\beta ,\gamma )]\} \times \{\mathcal M _{n,t}^{*}(\beta ,\gamma ) - E[\mathcal M _{n,t}^{*}(\beta ,\gamma )]\}^{\prime }]\), hence \([\mathfrak S _{i,j,n}^{*}(\beta ,\gamma )]_{i=2,j=2}^{r+1,r+1} = \tilde{S}_{n}\left( \beta \right) \). We abuse notation since \(\mathfrak S _{n}^{*}(\beta ,\gamma )\) may not exist for some or any \(\beta \). Let f.d.d. denote finite dimensional distributions.

  • P1 (fast (non)linear plug-ins). \(\tilde{V} _{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0})= O_{p}(1)\) and \(\text{sup}_{\gamma \in \Gamma }||V_{n}(\gamma )\tilde{V}_{n}^{-1}|| \rightarrow 0\).

  • P2 (slow linear plug-ins). \(\mathfrak S _{n}^{*}(\gamma )\) exists for each \(n\) , specifically \( \text{sup}_{\gamma \in \Gamma }||\mathfrak S _{n}^{*}(\gamma )||<\infty \) and \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\gamma \in \Gamma }\lambda _{\min }(\mathfrak S _{n}^{*}(\gamma ))> 0\). Further:

    a.

      \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0})\) \(= O_{p}(1)\) and \(\tilde{V}_{n} \sim \mathcal K (\gamma )V_{n}(\gamma )\) , where \(\mathcal K : \Gamma \rightarrow \mathbb R ^{q\times q}\) and \(\text{inf}_{\gamma \in \Gamma }\lambda _{\min }( \mathcal K (\gamma )) > 0\) .

    b.

      \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0}) = \tilde{A}_{n}\text{sum} _{t=1}^{n}\{\tilde{m}_{n,t}\)\(- E[\tilde{m}_{n,t}]\} \times \)\((1 + o_{p}\left( 1\right) ) +\)\(o_{p}\left( 1\right) \) where nonstochastic \(\tilde{A}_{n} \in \mathbb R ^{q\times r}\) has full column rank and \(\tilde{A}_{n}\tilde{S} _{n}^{-1}\tilde{A}_{n}^{\prime } \rightarrow I_{q}\).

    c.

      The f.d.d. of \(\mathfrak S _{n}^{*}\left( \gamma \right) ^{-1/2}\{\mathcal M _{n,t}^{*}(\gamma ) - E[ \mathcal M _{n,t}^{*}(\gamma )]\}\) belong to the same domain of attraction as the f.d.d. of \(S_{n}^{-1}(\gamma )\{m_{n,t}^{*}(\gamma ) - E[m_{n,t}^{*}(\gamma )]\}\).

  • P3 (orthogonal equations and (non)linear plug-ins). \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0}) =\) \(O_{p}(1)\) and \(\lim \text{sup}_{n\rightarrow \infty }\text{sup}_{\gamma \in \Gamma }||V_{n}^{\perp }(\gamma )\tilde{V} _{n}^{-1}|| < \infty \).

Remark

\(\hat{\beta }_{n}\) affects the limit distribution of \( {\hat{\mathcal T }}_{n}(\gamma )\) under P2, hence we assume \(\hat{\beta }_{n}\) is linear. P3 is invoked for orthogonalized equations \(\hat{m}_{n,t}^{\bot }(\beta ,\gamma )\).

Third, identification under trimming.

I1 (identification by \(m_{n,t}^{*}(\gamma )\)). Under the null \(\text{sup}_{\gamma \in \Gamma }|nS_{n}^{-1}(\gamma )E[m_{n,t}^{*}\left( \gamma \right) ]|\) \(\rightarrow 0\).

Remark

If \(m_{t}(\gamma )\) is asymmetric there is no guarantee \(E[m_{n,t}^{*}\left( \gamma \right) ] = 0\), although \( E[m_{n,t}^{*}\left( \gamma \right) ] \rightarrow 0\) under \(H_{0}\) by trimming negligibility and dominated convergence. The fractiles \( \{k_{j,\epsilon ,n},k_{j,i,n}\}\) must therefore promote I1 for asymptotic normality in view of expansion (5) and mean centering. Since \( \text{sup}_{\gamma \in \Gamma }\{S_{n}(\gamma )/n\} = o(1)\) by Lemma B.1, below, I1 implies identification of \(H_{0}\) sufficiently fast. The property is superfluous if \(E[\epsilon _{t}] = 0\) under either hypothesis, \(\epsilon _{t}\) is independent of \(x_{t}\) under \(H_{0}\), and re-centering is used since then \(E[m_{n,t}^{*}\left( \gamma \right) ] = 0\) under \(H_{0}\) (see Sect. 3).

Fourth, the DGP and properties of regression model components.

  • R1 (response). For each \(\beta \in \mathcal B \), \(f(\cdot ,\beta )\) is a Borel measurable function; \(f(x_{t},\cdot )\) is continuous and differentiable on \(\mathcal B \) with Borel measurable gradient \(g_{t}(\beta ) = g(x_{t},\beta ) := (\partial /\partial \beta )f(x_{t},\beta )\).

  • R2 (moments). \(E|y_{t}| < \infty \), and \( E(\text{sup}_{\beta \in \mathcal B }|f(x_{t},\beta )|^{\iota }) < \infty \) and \(E(\text{sup}_{\beta \in \mathcal B }|(\partial /\partial \beta _{i})f(x_{t},\beta )|^{\iota }) < \infty \) for each \(i\) and some tiny \(\iota > 0\).

  • R3 (distribution).

    a.

      The finite dimensional distributions of \(\{y_{t},x_{t}\}\) are strictly stationary, non-degenerate, and absolutely continuous. The density function of \(\epsilon _{t}(\beta )\) is uniformly bounded: \(\text{sup}_{\beta \in \mathcal B }\text{sup}_{a\in \mathbb R }\{(\partial /\partial a)P(\epsilon _{t}(\beta ) \le a)\} < \infty \).

    b.

      Define \(\kappa _{\epsilon }(\beta ) := \mathrm{arg\,sup} _{\alpha \ > \ 0}\{E|\epsilon _{t}(\beta )|^{\alpha } < \infty \} \in (0,\infty ]\), write \(\kappa _{\epsilon } = \kappa _{\epsilon }(\beta ^{0})\), and let \(\mathcal B _{2,\epsilon }\) denote the set of \( \beta \) such that the error variance is infinite, \(\kappa _{\epsilon }(\beta ) \le 2\). If \(\kappa _{\epsilon }(\beta ) \le 2\) then \(P(|\epsilon _{t}(\beta )| > c) = d(\beta )c^{-\kappa _{\epsilon }(\beta )}(1 + o(1))\) where \(\text{inf}_{\beta \in \mathcal B _{2,\epsilon }}d(\beta ) > 0\) and \( \text{inf}_{\beta \in \mathcal B _{2,\epsilon }}\kappa _{\epsilon }(\beta ) > 0\), and \(o(1)\) is not a function of \(\beta \), hence \(\lim _{c\rightarrow \infty }\text{sup}_{\beta \in \mathcal B _{2,\epsilon }}|d(\beta )^{-1}c^{\kappa _{\epsilon }(\beta )}P(|\epsilon _{t}(\beta )| > c) - 1| = 0\).

  • R4 (mixing). \(\{y_{t},x_{t}\}\) are geometrically \( \beta \) -mixing: \(\text{sup}_{\mathcal A \subset \mathfrak I _{t+l}^{+\infty }}E|P( \mathcal A |\mathfrak I _{-\infty }^{t}) - P(\mathcal A )|\) \(= o(\rho ^{l})\) for \(\rho \in (0,1)\).

Remark 1

Response function smoothness R1 coupled with distribution continuity and boundedness R3.a imply \(\text{sum} _{t=1}^{n}\hat{m} _{n,t}^{*}(\hat{\beta }_{n},\gamma )\) can be asymptotically expanded around \(\beta ^{0}\), cf. Hill (2011b, Appendices B and C). Power-law tail decay R3.b is mild since it includes weakly dependent processes that satisfy a central limit theorem (Leadbetter et al. 1983), and simplifies characterizing tail-trimmed variances in heavy-tailed cases by Karamata’s Theorem.

Remark 2

The mixing property characterizes nonlinear AR with nonlinear random volatility errors (Pham and Tran 1985; An and Huang 1996; Meitz and Saikkonen 2008).

Fifth, we restrict the fractiles and impose nondegeneracy under trimming. Recall \(k_{j,n} = \max \{k_{j,\epsilon ,n}, k_{j,1,n}, ...,k_{j,q,n}\}\), the R3.b moment supremum \(\kappa _{\epsilon } > 0\), and \(\sigma _{n}^{2}(\beta ,\gamma ) = E[m_{n,t}^{*2}(\beta ,\gamma )]\).

  • F1 (fractiles).

    a.

      \(k_{j,\epsilon ,n}/\ln (n) \rightarrow \infty \);

    b.

      If \(\kappa _{\epsilon } \in (0,1)\) then \(k_{j,\epsilon ,n}/n^{2\left( 1-\kappa _{\epsilon }\right) /\left( 2-\kappa _{\epsilon }\right) } \rightarrow \infty \).

  • F2 (nondegenerate trimmed variance). \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\beta \in \mathcal B ,\gamma \in \Gamma }\{S_{n}^{2}(\beta ,\gamma )/n\} > 0\) and \( \text{sup}_{\beta \in \mathcal B ,\gamma \in \Gamma }\{n\sigma _{n}^{2}(\beta ,\gamma )/S_{n}^{2}(\beta ,\gamma )\} = O(1)\).

Remark 1

F1.a sets a mild lower bound on \(k_{\epsilon ,n}\) that is useful for bounding trimmed variances \(\sigma _{n}^{2}(\beta ,\gamma )\) and \(S_{n}^{2}(\beta ,\gamma )\). F1.b sets a harsh lower bound on \( k_{\epsilon ,n}\) if, under misspecification, \(\epsilon _{t}\) is not integrable: as \(\kappa _{\epsilon } \searrow 0\) we must trim more \( k_{\epsilon ,n} \nearrow n\) in order to prove a LLN for \(m_{n,t}^{*}(\gamma )\) which is used to prove \({\hat{\mathcal T }}_{n}(\gamma )\) is consistent. Any \(k_{\epsilon ,n} \sim n/L(n)\) for slowly varying \(L(n) \rightarrow \) \(\infty \) satisfies F1.

Remark 2

Distribution nondegeneracy under R3.a coupled with trimming negligibility ensures trimmed moments are not degenerate for sufficiently large \(n\), for example \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\beta \in \mathcal B ,\gamma \in \Gamma }\sigma _{n}^{2}(\beta ,\gamma ) > 0\). The long-run variance \(S_{n}^{2}(\beta ,\gamma )\), however, can in principle be degenerate due to negative dependence, hence F2 is imposed. F2 is standard in the literature on dependent CLTs and is exploited here for a CLT for \(m_{n,t}^{*}(\beta ,\gamma )\), cf. Dehling et al. (1986).
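To fix ideas, here is a minimal sketch in Python of trimmed, re-centered test equations (hypothetical function and variable names, and a simplified one-threshold scheme that trims the \(k_{n}\) largest \(|\epsilon _{t}|\) rather than the chapter's two-tailed thresholds), using the fractile rule \(k_{n} = [0.025\,n/\ln (n)]\) from note 7, which is of the form \(k_{n} \sim \lambda n/\ln (n)\) and so satisfies F1.

```python
import numpy as np

def trimmed_equations(eps, weights, lam=0.025):
    """Tail-trim the test equations eps_t * F(gamma' psi_t).

    A sketch only: trims the k_n largest |eps_t| with k_n = [lam * n / ln(n)]
    as in note 7; this is a simplification of the chapter's scheme.
    """
    n = eps.shape[0]
    k_n = max(1, int(lam * n / np.log(n)))   # intermediate order: k_n -> inf, k_n/n -> 0
    c_n = np.sort(np.abs(eps))[-(k_n + 1)]   # threshold: (k_n+1)-th largest |eps_t|
    keep = np.abs(eps) <= c_n                # trimming indicator, drops the k_n largest
    m_star = eps * weights * keep
    return m_star - m_star.mean()            # re-center, cf. Sect. 3 and I1

# usage: eps = model residuals, weights = F(gamma' psi_t) evaluated at one gamma
```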

Finally, the kernel \(\omega (\cdot )\) and bandwidth \(b_{n}\).

K1 (kernel and bandwidth). \(\omega (\cdot )\) is integrable, and a member of the class \(\{\omega : \mathbb R \rightarrow [-1,1]\ |\ \omega (0) = 1,\ \omega (x) = \omega (-x)\ \forall x \in \mathbb R ,\ \int _{-\infty }^{\infty }|\omega (x)|dx < \infty ,\ \int _{-\infty }^{\infty }|\vartheta (\xi )|d\xi < \infty ,\ \omega (\cdot )\) is continuous at \(0\) and at all but a finite number of points\(\}\), where \(\vartheta (\xi ) := (2\pi )^{-1}\int _{-\infty }^{\infty }\omega (x)e^{i\xi x}dx<\infty \). Further \( \sum \nolimits _{s,t=1}^{n}|\omega ((s - t)/b_{n})| = o(n^{2})\), \(\text{max} _{1\le s\le n}|\sum \nolimits _{t=1}^{n}\omega ((s - t)/b_{n})| = o(n)\) and \(b_{n} = o(n)\).

Remark

Assumption K1 includes Bartlett, Parzen, Quadratic Spectral, Tukey-Hanning, and other kernels. See de Jong and Davidson (2000) and their references.
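As a concrete companion to K1, a minimal sketch in Python of the kernel variance estimate \(\text{sum} _{s,t=1}^{n}\omega ((s - t)/b_{n})\hat{\mu }_{n,s}\hat{\mu }_{n,t}\) appearing in Lemma B.2.b below. The Bartlett kernel and the cube-root bandwidth in the usage note are illustrative placeholders, not the chapter's prescription.

```python
import numpy as np

def bartlett(x):
    # Bartlett kernel: omega(0) = 1, symmetric, integrable -- one member of K1
    return np.maximum(1.0 - np.abs(x), 0.0)

def hac_variance(mu, b_n):
    """Estimate S_n^2 as sum_{s,t} omega((s-t)/b_n) mu_s mu_t for centered mu_t."""
    n = mu.shape[0]
    s2 = float(np.sum(mu ** 2))              # lag-0 term
    for lag in range(1, n):
        w = bartlett(lag / b_n)
        if w == 0.0:                         # Bartlett weights vanish beyond b_n
            break
        s2 += 2.0 * w * float(np.sum(mu[lag:] * mu[:-lag]))
    return s2

# usage: mu = centered trimmed equations; b_n = n ** (1 / 3) is a placeholder bandwidth
```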

Appendix B: Proofs of Main Results

We require several preliminary results proved in the supplemental appendix Hill (2011c, Sect. C.3). Throughout, the terms \(o_{p}(1)\), \(O_{p}(1)\), \(o(1)\) and \(O(1)\) do not depend on \(\beta \), \(\gamma \), and \(t\). We only state results that concern \(\hat{m}_{n,t}^{*}(\beta ,\gamma )\) and \( m_{n,t}^{*}(\beta ,\gamma )\), since companion results extend to \(\hat{m} _{n,t}^{\perp }(\beta ,\gamma )\) and \(m_{n,t}^{\perp }(\beta ,\gamma )\). Let F1–F2, K1, R1–R4, and W1.b hold. Recall \(\sigma _{n}^{2}(\beta ,\gamma ) =\) \(E[m_{n,t}^{*2}(\beta ,\gamma )]\).

Lemma B.1

(variance bounds)

  a.

    \(\sigma _{n}^{2}(\beta ,\gamma ) = o\left( n\max \left\{ 1,(E[m_{n,t}^{*}(\beta ,\gamma )])^{2}\right\} \right) \) , \( \sup \limits _{\gamma \in \Gamma }\left\{ \dfrac{\sigma _{n}^{2}(\gamma )}{ \max \left\{ 1,(E[m_{n,t}^{*}(\gamma )])^{2}\right\} }\right\} = o(n/\ln (n));\)

  b.

    \(S_{n}^{2}(\gamma ) = \mathfrak L _{n}n\sigma _{n}^{2}(\gamma ) = o(n^{2})\) for some sequence \(\{\mathfrak L _{n}\}\) that satisfies \(\lim \text{inf}_{n\rightarrow \infty }\mathfrak L _{n} > 0\), \( \mathfrak L _{n} = K\) if \(\epsilon _{t}\) is finitely dependent or \(E[\epsilon _{t}^{2}] <\infty \), and otherwise \( \mathfrak L _{n} \le K\ln (n/\text{min}_{j\in \{1,2\}}\{k_{j,\epsilon ,n}\}) \le K\ln (n)\).

Lemma B.2

(uniform approximation)

  a.

    \(\text{sup}_{\gamma \in \Gamma }|S_{n}^{-1}(\gamma )\text{sum} _{t=1}^{n}\{\hat{m}_{n,t}^{*}(\gamma )-m_{n,t}^{*}(\gamma )\}| \ = \ o_{p}(1)\).

  b.

    Define \(\hat{\mu }_{n,t}^{*}(\beta ,\gamma ) := \hat{m} _{n,t}^{*}(\beta ,\gamma ) - \hat{m}_{n}^{*}(\beta ,\gamma )\) and \(\mu _{n,t}^{*}(\beta ,\gamma ) := m_{n,t}^{*}(\beta ,\gamma ) - m_{n}^{*}(\beta ,\gamma )\). If additionally P1 or P2 holds \(\text{sup}_{\gamma \in \Gamma }|S_{n}^{-2}(\gamma )\text{sum} _{s,t=1}^{n}\omega ((s - t)/b_{n})\{\hat{\mu }_{n,s}^{*}(\hat{ \beta }_{n},\gamma )\hat{\mu }_{n,t}^{*}(\hat{\beta }_{n},\gamma ) - \mu _{n,s}^{*}(\gamma )\mu _{n,t}^{*}(\gamma )\}| =\)\(o_{p}(1)\).

Lemma B.3

(uniform expansion) Let \(\beta ,\tilde{\beta } \in \mathcal B \). For some sequence \(\{\beta _{n,*}\}\) in \(\mathcal B \) satisfying \(||\beta _{n,*} - \tilde{\beta }|| \le ||\beta - \tilde{\beta }||\), and for some tiny \(\iota > 0\) and arbitrarily large finite \(\delta > 0\) we have \(\text{sup}_{\gamma \in \Gamma }|\hat{m}_{n}^{*}(\beta ,\gamma ) - \hat{m} _{n}^{*}(\tilde{\beta },\gamma ) - \hat{J}_{n}^{*}(\beta _{n,*},\gamma )^{\prime }(\beta - \tilde{\beta })| = n^{-\delta } \times ||\beta - \tilde{\beta }||^{1/\iota } \times o_{p}(1)\).

Lemma B.4

(Jacobian) Under P1 or P2 \(\text{sup}_{\gamma \in \Gamma }||J_{n}^{*}(\hat{\beta }_{n},\gamma ) -\) \(J_{n}(\gamma )(1 + o_{p}(1))|| = o_{p}(1)\).

Lemma B.5

(HAC) Under P1 or P2 \(\text{sup}_{\gamma \in \Gamma }|\hat{S}_{n}^{2}(\hat{\beta }_{n},\gamma )/S_{n}^{2}(\gamma )\) \(- 1| \overset{p}{\rightarrow } 0\).

Lemma B.6

(ULLN) Let \(\text{inf}_{n\ge N}|E[m_{n,t}^{*}(\gamma )]| > 0\) for some \(N\in \mathbb N \) and all \(\gamma \in \Gamma /S\) where \(S\) has measure zero. Then \(\text{sup}_{\gamma \in \Gamma /S}\{1/n\text{sum} _{t=1}^{n}m_{n,t}^{*}(\gamma )/E[ m_{n,t}^{*}(\gamma )] \} \overset{p}{\rightarrow }\,1\).

Lemma B.7

(UCLT) \(\{S_{n}^{-1}(\gamma )\text{sum} _{t=1}^{n}(m_{n,t}^{*}(\gamma ) - E[m_{n,t}^{*}(\gamma )]) : \gamma \in \Gamma \} \Longrightarrow \{z(\gamma ) : \gamma \in \Gamma \}\), a scalar \((0,1)\)-Gaussian process on \(\mathcal C [\Gamma ]\) with covariance function \(E[z(\gamma _{1})z(\gamma _{2})]\) and a.s. bounded sample paths. If P2 also holds then \(\{\mathfrak S _{n}^{-1/2}(\gamma )\text{sum} _{t=1}^{n}\{\mathcal M _{n,t}^{*}(\gamma ) - E[\mathcal M _{n,t}^{*}(\gamma )]\} : \gamma \in \Gamma \} \Longrightarrow \{\mathcal Z (\gamma ) : \gamma \in \Gamma \}\), an \(r + 1\) dimensional Gaussian process on \(\mathcal C [\Gamma ]\) with zero mean, covariance \(I_{r+1}\), and covariance function \(E[\mathcal Z (\gamma _{1})\mathcal Z (\gamma _{2})^{\prime }]\).

Proof of Lemma

We only prove the claims for \( m_{n,t}^{*}(\beta ,\gamma )\). In view of the \(\sigma (x_{t})\) -measurability of \(\mathcal P _{n,t}(\gamma )\) and \(\text{sup}_{\gamma \in \Gamma }E|\mathcal P _{n,t}(\gamma )| < \infty \) the proof extends to \( m_{n,t}^{\perp }(\beta ,\gamma )\) with few modifications. Under \(H_{0}\) the claim follows from trimming negligibility and Lebesgue’s dominated convergence: \(E[m_{n,t}^{*}(\gamma )] \rightarrow \) \(E[m_{t}(\gamma )] = 0\).

Under the alternative there are two cases: \(E|\epsilon _{t}| <\) \(\infty \), or \(E|\epsilon _{t}| = \infty \) such that \(E[\epsilon _{t}|x_{t}]\) may not exist.

Case 1 (\(E|\epsilon _{t}|<\infty \)). Property W1, compactness of \(\Gamma \), and boundedness of \(\psi \) imply \(F(\gamma ^{\prime }\psi _{t})\) is uniformly bounded and revealing: \(E[\epsilon _{t}F(\gamma ^{\prime }\psi _{t})] \ne 0\) for all \(\gamma \in \Gamma /S\) where \(S\) has Lebesgue measure zero. Now invoke boundedness of \( F(\gamma ^{\prime }\psi _{t})\) with Lebesgue’s dominated convergence theorem and negligibility of trimming to deduce \(|E[\epsilon _{t}(1 - I_{n,t}(\beta ^{0}))F(\gamma ^{\prime }\psi _{t})]| \rightarrow 0\), hence \(E[\epsilon _{t}I_{n,t}(\beta ^{0})F(\gamma ^{\prime }\psi _{t})] =\)\(E\left[ \epsilon _{t}F(\gamma ^{\prime }\psi _{t})\right] + o(1) \ne 0\) for all \(\gamma \in \Gamma /S\) and all \(n \ge N\) for sufficiently large \(N\).

Case 2 (\(E|\epsilon _{t}| = \infty \)). Under \( H_{1}\) since \(I_{n,t}(\beta ) \rightarrow 1\,a.s.\) and \(E|\epsilon _{t}| = \infty \), by the definition of conditional expectations there exists sufficiently large \(N\) such that \(\text{min}_{n\ge N}|E[\epsilon _{t}I_{n,t}(\beta ^{0})|x_{t}]| > 0\) with positive probability \(\forall n \ge N\). The claim therefore follows by Theorem 1 of Bierens and Ploberger (1997) and Theorem 2.3 of Stinchcombe and White (1998): \(\lim \text{inf}_{n\rightarrow \infty }|E[\epsilon _{t}I_{n,t}(\beta ^{0})F(\gamma ^{\prime }\psi _{t})]| > 0\) for all \(\gamma \in \Gamma /S\). \( \mathcal{QED} \).

Proof of Theorem

Define \(M_{n,t}^{*}(\beta ,\gamma ) := m_{n,t}^{*}(\beta ,\gamma ) - E[m_{n,t}^{*}(\beta ,\gamma )]\) and \(\hat{M}_{n,t}^{*}(\beta ,\gamma ) := \hat{m}_{n,t}^{*}(\beta ,\gamma ) - E[\hat{m}_{n,t}^{*}(\beta ,\gamma )]\). We first state some required properties. Under plug–in properties P1 or P2 \(\hat{ \beta }_{n} - \beta ^{0} = o_{p}\left( 1\right) \). Identification I1 imposes under \(H_{0}\)

$$\begin{aligned} \ \sup _{\gamma \in \Gamma }\left|S_{n}^{-1}(\gamma )E[m_{n,t}^{*}\left( \gamma \right) ]\right|=o(1/n), \end{aligned}$$
(B.1)

which implies the following long-run variance relation uniformly on \(\Gamma \):

$$\begin{aligned} E\left( \sum _{t=1}^{n}M_{n,t}^{*}(\gamma )\right) ^{2}=S_{n}^{2}(\gamma )-n^{2}\left( E\left[ m_{n,t}^{*}(\beta ,\gamma )\right] \right) ^{2}=S_{n}^{2}(\gamma )\left( 1+o\left( 1\right) \right) . \end{aligned}$$
(B.2)

Uniform expansion Lemma B.3, coupled with Jacobian consistency Lemma B.4 and \(\hat{\beta }_{n} \overset{p}{\rightarrow } \beta ^{0}\) imply for any arbitrarily large finite \(\delta > 0\),

$$\begin{aligned} \sup _{\gamma \in \Gamma }\left|\frac{1}{n}\sum _{t=1}^{n}\left\{ \hat{m} _{n,t}^{*}(\hat{\beta }_{n},\gamma )-\hat{m}_{n,t}^{*}(\gamma )\right\} -J_{n}\left( \gamma \right) ^{\prime }\left( \hat{\beta }_{n}-\beta ^{0}\right) \left( 1+o_{p}\left( 1\right) \right) \right|=o_{p}\left( n^{-\delta }\right) . \end{aligned}$$
(B.3)

Finally, by uniform approximation Lemma B.2.a

$$\begin{aligned} \sup _{\gamma \in \Gamma }\left|\frac{1}{S_{n}(\gamma )} \sum _{t=1}^{n}\left\{ \hat{m}_{n,t}^{*}(\gamma )-m_{n,t}^{*}\left( \gamma \right) \right\} \right|=o_{p}\left( 1\right) , \end{aligned}$$
(B.4)

and by Lemma B.5 we have uniform HAC consistency:

$$\begin{aligned} \sup _{\gamma \in \Gamma }\left|\hat{S}_{n}^{2}(\hat{\beta }_{n},\gamma )/S_{n}^{2}(\gamma )-1\right|=o_{p}(1). \end{aligned}$$
(B.5)

Claim i ( \(\hat{T}_{n}\left( \gamma \right) :\) Null \(H_{0}\)). Under fast plug-in case P1 we assume \( \text{sup}_{\gamma \in \Gamma }||V_{n}(\gamma )\tilde{V}_{n}^{-1}|| \rightarrow \) \(0\), hence

$$\begin{aligned} \sup _{\gamma \in \Gamma }\left|nS_{n}^{-1}\left( \gamma \right) J_{n}\left( \gamma \right) ^{\prime }\left( \hat{\beta }_{n}-\beta ^{0}\right) \right|=o_{p}(1). \end{aligned}$$
(B.6)

Since \(\delta > 0\) in (B.3) may be arbitrarily large, \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\gamma \in \Gamma }S_{n}(\gamma )>0\) by nondegeneracy F2, and Eqs. (B.1)–(B.6) are uniform properties, it follows uniformly on \(\Gamma \)

$$\begin{aligned}&{\hat{\mathcal T }}_{n}\left( \gamma \right) \overset{p}{\sim }\left( \frac{1 }{S_{n}(\gamma )}\sum _{t=1}^{n}M_{n,t}^{*}(\gamma )+\frac{nJ_{n}\left( \gamma \right) ^{\prime }}{S_{n}(\gamma )}\left( \hat{\beta }_{n}-\beta ^{0}\right) +o_{p}\left( \frac{n}{S_{n}(\gamma )}n^{-\delta }\right) \right) ^{2}\quad \nonumber \\&=\left( \frac{1}{S_{n}(\gamma )} \sum _{t=1}^{n}M_{n,t}^{*}(\gamma )+o_{p}\left( 1\right) \right) ^{2}= \mathcal M _{n}^{2}\left( \gamma \right) , \end{aligned}$$
(B.7)

say. Now apply variance relation (B.2), UCLT Lemma B.7 and the mapping theorem to conclude \(E[\mathcal M _{n}^{2}\left( \gamma \right) ] \rightarrow 1\) and \(\{{\hat{\mathcal T }}_{n}\left( \gamma \right) : \gamma \in \Gamma \} \Longrightarrow \{z^{2}(\gamma ) : \gamma \in \Gamma \}\), where \(z(\gamma )\) is \((0,1)\)-Gaussian process on \(\mathcal C [\Gamma ]\) with covariance function \(E[z(\gamma _{1})z(\gamma _{2})]\).

Under slow plug-in case P2 a similar argument applies in view of plug-in linearity and UCLT Lemma B.7. Since the steps follow conventional arguments we relegate the proof to Hill (2011c, Sect. C.2).

Claim ii (\(\hat{T}_{n}\left( \gamma \right) :\)Alternative \(H_{1}\)). Lemma 2.1 ensures \(\text{inf}_{n\ge N}\left|E[m_{n,t}^{*}(\gamma )]\right|> 0\) for some \(N \in \mathbb N \) and all \(\gamma \in \Gamma /S\) where \(S \subset \Gamma \) has Lebesgue measure zero. Choose any \(\gamma \in \Gamma /S\), assume \(n \ge N\) and write

$$\begin{aligned} {\hat{\mathcal T }}_{n}\left( \gamma \right)&=\left( \frac{1}{\hat{S}_{n}(\hat{ \beta }_{n},\gamma )}\sum _{t=1}^{n}\hat{m}_{n,t}^{*}(\hat{\beta } _{n},\gamma )\right)^{2}\\&=\frac{n^{2}\left( E\left[ m_{n,t}^{*}(\gamma ) \right] \right) ^{2}}{\hat{S}_{n}^{2}(\hat{\beta }_{n},\gamma )}\left( \frac{ \left|1/n\sum _{t=1}^{n}\hat{m}_{n,t}^{*}(\hat{\beta }_{n},\gamma )\right|}{\left|E\left[ m_{n,t}^{*}(\gamma )\right] \right|}\right) ^{2}. \end{aligned}$$

In view of (B.5) and the Lemma B.1.a,b variance property \( n|E[m_{n,t}^{*}(\gamma )]|/ S_{n}(\gamma ) \rightarrow \infty \), the proof is complete if we show \(\mathcal M _{n}(\hat{\beta }_{n},\gamma ) := |1/n\text{sum} _{t=1}^{n}\hat{m}_{n,t}^{*}(\hat{\beta }_{n}, \gamma )|/|E[m_{n,t}^{*}(\gamma )]| \overset{p}{\rightarrow } 1\).

By (B.3), (B.4) and the triangle inequality \(\mathcal M _{n}(\hat{ \beta }_{n},\gamma )\) is bounded by

$$\begin{aligned}&\frac{1}{\left|E\left[ m_{n,t}^{*}(\gamma )\right] \right|} \left|\frac{1}{n}\sum _{t=1}^{n}m_{n,t}^{*}(\gamma )\right|+ \frac{1}{\left|E\left[ m_{n,t}^{*}(\gamma )\right] \right|}\\&\left|J_{n}\left( \gamma \right) ^{\prime }\left( \hat{\beta }_{n}-\beta ^{0}\right) \left( 1+o_{p}\left( 1\right) \right) \right|+o_{p}\left( \frac{S_{n}\left( \gamma \right) }{n\left|E\left[ m_{n,t}^{*}(\gamma )\right] \right|}\right) , \end{aligned}$$

where \(\text{sup}_{\gamma \in \Gamma /S}\{1/n\text{sum} _{t=1}^{n}m_{n,t}^{*}(\gamma )/E[m_{n,t}^{*}(\gamma )]\} \overset{p}{\rightarrow } 1\) by Lemma B.6. Further, combine fast or slow plug-in P1 or P2, the construction of \( V_{n}\left( \gamma \right) \) and variance relation Lemma B.1.a,b to obtain

$$\begin{aligned}&\frac{\left|J_{n}\left( \gamma \right) ^{\prime }\left( \hat{\beta } _{n}-\beta ^{0}\right) \left( 1+o_{p}\left( 1\right) \right) \right|}{ \left|E\left[ m_{n,t}^{*}(\gamma )\right] \right|}\le \frac{ S_{n}\left( \gamma \right) }{n\left|E\left[ m_{n,t}^{*}(\gamma ) \right] \right|}nJ_{n}\left( \gamma \right)^{\prime }S_{n}^{-1}\left( \gamma \right)\\&V_{n}^{-1/2}\left( \gamma \right)\sim K\frac{S_{n}\left( \gamma \right) }{n\left|E\left[ m_{n,t}^{*}(\gamma )\right] \right|}=o\left( 1\right) . \end{aligned}$$

Therefore \(\mathcal M _{n}(\hat{\beta }_{n},\gamma ) \overset{p}{ \rightarrow } 1\).

Claim iii (\(\hat{T}_{n}^{\perp }\left( \gamma \right) \) ). The argument simply mimics claims (\(i\)) and (\(ii\)) since under plug-in case P3 it follows \(\hat{S}_{n}^{\perp }(\hat{\beta } _{n},\gamma )^{-1}\text{sum} _{t=1}^{n}\hat{m}_{n,t}^{\perp }(\hat{\beta } _{n},\gamma ) \overset{p}{\sim } S_{n}^{\perp }(\gamma )^{-1}\text{sum} _{t=1}^{n}m_{n,t}^{\perp }(\gamma )\) by construction of the orthogonal equations (Wooldridge 1990), and straightforward generalizations of the supporting lemmas. \(\mathcal{QED} \).

The remaining proofs exploit the fact that for each \(z_{t} \in \{\epsilon _{t},g_{i,t}\}\) the product \(z_{t}F\left( \gamma ^{\prime }\psi _{t}\right) \) has the same tail decay rate as \(z_{t}\): by weight boundedness W1.b \(P(|z_{t}\text{sup}_{u\in \mathbb R }F(u)| > c) \ge P(|z_{t}F_{t}\left( \gamma \right) | > c) \ge P(|z_{t}\text{inf}_{u\in \mathbb R }F(u)| > c)\). Further, use \(I_{n,t} = I_{\epsilon ,n,t}I_{g,n,t}\), dominated convergence and each \(I_{z,n,t} \overset{a.s.}{\rightarrow } 1\) to deduce \(E[|z_{t}F(\gamma ^{\prime }\psi _{t})|^{r}I_{n,t}] = E[|z_{t}F(\gamma ^{\prime }\psi _{t})|^{r}I_{z,n,t}] \times (1 + o(1))\) for any \(r>0\). Hence higher moments of \(z_{t}F(\gamma ^{\prime }\psi _{t})I_{n,t}\) and \(z_{t}I_{z,n,t}\) are equivalent up to a constant scale.

Proof of Theorem

The claim under \(H_{1}\) follows from Theorem 2.2. We prove \(\tau _{n}(\alpha ) \overset{d}{ \rightarrow } (1 - \underline{\lambda })^{-1}\int _{ \underline{\lambda }}^{1}I(u(\lambda )<\alpha )d\lambda \) under \(H_{0}\) for plug-in case P1 since the remaining cases follow similarly. Drop \(\gamma \) and write \(\hat{m}_{n,t}^{*}(\hat{\beta }_{n},\lambda )\) and \(\hat{S} _{n}^{2}(\hat{\beta }_{n},\lambda )\) to express dependence on \(\lambda \in \Lambda := [\underline{\lambda },1]\). Define \(\hat{Z} _{n}(\lambda ) := \hat{S}_{n}^{-1}(\hat{\beta }_{n},\lambda )\text{sum} _{t=1}^{n}\hat{m}_{n,t}^{*} (\hat{\beta }_{n},\lambda )\). We exploit weak convergence on a Polish space (see note 9): we write \(\{\hat{Z} _{n}(\lambda ) : \lambda \in \Lambda \} \Longrightarrow ^{*}\) \(\{z(\lambda ) : \lambda \in \Lambda \}\) on \(l_{\infty }(\Lambda )\), where \(\{z(\lambda ) : \lambda \in \Lambda \}\) is a Gaussian process with a version that has uniformly bounded and uniformly continuous sample paths with respect to \(||\cdot ||_{2}\), if \(\hat{Z}_{n}(\lambda )\) converges in f.d.d. and tightness applies: \(\text{lim} _{\delta \rightarrow 0}\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{||\lambda -\tilde{\lambda }||\le \delta }|\hat{Z}_{n}(\lambda ) - \hat{Z}_{n}(\tilde{\lambda })| > \varepsilon ) = 0 \forall \varepsilon > 0\).
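(Aside, not part of the proof: numerically, \(\tau _{n}(\alpha )\) is just the fraction of a trimming grid on which the test rejects. A minimal sketch, assuming the pointwise \(\chi ^{2}(1)\) null limit of \({\hat{\mathcal T }}_{n}(\lambda )\) for the p-values \(u(\lambda )\) and a user-supplied grid of statistics on \([\underline{\lambda },1]\); the function name and grid handling are illustrative.)

```python
import numpy as np
from scipy.stats import chi2

def occupation_time(T_hat, alpha=0.05):
    """tau_n(alpha): share of the trimming grid on which the test rejects.

    T_hat: statistics hat{T}_n(lambda) computed on an (approximately uniform)
    grid of trimming parameters lambda in [lambda_lower, 1].  The grid average
    approximates (1 - lambda_lower)^{-1} * integral of 1{u(lambda) < alpha}.
    """
    p_values = chi2.sf(T_hat, df=1)           # asymptotic p-values u(lambda)
    return float(np.mean(p_values < alpha))   # Riemann approximation of tau_n(alpha)
```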

We need only prove \(\{\hat{Z}_{n}(\lambda ) : \lambda \in \) \(\Lambda \} \Longrightarrow ^{*} \{z(\lambda ) : \lambda \in \Lambda \}\) since the claim follows from multiple applications of the mapping theorem. Convergence in f.d.d. follows from \(\text{sup}_{\lambda \in \Lambda }|\hat{S}_{n}^{-1}(\hat{\beta }_{n},\lambda )\text{sum} _{t=1}^{n}\hat{m} _{n,t}^{*}(\hat{\beta }_{n},\lambda ) - S_{n}^{-1}(\lambda )\text{sum} _{t=1}^{n}m_{n,t}^{*}(\lambda )| \overset{p}{\rightarrow } 0\) by (B.3)–(B.5) under plug-in case P1, and the proof of UCLT Lemma B.7.

Consider tightness and notice by (B.3)–(B.6) and plug-in case P1

$$\begin{aligned} \sup _{\lambda \in \Lambda }\left|\hat{Z}_{n}(\lambda )-\mathcal Z _{n}\left( \lambda \right) \right|\overset{p}{\rightarrow }0 \text{ where} \mathcal Z _{n}\left( \lambda \right) :=\sum _{t=1}^{n}\frac{1}{ S_{n}(\lambda )}m_{t}I_{n,t}\left( \lambda \right) =\sum _{t=1}^{n}\mathcal Z _{n,t}\left( \lambda \right) , \end{aligned}$$

hence we need only to consider \(\mathcal Z _{n}\left( \lambda \right) \) for tightness. By Lemma B.1.b and \(\inf \{\Lambda \} > 0\) it is easy to verify \(\text{inf}_{\lambda \in \Lambda }S_{n}^{2}(\lambda ) = n\sigma _{n}^{2} \) for some sequence \(\{\sigma _{n}^{2}\}\) that satisfies \(\lim \text{inf}_{n\rightarrow \infty }\sigma _{n}^{2} > 0\). Therefore

$$\begin{aligned}&\left|\sum _{t=1}^{n}\left\{ \mathcal Z _{n,t}\left( \lambda \right) - \mathcal Z _{n,t}(\tilde{\lambda })\right\} \right|\le \left|\frac{ 1}{n^{1/2}\sigma _{n}}\sum _{t=1}^{n}m_{t}\left\{ I_{n,t}\left( \lambda \right) -I_{n,t}(\tilde{\lambda })\right\} \right|\\&+\left|\frac{S_{n}(\lambda )}{S_{n}(\tilde{\lambda })}-1\right|\times \left|\frac{1}{S_{n}(\lambda )}\sum _{t=1}^{n}m_{t}I_{n,t}( \lambda )\right|=\mathcal A _{1,n}(\lambda ,\tilde{\lambda })\\&+\mathcal A _{2,n}(\lambda ,\tilde{\lambda }). \end{aligned}$$

By subadditivity it suffices to prove each \(\text{lim}_{\delta \rightarrow 0}\lim\text{sup}_{n\rightarrow \infty }P(\text{sup}_{||\lambda -\tilde{\lambda }||\le \delta }\mathcal A _{i,n}(\lambda ,\tilde{\lambda }) > \varepsilon ) =\) \(0 \forall \varepsilon > 0\).

Consider \(\mathcal A _{1,n}(\lambda ,\tilde{\lambda })\) and note \( I_{n,t}(\lambda )\) can be approximated by a sequence of continuous, differentiable functions. Let \(\{\mathcal N _{n}\}\) be a sequence of positive numbers to be chosen below, and define a smoothed version of \(I_{n,t}(\lambda )\),

$$\begin{aligned} \mathfrak I _{\mathcal N _{n},n,t}(\lambda )&:=\int \limits _{0}^{1}I_{n,t}(\varpi ) \mathrm S \left( \mathcal N _{n}\left( \varpi -\lambda \right) \right) d\varpi \\&=\int \limits _{\lambda -1/\mathcal N _{n}}^{\lambda +1/\mathcal N _{n}}I_{n,t}(\varpi )\left\{ \frac{e^{-1/(1-\mathcal N _{n}^{2}(\varpi -\lambda )^{2})}}{\int _{-1}^{1}e^{-1/(1-w^{2})}dw}\times \frac{\mathcal N _{n}}{e^{\varpi ^{2}/\mathcal N _{n}^{2}}}\right\} d\varpi , \end{aligned}$$

where \(\mathrm S (u)\) is a so-called “smudge” function used to blot out \( I_{n,t}(\varpi )\) when \(\varpi \) is outside the interval \((\lambda - 1/ \mathcal N _{n},\lambda + 1/\mathcal N _{n})\). The term \(\{\cdot \}\) after the second equality defines \(\mathrm S (u)\) on \([-1,1]\). The random variable \(\mathfrak I _{\mathcal N _{n},n,t}(\lambda )\) is \(\mathfrak I _{t}\) -measurable, uniformly bounded, continuous, and differentiable for each \( \mathcal N _{n}\), and since \(k_{n}(\lambda ) \ge k_{n}(\tilde{\lambda } ) \) for \(\lambda \ge \tilde{\lambda }\) then \(\mathfrak I _{\mathcal N _{n},n,t}(\lambda ) \le \mathfrak I _{\mathcal N _{n},n,t}(\tilde{ \lambda }) a.s.\) Cf. Phillips (1995).

Observe \(\mathcal A _{1,n}(\lambda ,\tilde{\lambda }) = \mathcal B _{1, \mathcal N _{n},n}(\lambda ,\tilde{\lambda }) + \mathcal B _{2,\mathcal N _{n},n}(\lambda ) + \mathcal B _{2,\mathcal N _{n},n}(\tilde{\lambda })\) where

$$\begin{aligned} \mathcal B _{1,\mathcal N _{n},n}(\lambda ,\tilde{\lambda })&=\sum _{t=1}^{n} \frac{m_{t}\left\{ \mathfrak I _{\mathcal N _{n},n,t}\left( \lambda \right) - \mathfrak I _{\mathcal N _{n},n,t}(\tilde{\lambda })\right\} }{n^{1/2}\sigma _{n}}\text{,} \\ \mathcal B _{2,\mathcal N _{n},n}(\lambda )&=\sum _{t=1}^{n} \frac{m_{t}\left\{ I_{n,t}\left( \lambda \right) -\mathfrak I _{\mathcal N _{n},n,t}(\lambda )\right\} }{n^{1/2}\sigma _{n}}. \end{aligned}$$

Consider \(\mathcal B _{1,\mathcal N _{n},n}(\lambda ,\tilde{\lambda })\), define \(\mathcal D _{\mathcal N _{n},n,t}(\lambda ) := (\partial /\partial \lambda )\mathfrak I _{\mathcal N _{n},n,t}(\lambda )\), and let \( \{b_{n}(\lambda ,\iota )\}\) for infinitesimal \(\iota > 0\) be any sequence of positive numbers that satisfies \(P(|m_{t}| > b_{n}(\lambda ,\iota )) \rightarrow \lambda - \iota \in (0,1)\), hence \( \text{lim} _{n\rightarrow \infty }\text{sup}_{\lambda \in \Lambda }b_{n}(\lambda ,\iota )\) \(< \infty \). By the mean value theorem \(\mathfrak I _{\mathcal N _{n},n,t}(\lambda )-\mathfrak I _{\mathcal N _{n},n,t}(\tilde{\lambda }) =\) \(\mathcal D _{\mathcal N _{n},n,t}(\lambda _{*})(\lambda -\tilde{\lambda } )\) for some \(\lambda _{*} \in \Lambda \), \(|\lambda - \lambda _{*}| \le |\lambda - \tilde{\lambda }|\). But since \( \text{sup}_{\lambda \in \Lambda }|I_{n,t}(\lambda ) - 1| \overset{a.s.}{ \rightarrow } 0\) it must be the case that \(\text{sup}_{\lambda \in \Lambda }| \mathcal D _{\mathcal N _{n},n,t}(\lambda )| \rightarrow 0 a.s.\) as \( n \rightarrow \infty \) for any \(\mathcal N _{n} \rightarrow \infty \). Therefore, for \(N\) sufficiently large, all \(n \ge N\), any \(p > 0\) and some \(\{b_{n}(\lambda ,\iota )\}\) we have \(\text{sup}_{\lambda \in \Lambda }E|m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )|^{p} \le K\text{sup}_{\lambda \in \Lambda }E|m_{t}I(|m_{t}| \le b_{n}(\lambda ,\iota ) )|^{p}\le K\text{sup}_{\lambda \in \Lambda }b_{n}^{p}(\lambda ,\iota )\) which is bounded on \( \mathbb N \). This implies \(m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )\) is \(L_{p}\)-bounded for any \(p > 2\) uniformly on \(\Lambda \times \mathbb N \), and geometrically \(\beta \)-mixing under R4. In view of \(\lim \text{inf}_{n\rightarrow \infty }\sigma _{n}^{2} > 0\) we may therefore apply Lemma 3 in Doukhan et al. (1995) to obtain \(\text{sup}_{\lambda \in \Lambda }|n^{-1/2}\sigma _{n}^{-1}\text{sum} _{t=1}^{n}m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )| = O_{p}(1)\). This suffices to deduce \(\text{lim} _{\delta \rightarrow 0}\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{||\lambda -\tilde{\lambda }||\le \delta }|\mathcal B _{1,\mathcal N _{n},n}(\lambda ,\tilde{\lambda } )| > \varepsilon )\) is bounded by

$$\begin{aligned} \lim _{\delta \rightarrow 0}\limsup _{n\rightarrow \infty }P\left( K\sup _{\lambda \in \Lambda }\left|\frac{1}{n^{1/2}\sigma _{n}} \sum _{t=1}^{n}m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )\right|\times \delta >\varepsilon \right) =0. \end{aligned}$$

Further, since the rate \(\mathcal N _{n} \rightarrow \infty \) is arbitrary, we can always let \(\mathcal N _{n} \rightarrow \) \(\infty \) so fast that \(\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{\lambda \in \Lambda }|\mathcal B _{2,\mathcal N _{n},n}(\lambda )| > \varepsilon ) = 0\), cf. Phillips (1995). By subadditivity this proves \(\text{lim} _{\delta \rightarrow 0}\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{||\lambda - \tilde{\lambda }||\le \delta }\mathcal A _{1,n}(\lambda ,\tilde{\lambda }) > \varepsilon ) = 0 \forall \varepsilon > 0\).

Now consider \(\mathcal A _{2,n}(\lambda ,\tilde{\lambda })\). By UCLT Lemma B.7 \(\text{sup}_{\lambda \in \Lambda }|S_{n}^{-1}(\lambda )\text{sum} _{t=1}^{n}m_{t}I_{n,t} (\lambda )|=O_{p}(1)\) for any compact subset \(\Lambda \) of \((0,1]\). The proof is therefore complete if we show \( |S_{n}(\lambda )/S_{n}(\tilde{\lambda }) - 1| \le K|\lambda - \tilde{\lambda }|^{1/2}\). By Lemma B.1.b \(S_{n}^{2}(\lambda ) = \mathfrak L _{n}(\lambda )nE[m_{t}^{2}I_{n,t}(\lambda )]\). Compactness of \(\Lambda \subset (0,1]\) ensures \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\lambda \in \Lambda }\mathfrak L _{n}\) \((\lambda )>0\) and \(\text{sup}_{\lambda \in \Lambda } \mathfrak L _{n}(\lambda ) = O(\ln (n))\), and by distribution continuity \(E[m_{t}^{2}I_{n,t}(\lambda )]\) is differentiable, hence \(|S_{n}(\lambda )/S_{n}(\tilde{\lambda }) - 1| \le K(\text{sup}_{\lambda \in \Lambda }\{|G_{n}(\lambda )|\}/E[m_{t}^{2}I_{n,t}(\lambda )])^{1/2} \times |\lambda - \tilde{\lambda }|^{1/2} =: \mathcal E _{n}|\lambda - \tilde{\lambda }|^{1/2}\) where \(G_{n}(\lambda ) := \left( \partial /\partial \lambda \right) E[m_{t}^{2}I_{n,t}(\lambda )]\). Since \(k_{n} \sim \lambda n/\ln (n)\) it is easy to verify \(\lim \text{sup}_{n\rightarrow \infty }\text{sup}_{\lambda \in \Lambda }\mathcal E _{n}< \infty \): if \( E[m_{t}^{2}] < \infty \) then the bound is trivial, and if \(E[m_{t}^{2}]\) \(= \infty \) then use \(c_{\epsilon ,n} = K(n/k_{n})^{1/\kappa } = K(\ln (n))^{1/\kappa }\lambda ^{-1/\kappa }\) and Karamata’s Theorem (Resnick 1987, Theorem 0.6). \(\mathcal{QED} \).

Proof of Lemma 4.1

By Lemma B.7 in Hill (2011b) \( J_{n}(\gamma ) = -E[g_{t}F_{t}(\gamma )I_{n,t}] \times (1 + o(1))\) hence it suffices to bound \((E[g_{i,t}F_{t}\left( \gamma \right) I_{n,t}])^{2}/S_{n}^{2}(\gamma )\). The claim follows from Lemma B.1.b, and the following implication of Karamata’s theorem (e.g. Resnick 1987, Theorem 0.6): if any random variable \(w_{t}\) has tail \(P(|w_{t}| > w) = dw^{-\kappa }(1 + o(1))\), and \(w_{n,t}^{*} := w_{t}I(|w_{t}| \le c_{w,n})\), \(P(|w_{t}| > c_{w,n}) = k_{w,n}/n = o(1)\) and \(k_{w,n} \rightarrow \infty \), then \(E|w_{n,t}^{*}|^{p}\) is slowly varying if \(p = \kappa \), and \(E|w_{n,t}^{*}|^{p} \sim Kc_{w,n}^{p}(k_{w,n}/n) = K(n/k_{w,n})^{p/\kappa -1}\) if \(p > \kappa \). \(\mathcal{QED} \).
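For illustration only, the Karamata implication can be verified directly in the exact Pareto case, a simplification of the power law in R3.b: if \(P(|w_{t}| > w) = dw^{-\kappa }\) for \(w \ge d^{1/\kappa }\) and \(p > \kappa \), then

$$\begin{aligned} E|w_{n,t}^{*}|^{p}=\int _{d^{1/\kappa }}^{c_{w,n}}w^{p}\,d\kappa w^{-\kappa -1}dw=\frac{d\kappa }{p-\kappa }\left( c_{w,n}^{p-\kappa }-d^{(p-\kappa )/\kappa }\right) \sim \frac{\kappa }{p-\kappa }c_{w,n}^{p}\times dc_{w,n}^{-\kappa }=\frac{\kappa }{p-\kappa }c_{w,n}^{p}\frac{k_{w,n}}{n}, \end{aligned}$$

and \(P(|w_{t}| > c_{w,n}) = dc_{w,n}^{-\kappa } = k_{w,n}/n\) gives \(c_{w,n} = (dn/k_{w,n})^{1/\kappa }\), so \(E|w_{n,t}^{*}|^{p} \sim Kc_{w,n}^{p}(k_{w,n}/n) = K(n/k_{w,n})^{p/\kappa -1}\), the order used above.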

Proof of Lemma 4.2

First some preliminaries. Integrability of \(\epsilon _{t}\) is assured by \(\kappa > 1\), and \(y_{t}\) has tail (11) with the same tail index \(\kappa \) (Brockwell and Cline 1985). Stationarity ensures \(\epsilon _{t}(\beta ) = \text{sum} _{i=0}^{\infty }\psi _{i}(\beta )\epsilon _{t-i}\), where \(\text{sup}_{\beta \in \mathcal B }|\psi _{i}(\beta )| \le K\rho ^{i}\) for \(\rho \in (0,1)\), \(\psi _{0}(\beta ^{0}) = 1\) and \(\psi _{i}(\beta ^{0}) = 0 \forall i \ge 1\). Since \(\epsilon _{t}\) is iid with tail (11) it is easy to show \(\epsilon _{t}(\beta )\) satisfies uniform power law property R3.b by exploiting convolution tail properties developed in Embrechts and Goldie (1980). Use (4) and (11) to deduce \(c_{\epsilon ,n} = K\left( n/k_{n}\right) ^{1/\kappa }\).

F2 follows from the stationary AR data generating process and distribution continuity. I1 holds since \(E[m_{n,t}^{*}(\gamma )] = 0\) by independence, symmetry, and symmetric trimming. R1 and R2 hold by construction; (11) and the stated error properties ensure R3; see Pham and Tran (1985) for R4.

Now P1–P3. OLS and LAD are \(n^{1/\kappa }\)-convergent if \(\kappa \in (1,2]\) (Davis et al. 1992); LTTS and GMTTM are \(n^{1/\kappa }/L(n)\)-convergent if \(\kappa \in (1,2]\) (Hill and Renault 2010; Hill 2011b; see note 10); and LWAD is \(n^{1/2}\)-convergent in all cases (Ling 2005). It remains to characterize \(V_{n}(\gamma )\). Each claim follows by application of Lemma 4.1. If \(\kappa > 2\) then \(V_{n}(\gamma ) \sim Kn\), so OLS, LTTS and GMTTM satisfy P2 [LAD and LWAD are not linear: see Davis et al. (1992)]. If \(\kappa \in (1,2)\) then \(V_{n}(\gamma ) \sim Kn\left( k_{n}/n\right) ^{2/\kappa -1} = o(n)\), while each \(\hat{\beta }_{n}\) satisfies \(\tilde{V}_{i,i,n}^{1/2}/n^{1/2} \rightarrow \infty \), hence P1 applies for any intermediate order \(\{k_{n}\}\). The case \(\kappa = 2\) is similar.

Finally, Lemma 4.1 can be shown to apply to \(V_{n}^{\bot }(\gamma )\) by exploiting the fact that \(\epsilon _{t}g_{i,t} = \epsilon _{t}y_{t-i}\) have the same tail index as \(\epsilon _{t}\) (Embrechts and Goldie 1980). The above arguments therefore extend to \(m_{n,t}^{\perp }(\beta ,\gamma )\) under P3. \(\mathcal{QED} \).

Proof of Lemma 4.3

The ARCH process \(\{y_{t}\}\) is stationary geometrically \(\beta \)-mixing (Carrasco and Chen 2002). In view of re-centering after trimming and error independence, all conditions except P1–P3 hold by the arguments used to prove Lemma 4.2.

Consider P1–P3. Note \(\epsilon _{t} = u_{t}^{2} - 1\) is iid, it has tail index \(\kappa _{u}/2 \in (1,2]\) if \(E[u_{t}^{4}] = \infty \), and \((\partial /\partial \beta )\epsilon _{t}(\beta )|_{\beta ^{0}} = -u_{t}^{2}x_{t}/h_{t}^{2}\) is integrable. Further \(S_{n}^{2}(\gamma ) = nE[m_{n,t}^{*2}(\gamma )]\) by independence and re-centering. Thus \( V_{n}(\gamma ) \sim Kn\) if \(E[u_{t}^{4}] < \infty \), and otherwise apply Lemma 4.1 to deduce \(V_{n}(\gamma ) \sim Kn\left( k_{n}/n\right) ^{4/\kappa _{u}-1}\) if \(\kappa _{u} < 4\), and \(V_{n}(\gamma ) \sim n/L(n)\) if \(\kappa _{u} = 4\).
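To see the tail index claim for \(\epsilon _{t} = u_{t}^{2} - 1\) explicitly, assume \(u_{t}\) has the power law tail \(P(|u_{t}| > c) = dc^{-\kappa _{u}}(1 + o(1))\); since \(\epsilon _{t} \ge -1\), only the right tail matters, and

$$\begin{aligned} P(\epsilon _{t}>c)=P(u_{t}^{2}>1+c)=P(|u_{t}|>(1+c)^{1/2})=d(1+c)^{-\kappa _{u}/2}(1+o(1))\sim dc^{-\kappa _{u}/2}, \end{aligned}$$

hence \(\epsilon _{t}\) has tail index \(\kappa _{u}/2\).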

GMTTM with QML-type equations and QMTTL have a scale \(||\tilde{V}_{n}|| \sim n/L(n)\) if \(E[u_{t}^{4}] = \infty \), hence P1, otherwise \(|| \tilde{V}_{n}|| \sim Kn\) hence P2 (Hill and Renault 2010; Hill 2011b). Log-LAD is \(n^{1/2}\)-convergent if \(E[u_{t}^{2}] < \infty \), hence P1 if \(\kappa _{u} \le 4\), and if \(\kappa _{u} > 4\) then it does not satisfy P2 since it is not linear. QML is \(n^{1/2}\)-convergent if \( E[u_{t}^{4}] < \infty \) hence P2, and if \(E[u_{t}^{4}] = \infty \) then the rate is \(n^{1-2/\kappa _{u}}/L(n)\) when \(\kappa _{u} \in (2,4]\) (Hall and Yao 2003, Theorem 2.1). But if \(\kappa _{u} < 4\) then \( n(k_{n}/n)^{4/\kappa _{u}-1} = k_{n}^{4/\kappa _{u}-1}n^{2-4/\kappa _{u}} > n^{2-4/\kappa _{u}}/L(n)\) for any slowly varying \(L(n) \rightarrow \infty \) and intermediate order \(\{k_{n}\}\) hence QML does not satisfy P1 or P2. Synonymous arguments extend to \(m_{n,t}^{\perp }(\gamma )\) under P3 by exploiting Lemma 4.1. \(\mathcal{QED} \).


Copyright information

© 2013 Springer Science+Business Media New York


Cite this chapter

Hill, J. (2013). Heavy-Tail and Plug-In Robust Consistent Conditional Moment Tests of Functional Form. In: Chen, X., Swanson, N. (eds) Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1653-1_10
