Abstract
We present asymptotic power-one tests of regression model functional form for heavy-tailed time series. Under the null hypothesis of correct specification the model errors must have a finite mean, and otherwise only need to have a fractional moment. If the errors have an infinite variance then in principle any consistent plug-in is allowed, depending on the model, including those with non-Gaussian limits and/or a sub-\(\sqrt{n}\)-convergence rate. One test statistic exploits an orthogonalized test equation that promotes plug-in robustness irrespective of tails. We derive chi-squared weak limits of the statistics, we characterize an empirical process method for smoothing over a trimming parameter, and we study the finite sample properties of the test statistics.
The author thanks an anonymous referee and Co-Editor Xiaohong Chen for constructive remarks.
Notes
- 1.
We use the term “revealing” in the sense of “generically totally revealing” in Stinchcombe and White (1998, p. 299). A member \(h\) of a function space \(\mathcal H \) reveals misspecification \(E[y|x] \ne f\) when \(E[(y - f)h] \ne 0\). A space \(\mathcal H \) is generically totally revealing if all but a negligible number of \(h \in \mathcal H \) have this property. In the index function case \(h(x) = F(\gamma ^{\prime }\psi (x))\), where the weight \(h\) aligns with \(\gamma \) and the class \(\mathcal H \) with \(\Gamma \), this is equivalent to saying the property holds for all \(\gamma \in \Gamma /S\) where \(S\) has Lebesgue measure zero.
- 2.
Slow variation implies \(\lim _{n\rightarrow \infty }L(\lambda n)/L(n) = 1\) for any \(\lambda > 0\) (e.g. \(L(n)\) a constant, or \(L(n) = (\ln (n))^{a}\) for finite \(a > 0\): see Resnick 1987). In this chapter we always assume \(L(n) \rightarrow \infty \).
- 3.
Consider if \(\epsilon _{t}\) is iid and asymmetric under \(H_{0}\), but symmetrically and non-negligibly trimmed with \(k_{1,\epsilon ,n} = k_{2,\epsilon ,n} \sim \lambda n\) where \(\lambda \in (0,1)\). Then \(\hat{T}_{n}(\gamma ) \overset{p}{\rightarrow } \infty \) under \(H_{0}\) is easily verified. The test statistic reveals misspecification due entirely to trimming itself.
- 4.
Under the alternative \(\beta ^{0}\) is the unique probability limit of \(\hat{ \beta }_{n}\), a “quasi-true” point that optimizes a discrepancy function, for example, a likelihood function, method of moments criterion or the Kullback–Leibler Information Criterion. See White (1982) amongst many others.
- 5.
The rate of convergence for some minimum discrepancy estimators may be below \(n^{1/2}\), even for thin tailed data, in contexts involving weak identification, kernel smoothing, and in-fill asymptotics. We implicitly ignored such cases here.
- 6.
Other over-identifying restrictions can easily be included, but the GMTTM rate may differ from what we cite in the proof of Lemma 4.2 if they are not lags of \(y_{t}\). See Hill and Renault (2010).
- 7.
If \(n = 800\) then \(k_{n} = [0.025\times 800/\ln (800)] = 2\) for each \(\{\epsilon _{t},y_{t-1},...,y_{t-5}\}\). Hence at most \(2\times 6 = 12\) observations are trimmed, which is \(1.5\,\%\) of \(800\).
- 8.
See Hong and White (1995, Theorem 3.2) for defense of a slowly varying series length \(\ln (n)\).
- 9.
- 10.
LTTS and GMTTM require trimming fractiles for estimation: GMTTM requires fractiles \(\tilde{k}_{i,n}\) for each estimating equation \(\tilde{m}_{i,n,t}\) , and LTTS requires fractiles \(\tilde{k}_{\epsilon ,n}\) and \(\tilde{k}_{y,n}\) for \(\epsilon _{t}\) and \(y_{t-i}\). The given rates of convergence apply if for GMTTM \(\tilde{k}_{i,n} \sim \lambda \ln (n)\) (Hill and Renault 2010), and for LTTS \(\tilde{k}_{\epsilon ,n} \sim \lambda n/\ln (n)\) and \(\tilde{k}_{y,n} \sim \lambda \ln (n)\) (Hill 2011b), where \(\lambda > 0\) is chosen by the analyst and may be different in different places.
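The arithmetic in note 7 can be checked directly. The sketch below assumes the fractile rule \(k_{n} = [0.025n/\ln (n)]\) quoted there, with \([\cdot ]\) read as the integer part; the function names are illustrative and do not come from the chapter.

```python
import math


def fractile(n: int, lam: float = 0.025) -> int:
    """Integer part of lam * n / ln(n), the trimming fractile from note 7."""
    return math.floor(lam * n / math.log(n))


def max_trimmed_share(n: int, n_series: int = 6, lam: float = 0.025) -> float:
    """Upper bound on the share of observations trimmed when each of
    n_series variables (epsilon_t and five lags of y_t) trims k_n points."""
    return n_series * fractile(n, lam) / n


if __name__ == "__main__":
    n = 800
    print(fractile(n))           # 2
    print(max_trimmed_share(n))  # 0.015
```

The bound is attained only if the trimmed observations never coincide across the six series, which is why the note says "at most".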
References
An H.Z., Huang F.C. (1996) The Geometrical Ergodicity of Nonlinear Autoregressive Models, Stat. Sin. 6, 943–956.
Arcones M., Giné E. (1989) The Bootstrap of the Mean with Arbitrary Bootstrap Sample Size, Ann. I. H. P. 25, 457–481.
Bai J. (2003) Testing Parametric Conditional Distributions of Dynamic Models, Rev. Econ. Stat. 85, 531–549.
Bierens H.J. (1982) Consistent Model Specification Tests, J. Econometrics 20, 105–13.
Bierens H.J. (1990) A Consistent Conditional Moment Test of Functional Form, Econometrica 58, 1443–1458.
Bierens H.J., Ploberger W. (1997) Asymptotic Theory of Integrated Conditional Moment Tests, Econometrica 65, 1129–1151.
Brock W.A., Dechert W.D., Scheinkman J.A., LeBaron B. (1996) A Test for Independence Based on the Correlation Dimension, Econometric Rev. 15, 197–235.
Brockwell P.J., Cline D.B.H. (1985) Linear Prediction of ARMA Processes with Infinite Variance, Stoch. Proc. Appl. 19, 281–296.
Carrasco M., Chen X. (2002) Mixing and Moment Properties of Various GARCH and Stochastic Volatility Models, Econometric Theory 18, 17–39.
Chan K.S. (1990) Testing for Threshold Autoregression, Ann. Stat. 18, 1886–1894.
Chen X., Fan Y. (1999) Consistent Hypothesis Testing in Semiparametric and Nonparametric Models for Econometric Time Series, J. Econometrics 91, 373–401.
Corradi V., Swanson N.R. (2002) A Consistent Test for Nonlinear Out-of-Sample Predictive Accuracy, J. Econometrics 110, 353–381.
Csörgo S., Horváth L., Mason D. (1986) What Portion of the Sample Makes a Partial Sum Asymptotically Stable or Normal? Prob. Theory Rel. Fields 72, 1–16.
Davidson R., MacKinnon J., White H. (1983) Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results, J. Econometrics 21, 53–70.
Davies R.B. (1977) Hypothesis Testing when a Nuisance Parameter is Present Only under the Alternative, Biometrika 64, 247–254.
Davis R.A., Knight K., Liu J. (1992) M-Estimation for Autoregressions with Infinite Variance, Stoch. Proc. Appl. 40, 145–180.
de Jong R.M. (1996) The Bierens Test under Data Dependence, J. Econometrics 72, 1–32.
de Jong R.M., Davidson J. (2000) Consistency of Kernel Estimators of Heteroscedastic and Autocorrelated Covariance Matrices, Econometrica 68, 407–423.
de Lima P.J.F. (1997) On the Robustness of Nonlinearity Tests to Moment Condition Failure, J. Econometrics 76, 251–280.
Dehling H., Denker M., Philipp W. (1986) Central Limit Theorems for Mixing Sequences of Random Variables under Minimal Conditions, Ann. Prob. 14, 1359–1370.
Dette H. (1999) A Consistent Test for the Functional Form of a Regression Based on a Difference of Variance Estimators, Ann. Stat. 27, 1012–1040.
Dufour J.M., Farhat A., Hallin M. (2006) Distribution-Free Bounds for Serial Correlation Coefficients in Heteroscedastic Symmetric Time Series, J. Econometrics 130, 123–142.
Doukhan P., Massart, P., Rio E. (1995) Invariance Principles for Absolutely Regular Empirical Processes, Ann. I. H. P. 31, 393–427.
Dudley R. M. (1978) Central Limit Theorem for Empirical Processes. Ann. Prob. 6, 899–929.
Embrechts P., Goldie C.M. (1980) On Closure and Factorization Properties of Subexponential Distributions, J. Aus. Math. Soc. A, 29, 243–256.
Embrechts P., Klüppelberg C., Mikosch T. (1997) Modelling Extremal Events for Insurance and Finance. Springer-Verlag: Frankfurt.
Eubank R., Spiegelman S. (1990) Testing the Goodness of Fit of a Linear Model via Nonparametric Regression Techniques, J. Amer. Stat. Assoc. 85, 387–392.
Fan Y., Li Q. (1996) Consistent Model Specification Tests: Omitted Variables and Semiparametric Functional Forms, Econometrica 64, 865–890.
Fan Y., Li Q. (2000) Consistent Model Specification Tests: Kernel-Based Tests Versus Bierens’ ICM Tests, Econometric Theory 16, 1016–1041.
Finkenstadt B., Rootzén H. (2003) Extreme Values in Finance, Telecommunications and the Environment. Chapman and Hall: New York. Parameter Space, Econometric Theory 26, 965–993.
Gabaix X. (2008) Power Laws, in The New Palgrave Dictionary of Economics, 2nd Edition, S. N. Durlauf and L. E. Blume (eds.), MacMillan.
Gallant A.R. (1981) Unbiased Determination of Production Technologies. J. Econometrics, 20, 285–323.
Gallant A.R., White H. (1989) There Exists a Neural Network That Does Not Make Avoidable Mistakes, Proceedings of the Second Annual IEEE Conference on Neural Net., I:657–664.
Hall P., Yao Q. (2003) Inference in ARCH and GARCH Models with Heavy-Tailed Errors, Econometrica 71, 285–317.
Hahn M.G., Weiner D.C., Mason D.M. (1991) Sums, Trimmed Sums and Extremes, Birkhäuser: Berlin.
Hansen B.E. (1996) Inference When a Nuisance Parameter Is Not Identified Under the Null Hypothesis, Econometrica 64, 413–430.
Härdle W., Mammen E. (1993) Comparing Nonparametric Versus Parametric Regression Fits, Ann. Stat. 21, 1926–1947.
Hausman J.A. (1978) Specification Testing in Econometrics, Econometrica 46, 1251–1271.
Hill J.B. (2008a) Consistent and Non-Degenerate Model Specification Tests Against Smooth Transition and Neural Network Alternatives, Ann. D’Econ. Statist. 90, 145–179.
Hill J.B. (2008b) Consistent GMM Residuals-Based Tests of Functional Form, Econometric Rev.: forthcoming.
Hill J.B. (2011a) Tail and Non-Tail Memory with Applications to Extreme Value and Robust Statistics, Econometric Theory 27, 844–884.
Hill J.B. (2011b) Robust M-Estimation for Heavy Tailed Nonlinear AR-GARCH, Working Paper, University of North Carolina - Chapel Hill.
Hill J.B. (2011c) Supplemental Appendix for Heavy-Tail and Plug-In Robust Consistent Conditional Moments Tests of Functional Form, www.unc.edu/jbhill/ap_cmtrim.pdf.
Hill J.B. (2012) Stochastically Weighted Average Conditional Moment Tests of Functional Form: Stud. Nonlin. Dyn. Econometrics 16: forthcoming.
Hill J.B., Aguilar M. (2011) Moment Condition Tests for Heavy Tailed Time Series, J. Econometrics: Annals Issue on Extreme Value Theory: forthcoming.
Hill J.B., Renault E. (2010) Generalized Method of Moments with Tail Trimming, submitted; Dept. of Economics, University of North Carolina - Chapel Hill.
Hoffmann-Jørgensen J. (1991) Convergence of Stochastic Processes on Polish Spaces, Various Publication Series Vol. 39, Matematisk Institute, Aarhus University.
Hong Y., White H. (1995) Consistent Specification Testing Via Nonparametric Series Regression, Econometrica 63, 1133–1159.
Hong Y., Lee Y-J. (2005) Generalized Spectral Tests for Conditional Mean Models in Time Series with Conditional Heteroscedasticity of Unknown Form, Rev. Econ. Stud. 72, 499–541.
Hornik K., Stinchcombe M., White H. (1989) Multilayer Feedforward Networks are Universal Approximators, Neural Net. 2, 359–366.
Hornik K., Stinchcombe M., White H. (1990) Universal Approximation of an Unknown Mapping and Its Derivatives Using Multilayer Feedforward Networks, Neural Net., 3, 551–560.
Ibragimov R., Müller U.K. (2010) t-Statistic based Correlation and Heterogeneity Robust Inference, J. Bus. Econ. Stat. 28, 453–468.
Lahiri S.N. (1995) On the Asymptotic Behaviour of the Moving Block Bootstrap for Normalized Sums of Heavy-Tailed Random Variables, Ann. Stat. 23, 1331–1349.
Leadbetter M.R., Lindgren G., Rootzén H. (1983) Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag: New York.
Lee T., White H., Granger C.W.J. (1993) Testing for Neglected Nonlinearity in Time-Series Models: A Comparison of Neural Network Methods and Alternative Tests, J. Econometrics 56, 269–290.
Ling S. (2005) Self-Weighted LAD Estimation for Infinite Variance Autoregressive Models, J. R. Stat. Soc. B 67, 381–393.
McLeod A.I., Li W.K. (1983) Diagnostic Checking ARMA Time Series Models Using Squared Residual Autocorrelations, J. Time Ser. Anal. 4, 269–273.
Meitz M., Saikkonen P. (2008) Stability of Nonlinear AR-GARCH Models, J. Time Ser. Anal. 29, 453–475.
Newey W.K. (1985) Maximum Likelihood Specification Testing and Conditional Moment Tests, Econometrica 53, 1047–1070.
Peng L., Yao Q. (2003) Least Absolute Deviation Estimation for ARCH and GARCH Models, Biometrika 90, 967–975.
Pham T., Tran L. (1985) Some Mixing Properties of Time Series Models. Stoch. Proc. Appl. 19, 297–303.
Pollard D. (1984) Convergence of Stochastic Processes. Springer-Verlag: New York.
Ramsey J.B. (1969) Tests for Specification Errors in Classical Linear Least-Squares Regression, J. R. Stat. Soc. B 31, 350–371.
Resnick S.I. (1987) Extreme Values, Regular Variation and Point Processes. Springer-Verlag: New York.
Stinchcombe M., White H. (1989) Universal Approximation Using Feedforward Networks with Non-Sigmoid Hidden Layer Activation Functions, Proceedings of the International Joint Conference on Neural Net., I, 612–617.
Stinchcombe M.B., White H. (1992) Some Measurability Results for Extrema of Random Functions Over Random Sets, Rev. Economic Stud., 59, 495–514.
Stinchcombe M.B., White H. (1998) Consistent Specification Testing with Nuisance Parameters Present Only Under the Alternative, Econometric Theory 14, 295–325.
Tsay R. (1986) Nonlinearity Tests for Time Series, Biometrika 73, 461–466.
White H. (1981) Consequences and Detection of Misspecified Nonlinear Regression Models, J. Amer. Stat. Assoc. 76, 419–433.
White H. (1982) Maximum Likelihood Estimation of Misspecified Models, Econometrica 50, 1–25.
White H. (1987) Specification Testing in Dynamic Models, in Truman Bewley, ed., Advances in Econometrics. Cambridge University Press: New York.
White H. (1989a) A Consistent Model Selection Procedure Based on m-Testing, in C.W.J. Granger, ed., Modelling Economic Series: Readings in Econometric Methodology, p. 369–403. Oxford University Press: Oxford.
White H. (1989b) Some Asymptotic Results for Learning in Single Hidden Layer Feedforward Network Models, J. Amer. Stat. Assoc., 84, 1003–1013.
White H. (1990) Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings, Neural Net., 3, 535–549.
Wooldridge J.M. (1990) A Unified Approach to Robust, Regression-Based Specification Tests, Econometric Theory 6, 17–43.
Yatchew A.J. (1992) Nonparametric Regression Tests Based on Least Squares, Econometric Theory 8, 435–451.
Zheng J.X. (1996) A Consistent Test of Functional Form via Nonparametric Estimation Techniques, J. Econometrics 75, 263–289.
Appendices
We ignore for notational economy measurability issues that arise when taking a supremum over an index set. Assume all functions in this chapter satisfy Pollard (1984) permissibility criteria, the measure space that governs all random variables is complete, and therefore all majorants are measurable. Probability statements are therefore with respect to outer probability, and expectations over majorants are outer expectations. Cf. Dudley (1978) and Stinchcombe and White (1992).
Appendix A: Assumptions
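Write thresholds and fractiles compactly \(c_{z,n}(\cdot ) = \max \{l_{z,n}(\cdot ),u_{z,n}(\cdot )\}\) and \(k_{j,n} = \max \left\{ k_{j,\epsilon ,n},k_{j,1,n},...,k_{j,q,n}\right\} \), define \(\sigma _{n}^{2}(\beta ,\gamma ) := E\left[ m_{n,t}^{*2}\left( \beta ,\gamma \right) \right] \) and the long-run variance
\[S_{n}^{2}(\beta ,\gamma ) := \sum _{s,t=1}^{n}E\left[ \{m_{n,s}^{*}(\beta ,\gamma ) - E[m_{n,s}^{*}(\beta ,\gamma )]\} \{m_{n,t}^{*}(\beta ,\gamma ) - E[m_{n,t}^{*}(\beta ,\gamma )]\}\right] .\]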
Drop \(\beta ^{0}\), define \(\mathfrak I _{t} = \sigma (x_{\tau +1},y_{\tau } :\) \(\tau \le t)\), and let \(\Gamma \) be any compact subset of \( \mathbb R ^{p}\) with positive Lebesgue measure. Six sets of assumptions are employed. First, the test weight is revealing.
W1 (weight). a. \(F : \mathbb R \rightarrow \mathbb R \) is Borel measurable, analytic, and nonpolynomial on some open interval \(R_{0} \subseteq \mathbb R \) containing \(0\). b. \(\text{sup}_{u\in U}|F(u)| \le K\) and \(\text{inf}_{u\in U}|F(u)| > 0\) on any compact subset \(U \subset S_{F}\), with \(S_{F}\) the support of \(F\).
Remark
The W1.b upper bound allows us to exclude \(F(\gamma ^{\prime }\psi _{t})\) from the trimming indicators which greatly simplifies proving test consistency under trimming, and is mild since it applies to repeatedly cited weights (exponential, logistic, sine, cosine). The lower bound in W1.b helps to establish a required stochastic equicontinuity condition for weak convergence when \(\epsilon _{t}\) may be heavy tailed, and is easily guaranteed by centering \(F(\gamma ^{\prime }\psi _{t})\) if necessary.
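A minimal numerical sketch of the W1.b bounds follows. It assumes the compact set \(U = [-1,1]\) and approximates the supremum and infimum of \(|F|\) on a grid. The shifted sine weight is one illustrative way to keep the weight away from zero and is an assumption here, not the chapter's construction.

```python
import math

# Candidate weights F(u); the "+ 2" shift on sine is a hypothetical device
# to bound |F| away from zero on U.
WEIGHTS = {
    "exponential": math.exp,
    "logistic": lambda u: 1.0 / (1.0 + math.exp(-u)),
    "sine": math.sin,
    "shifted sine": lambda u: math.sin(u) + 2.0,
}


def abs_bounds(F, lo=-1.0, hi=1.0, points=2001):
    """Grid approximation of (inf |F(u)|, sup |F(u)|) over [lo, hi]."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    values = [abs(F(u)) for u in grid]
    return min(values), max(values)


if __name__ == "__main__":
    for name, F in WEIGHTS.items():
        inf_abs, sup_abs = abs_bounds(F)
        print(f"{name:13s} inf|F| = {inf_abs:.4f}  sup|F| = {sup_abs:.4f}")
```

On this particular \(U\) the raw sine weight touches zero at \(u = 0\), so it fails the lower bound there, while the shifted version satisfies both bounds.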
Second, the plug-in \(\hat{\beta }_{n}\) is consistent. Let \(\tilde{m}_{n,t}\) be \(\mathfrak I _{t}\)-measurable mappings from \(\mathcal B \subset \mathbb R ^{q}\) to \(\mathbb R ^{r}\), \(r \ge q\), and \(\{\tilde{V}_{n}\}\) a sequence of non-random matrices \(\tilde{V}_{n} \in \mathbb R ^{q\times q}\) where \(\tilde{V}_{i,i,n} \rightarrow \infty \). Stack equations \(\mathcal M _{n,t}^{*}(\beta ,\gamma ) := [m_{n,t}^{*}\left( \beta ,\gamma \right) ,\tilde{m}_{n,t}^{\prime }(\beta )]^{\prime } \in \mathbb R ^{r+1}\), and define the covariances \(\tilde{S}_{n}\left( \beta \right) := \sum _{s,t=1}^{n}E[\{\tilde{m}_{n,s}(\beta ) - E[\tilde{m}_{n,s}(\beta )]\} \times \{\tilde{m}_{n,t}(\beta ) - E[\tilde{m}_{n,t}(\beta )]\}^{\prime }]\) and \(\mathfrak S _{n}^{*}(\beta ,\gamma ) := \sum _{s,t=1}^{n}E[\{\mathcal M _{n,s}^{*}(\beta ,\gamma ) - E[\mathcal M _{n,s}^{*}(\beta ,\gamma )]\} \times \{\mathcal M _{n,t}^{*}(\beta ,\gamma ) - E[\mathcal M _{n,t}^{*}(\beta ,\gamma )]\}^{\prime }]\), hence \([\mathfrak S _{i,j,n}^{*}(\beta ,\gamma )]_{i=2,j=2}^{r+1,r+1} = \tilde{S}_{n}\left( \beta \right) \). We abuse notation since \(\mathfrak S _{n}^{*}(\beta ,\gamma )\) may not exist for some or any \(\beta \). Let f.d.d. denote finite dimensional distributions.
- P1 (fast (non)linear plug-ins). \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0}) = O_{p}(1)\) and \(\text{sup}_{\gamma \in \Gamma }||V_{n}(\gamma )\tilde{V}_{n}^{-1}|| \rightarrow 0\).
- P2 (slow linear plug-ins). \(\mathfrak S _{n}^{*}(\gamma )\) exists for each \(n\), specifically \(\text{sup}_{\gamma \in \Gamma }||\mathfrak S _{n}^{*}(\gamma )|| < \infty \) and \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\gamma \in \Gamma }\lambda _{\min }(\mathfrak S _{n}^{*}(\gamma )) > 0\). Further:
  - a. \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0}) = O_{p}(1)\) and \(\tilde{V}_{n} \sim \mathcal K (\gamma )V_{n}(\gamma )\), where \(\mathcal K : \Gamma \rightarrow \mathbb R ^{q\times q}\) and \(\text{inf}_{\gamma \in \Gamma }\lambda _{\min }(\mathcal K (\gamma )) > 0\).
  - b. \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0}) = \tilde{A}_{n}\sum _{t=1}^{n}\{\tilde{m}_{n,t} - E[\tilde{m}_{n,t}]\} \times (1 + o_{p}\left( 1\right) ) + o_{p}\left( 1\right) \) where nonstochastic \(\tilde{A}_{n} \in \mathbb R ^{q\times r}\) has full column rank and \(\tilde{A}_{n}\tilde{S}_{n}^{-1}\tilde{A}_{n}^{\prime } \rightarrow I_{q}\).
  - c. The f.d.d. of \(\mathfrak S _{n}^{*}\left( \gamma \right) ^{-1/2}\{\mathcal M _{n,t}^{*}(\gamma ) - E[\mathcal M _{n,t}^{*}(\gamma )]\}\) belong to the same domain of attraction as the f.d.d. of \(S_{n}^{-1}(\gamma )\{m_{n,t}^{*}(\gamma ) - E[m_{n,t}^{*}(\gamma )]\}\).
- P3 (orthogonal equations and (non)linear plug-ins). \(\tilde{V}_{n}^{1/2}(\hat{\beta }_{n} - \beta ^{0}) = O_{p}(1)\) and \(\lim \text{sup}_{n\rightarrow \infty }\text{sup}_{\gamma \in \Gamma }||V_{n}^{\perp }(\gamma )\tilde{V}_{n}^{-1}|| < \infty \).
Remark
\(\hat{\beta }_{n}\) affects the limit distribution of \({\hat{\mathcal T }}_{n}(\gamma )\) under P2, hence we assume \(\hat{\beta }_{n}\) is linear. P3 is invoked for orthogonalized equations \(\hat{m}_{n,t}^{\bot }(\beta ,\gamma )\).
Third, identification under trimming.
I1 (identification by \(m_{n,t}^{*}(\gamma )\)). Under the null \(\text{sup}_{\gamma \in \Gamma }|nS_{n}^{-1}(\gamma )E[m_{n,t}^{*}\left( \gamma \right) ]|\) \(\rightarrow 0\).
Remark
If \(m_{t}(\gamma )\) is asymmetric there is no guarantee \(E[m_{n,t}^{*}\left( \gamma \right) ] = 0\), although \( E[m_{n,t}^{*}\left( \gamma \right) ] \rightarrow 0\) under \(H_{0}\) by trimming negligibility and dominated convergence. The fractiles \( \{k_{j,\epsilon ,n},k_{j,i,n}\}\) must therefore promote I1 for asymptotic normality in view of expansion (5) and mean centering. Since \( \text{sup}_{\gamma \in \Gamma }\{S_{n}(\gamma )/n\} = o(1)\) by Lemma B.1, below, I1 implies identification of \(H_{0}\) sufficiently fast. The property is superfluous if \(E[\epsilon _{t}] = 0\) under either hypothesis, \(\epsilon _{t}\) is independent of \(x_{t}\) under \(H_{0}\), and re-centering is used since then \(E[m_{n,t}^{*}\left( \gamma \right) ] = 0\) under \(H_{0}\) (see Sect. 3).
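The first point of the remark can be made concrete with a hedged toy calculation that is not taken from the chapter. Assume \(\epsilon _{t} = E_{t} - 1\) with \(E_{t}\) standard exponential, so \(E[\epsilon _{t}] = 0\) but the distribution is asymmetric, and trim symmetrically at a fixed threshold \(c \ge 1\) instead of the chapter's order-statistic thresholds. Integration by parts gives \(E[\epsilon _{t}I(|\epsilon _{t}| \le c)] = -(1 + c)e^{-(1+c)}\), which is nonzero for every finite \(c\) and vanishes as \(c \rightarrow \infty \). The sketch checks the closed form against a midpoint-rule integral.

```python
import math


def trimmed_mean_closed_form(c: float) -> float:
    """E[eps * 1(|eps| <= c)] for eps = E - 1, E ~ Exp(1), valid for c >= 1."""
    return -(1.0 + c) * math.exp(-(1.0 + c))


def trimmed_mean_numeric(c: float, steps: int = 200_000) -> float:
    """Midpoint-rule integral of (x - 1) * exp(-x) over [0, 1 + c]."""
    upper = 1.0 + c
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (x - 1.0) * math.exp(-x)
    return total * h


if __name__ == "__main__":
    for c in (1.0, 5.0, 20.0):
        print(c, trimmed_mean_closed_form(c), trimmed_mean_numeric(c))
```

The trimmed mean is negative at every threshold because only the long right tail is cut, and its magnitude shrinks as the threshold grows, which mirrors the role of trimming negligibility in I1.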
Fourth, the DGP and properties of regression model components.
- R1 (response). \(f(\cdot ,\beta )\) is for each \(\beta \in \mathcal B \) a Borel measurable function, continuous, and differentiable on \(\mathcal B \) with Borel measurable gradient \(g_{t}(\beta ) = g(x_{t},\beta ) := (\partial /\partial \beta )f(x_{t},\beta )\).
- R2 (moments). \(E|y_{t}| < \infty \), and \(E(\text{sup}_{\beta \in \mathcal B }|f(x_{t},\beta )|^{\iota }) < \infty \) and \(E(\text{sup}_{\beta \in \mathcal B }|(\partial /\partial \beta _{i})f(x_{t},\beta )|^{\iota }) < \infty \) for each \(i\) and some tiny \(\iota > 0\).
- R3 (distribution).
  - a. The finite dimensional distributions of \(\{y_{t},x_{t}\}\) are strictly stationary, non-degenerate, and absolutely continuous. The density function of \(\epsilon _{t}(\beta )\) is uniformly bounded: \(\text{sup}_{\beta \in \mathcal B }\text{sup}_{a\in \mathbb R }\{(\partial /\partial a)P(\epsilon _{t}(\beta ) \le a)\} < \infty \).
  - b. Define \(\kappa _{\epsilon }(\beta ) := \arg \sup _{\alpha > 0}\{E|\epsilon _{t}(\beta )|^{\alpha } < \infty \} \in (0,\infty ]\), write \(\kappa _{\epsilon } = \kappa _{\epsilon }(\beta ^{0})\), and let \(\mathcal B _{2,\epsilon }\) denote the set of \(\beta \) such that the error variance is infinite, \(\kappa _{\epsilon }(\beta ) \le 2\). If \(\kappa _{\epsilon }(\beta ) \le 2\) then \(P(|\epsilon _{t}(\beta )| > c) = d(\beta )c^{-\kappa _{\epsilon }(\beta )}(1 + o(1))\) where \(\text{inf}_{\beta \in \mathcal B _{2,\epsilon }}d(\beta ) > 0\) and \(\text{inf}_{\beta \in \mathcal B _{2,\epsilon }}\kappa _{\epsilon }(\beta ) > 0\), and \(o(1)\) is not a function of \(\beta \), hence \(\lim _{c\rightarrow \infty }\text{sup}_{\beta \in \mathcal B _{2,\epsilon }}|d(\beta )^{-1}c^{\kappa _{\epsilon }(\beta )}P(|\epsilon _{t}(\beta )| > c) - 1| = 0\).
- R4 (mixing). \(\{y_{t},x_{t}\}\) are geometrically \(\beta \)-mixing: \(\text{sup}_{\mathcal A \subset \mathfrak I _{t+l}^{+\infty }}E|P(\mathcal A |\mathfrak I _{-\infty }^{t}) - P(\mathcal A )| = o(\rho ^{l})\) for \(\rho \in (0,1)\).
Remark 1
Response function smoothness R1 coupled with distribution continuity and boundedness R3.a imply \(\sum _{t=1}^{n}\hat{m}_{n,t}^{*}(\hat{\beta }_{n},\gamma )\) can be asymptotically expanded around \(\beta ^{0}\), cf. Hill (2011b, Appendices B and C). Power-law tail decay R3.b is mild since it includes weakly dependent processes that satisfy a central limit theorem (Leadbetter et al. 1983), and simplifies characterizing tail-trimmed variances in heavy-tailed cases by Karamata’s Theorem.
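As a hedged illustration of the R3.b tail condition, the sketch below simulates an exact Pareto law, for which \(P(|\epsilon _{t}| > c) = c^{-\kappa }\) for \(c \ge 1\) (so \(d = 1\)), and checks that \(c^{\kappa }\) times the empirical tail probability stays near one. The tail index \(\kappa = 1.5\), the sample size, and the seed are arbitrary assumptions.

```python
import random


def pareto_sample(n: int, kappa: float, seed: int = 0) -> list:
    """Draw n values with P(X > c) = c**(-kappa), c >= 1, by inverse transform."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [(1.0 - rng.random()) ** (-1.0 / kappa) for _ in range(n)]


def scaled_tail(sample: list, c: float, kappa: float) -> float:
    """c**kappa times the empirical P(X > c); near d = 1 under the power law."""
    exceed = sum(1 for x in sample if x > c)
    return (c ** kappa) * exceed / len(sample)


if __name__ == "__main__":
    kappa = 1.5  # infinite variance (kappa < 2), finite mean (kappa > 1)
    draws = pareto_sample(200_000, kappa)
    for c in (2.0, 5.0, 10.0):
        print(c, round(scaled_tail(draws, c, kappa), 3))
```

With \(\kappa = 1.5\) the simulated errors have a finite mean and an infinite variance, which is the heavy-tailed case the null hypothesis allows.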
Remark 2
The mixing property characterizes nonlinear AR with nonlinear random volatility errors (Pham and Tran 1985; An and Huang 1996; Meitz and Saikkonen 2008).
Fifth, we restrict the fractiles and impose nondegeneracy under trimming. Recall \(k_{j,n} = \max \{k_{j,\epsilon ,n}, k_{j,1,n}, ...,k_{j,q,n}\}\), the R3.b moment supremum \(\kappa _{\epsilon } > 0\), and \(\sigma _{n}^{2}(\beta ,\gamma ) = E[m_{n,t}^{*2}(\beta ,\gamma )]\).
- F1 (fractiles).
  - a. \(k_{j,\epsilon ,n}/\ln (n) \rightarrow \infty \);
  - b. If \(\kappa _{\epsilon } \in (0,1)\) then \(k_{j,\epsilon ,n}/n^{2\left( 1-\kappa _{\epsilon }\right) /\left( 2-\kappa _{\epsilon }\right) } \rightarrow \infty \).
- F2 (nondegenerate trimmed variance). \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\beta \in \mathcal B ,\gamma \in \Gamma }\{S_{n}^{2}(\beta ,\gamma )/n\} > 0\) and \(\text{sup}_{\beta \in \mathcal B ,\gamma \in \Gamma }\{n\sigma _{n}^{2}(\beta ,\gamma )/S_{n}^{2}(\beta ,\gamma )\} = O(1)\).
Remark 1
F1.a sets a mild lower bound on \(k_{\epsilon ,n}\) that is useful for bounding trimmed variances \(\sigma _{n}^{2}(\beta ,\gamma )\) and \(S_{n}^{2}(\beta ,\gamma )\). F1.b sets a harsh lower bound on \( k_{\epsilon ,n}\) if, under misspecification, \(\epsilon _{t}\) is not integrable: as \(\kappa _{\epsilon } \searrow 0\) we must trim more \( k_{\epsilon ,n} \nearrow n\) in order to prove a LLN for \(m_{n,t}^{*}(\gamma )\) which is used to prove \({\hat{\mathcal T }}_{n}(\gamma )\) is consistent. Any \(k_{\epsilon ,n} \sim n/L(n)\) for slowly varying \(L(n) \rightarrow \) \(\infty \) satisfies F1.
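The last claim of the remark can be illustrated numerically. The sketch assumes the specific choice \(L(n) = \ln (n)\), so \(k_{\epsilon ,n} = n/\ln (n)\), and an arbitrary tail index \(\kappa _{\epsilon } = 0.5\) for F1.b. Both ratios should grow without bound as \(n\) increases; a finite grid of \(n\) values is only suggestive and is not a proof.

```python
import math


def f1_ratios(n: float, kappa: float = 0.5):
    """Ratios from F1.a and F1.b for the fractile k_n = n / ln(n)."""
    k_n = n / math.log(n)
    ratio_a = k_n / math.log(n)                     # F1.a: k_n / ln(n)
    exponent = 2.0 * (1.0 - kappa) / (2.0 - kappa)  # F1.b rate exponent
    ratio_b = k_n / n ** exponent                   # F1.b: k_n / n^exponent
    return ratio_a, ratio_b


if __name__ == "__main__":
    for n in (1e3, 1e6, 1e9, 1e12):
        ratio_a, ratio_b = f1_ratios(n)
        print(f"n = {n:.0e}  F1.a ratio = {ratio_a:.3e}  F1.b ratio = {ratio_b:.3e}")
```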
Remark 2
Distribution nondegeneracy under R3.a coupled with trimming negligibility ensure trimmed moments are not degenerate for sufficiently large \(n\), for example \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\beta \in \mathcal B ,\gamma \in \Gamma }\sigma _{n}^{2}(\beta ,\gamma ) > 0\). The long-run variance \(S_{n}^{2}(\beta ,\gamma )\), however, can in principle be degenerate due to negative dependence, hence F2 is imposed. F2 is standard in the literature on dependent CLT’s and exploited here for a CLT for \(m_{n,t}^{*}(\beta ,\gamma )\), cf. Dehling et al. (1986) .
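To see how negative dependence can make a long-run variance degenerate, consider a hedged textbook example that is not from the chapter: the over-differenced series \(m_{t} = e_{t} - e_{t-1}\) with iid \(e_{t}\) of unit variance. Its autocovariances are \(2\) at lag zero, \(-1\) at lag one, and zero beyond, so the variance of \(\sum _{t=1}^{n}m_{t}\) equals \(2\) for every \(n\) and its ratio to \(n\) vanishes, which is what the first part of F2 rules out. The sketch computes the variance of the partial sum for a general MA(1) \(m_{t} = e_{t} + \theta e_{t-1}\).

```python
def ma1_long_run_variance(n: int, theta: float, sigma2: float = 1.0) -> float:
    """Variance of sum_{t=1}^n m_t for m_t = e_t + theta * e_{t-1}, e_t iid.

    Uses Var(sum) = n * g0 + 2 * (n - 1) * g1 with autocovariances
    g0 = (1 + theta**2) * sigma2 and g1 = theta * sigma2.
    """
    g0 = (1.0 + theta ** 2) * sigma2
    g1 = theta * sigma2
    return n * g0 + 2.0 * (n - 1) * g1


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        degenerate = ma1_long_run_variance(n, theta=-1.0) / n
        regular = ma1_long_run_variance(n, theta=0.5) / n
        print(n, degenerate, regular)
```

For \(\theta = 0.5\) the ratio settles near \((1 + \theta )^{2} = 2.25\), while for \(\theta = -1\) it decays like \(2/n\).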
Finally, the kernel \(\omega (\cdot )\) and bandwidth \(b_{n}\).
K1 (kernel and bandwidth). \(\omega (\cdot )\) is integrable, and a member of the class \(\{\omega : \mathbb R \rightarrow [-1,1] \mid \omega (0) = 1, \omega (x) = \omega (-x) \ \forall x \in \mathbb R , \int _{-\infty }^{\infty }|\omega (x)|dx < \infty , \int _{-\infty }^{\infty }|\vartheta (\xi )|d\xi < \infty , \omega (\cdot )\) is continuous at \(0\) and all but a finite number of points\(\}\), where \(\vartheta (\xi ) := (2\pi )^{-1}\int _{-\infty }^{\infty }\omega (x)e^{i\xi x}dx < \infty \). Further \(\sum \nolimits _{s,t=1}^{n}|\omega ((s - t)/b_{n})| = o(n^{2})\), \(\max _{1\le s\le n}|\sum \nolimits _{t=1}^{n}\omega ((s - t)/b_{n})| = o(n)\) and \(b_{n} = o(n)\).
Remark
Assumption K1 includes Bartlett, Parzen, Quadratic Spectral, Tukey-Hanning, and other kernels. See de Jong and Davidson (2000) and their references.
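A short sketch of two of the kernels named in the remark follows, using their standard textbook forms. It evaluates \(n^{-2}\sum _{s,t=1}^{n}|\omega ((s - t)/b_{n})|\) for one arbitrary pair \((n,b_{n})\). The choice \(n = 200\) with \(b_{n} = 10\) is an assumption for illustration only, and a single pair cannot verify the \(o(n^{2})\) rate.

```python
def bartlett(x: float) -> float:
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0


def parzen(x: float) -> float:
    """Parzen kernel in its standard piecewise-cubic form."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def normalized_weight_sum(kernel, n: int, b_n: float) -> float:
    """n^{-2} * sum_{s,t} |kernel((s - t) / b_n)|, grouped by lag h = s - t."""
    total = 0.0
    for h in range(-(n - 1), n):
        total += (n - abs(h)) * abs(kernel(h / b_n))
    return total / n ** 2


if __name__ == "__main__":
    for kernel in (bartlett, parzen):
        print(kernel.__name__, kernel(0.0), normalized_weight_sum(kernel, 200, 10.0))
```

Both kernels equal one at zero, are symmetric, and vanish outside \([-1,1]\), so only lags with \(|s - t| < b_{n}\) contribute to the double sum.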
Appendix B: Proofs of Main Results
We require several preliminary results proved in the supplemental appendix Hill (2011c, Sect. C.3). Throughout the terms \(o_{p}(1)\), \(O_{p}(1)\), \( o(1) \) and \(O(1)\), do not depend on \(\beta \), \(\gamma \), and \(t\). We only state results that concern \(\hat{m}_{n,t}^{*}(\beta ,\gamma )\), and \( m_{n,t}^{*}(\beta ,\gamma )\), since companion results extend to \(\hat{m} _{n,t}^{\perp }(\beta ,\gamma )\), and \(m_{n,t}^{\perp }(\beta ,\gamma )\). Let F1–F2, K1, R1–R4, and W1.b hold. Recall \(\sigma _{n}^{2}(\beta ,\gamma ) =\) \(E[m_{n,t}^{*2}(\beta ,\gamma )]\).
Lemma B.1
(variance bounds)
- a. \(\sigma _{n}^{2}(\beta ,\gamma ) = o\left( n\max \left\{ 1,(E[m_{n,t}^{*}(\beta ,\gamma )])^{2}\right\} \right) \), \(\sup \limits _{\gamma \in \Gamma }\left\{ \dfrac{\sigma _{n}^{2}(\gamma )}{\max \left\{ 1,(E[m_{n,t}^{*}(\gamma )])^{2}\right\} }\right\} = o(n/\ln (n))\);
- b. \(S_{n}^{2}(\gamma ) = \mathfrak L _{n}n\sigma _{n}^{2}(\gamma ) = o(n^{2})\) for some sequence \(\{\mathfrak L _{n}\}\) that satisfies \(\lim \text{inf}_{n\rightarrow \infty }\mathfrak L _{n} > 0\), \(\mathfrak L _{n} = K\) if \(\epsilon _{t}\) is finite dependent or \(E[\epsilon _{t}^{2}] < \infty \), and otherwise \(\mathfrak L _{n} \le K\ln (n/\text{min}_{j\in \{1,2\}}\{k_{j,\epsilon ,n}\}) \le K\ln (n)\).
Lemma B.2
(approximation)
- a. \(\text{sup}_{\gamma \in \Gamma }|S_{n}^{-1}(\gamma )\sum _{t=1}^{n}\{\hat{m}_{n,t}^{*}(\gamma ) - m_{n,t}^{*}(\gamma )\}| = o_{p}(1)\).
- b. Define \(\hat{\mu }_{n,t}^{*}(\beta ,\gamma ) := \hat{m}_{n,t}^{*}(\beta ,\gamma ) - \hat{m}_{n}^{*}(\beta ,\gamma )\) and \(\mu _{n,t}^{*}(\beta ,\gamma ) := m_{n,t}^{*}(\beta ,\gamma ) - m_{n}^{*}(\beta ,\gamma )\). If additionally P1 or P2 holds then \(\text{sup}_{\gamma \in \Gamma }|S_{n}^{-2}(\gamma )\sum _{s,t=1}^{n}\omega ((s - t)/b_{n})\{\hat{\mu }_{n,s}^{*}(\hat{\beta }_{n},\gamma )\hat{\mu }_{n,t}^{*}(\hat{\beta }_{n},\gamma ) - \mu _{n,s}^{*}(\gamma )\mu _{n,t}^{*}(\gamma )\}| = o_{p}(1)\).
Lemma B.3
(expansion) Let \(\beta ,\tilde{\beta } \in \mathcal B \). For some sequence \(\{\beta _{n,*}\}\) in \(\mathcal B \) satisfying \(||\beta _{n,*} - \tilde{\beta }|| \le ||\beta - \tilde{\beta }||\), and for some tiny \(\iota > 0\) and arbitrarily large finite \(\delta > 0\) we have \(\text{sup}_{\gamma \in \Gamma }|\hat{m}_{n}^{*}(\beta ,\gamma ) - \hat{m}_{n}^{*}(\tilde{\beta },\gamma ) - \hat{J}_{n}^{*}(\beta _{n,*},\gamma )^{\prime }(\beta - \tilde{\beta })| = n^{-\delta } \times ||\beta - \tilde{\beta }||^{1/\iota } \times o_{p}(1)\).
Lemma B.4
(Jacobian) Under P1 or P2 \(\text{sup}_{\gamma \in \Gamma }||J_{n}^{*}(\hat{\beta }_{n},\gamma ) -\) \(J_{n}(\gamma )(1 + o_{p}(1))|| = o_{p}(1)\).
Lemma B.5
(HAC) Under P1 or P2 \(\text{sup}_{\gamma \in \Gamma }|\hat{S}_{n}^{2}(\hat{\beta }_{n},\gamma )/S_{n}^{2}(\gamma )\) \(- 1| \overset{p}{\rightarrow } 0\).
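The kernel estimator behind Lemmas B.2.b and B.5 can be sketched as follows, under the assumption that \(\hat{S}_{n}^{2}\) takes the form \(\sum _{s,t=1}^{n}\omega ((s - t)/b_{n})\hat{\mu }_{n,s}^{*}\hat{\mu }_{n,t}^{*}\) with \(\hat{\mu }_{n,t}^{*}\) the sample-mean-centered equations, as the expression in Lemma B.2.b suggests. The Bartlett kernel and the function names are illustrative choices, and tail trimming of the input series is taken as already done.

```python
def bartlett(x: float) -> float:
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0


def hac_variance(m: list, b_n: float, kernel=bartlett) -> float:
    """Kernel estimate sum_{s,t} kernel((s - t) / b_n) * mu_s * mu_t,
    where mu_t = m_t - mean(m)."""
    n = len(m)
    mean = sum(m) / n
    mu = [x - mean for x in m]
    total = 0.0
    for s in range(n):
        for t in range(n):
            w = kernel((s - t) / b_n)
            if w != 0.0:
                total += w * mu[s] * mu[t]
    return total


if __name__ == "__main__":
    series = [1.0, -2.0, 0.5, 3.0, -1.5, 0.0, 2.0, -3.0]
    # With b_n = 1 only the s = t terms survive, giving the sum of squares.
    print(hac_variance(series, b_n=1.0))
    print(hac_variance(series, b_n=3.0))
```

The double loop costs \(O(n^{2})\) operations, which keeps the sketch close to the formula; grouping terms by lag would be the usual choice for long series.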
Lemma B.6
(ULLN) Let \(\text{inf}_{n\ge N}|E[m_{n,t}^{*}(\gamma )]| > 0\) for some \(N\in \mathbb N \) and all \(\gamma \in \Gamma /S\) where \(S\) has measure zero. Then \(\text{sup}_{\gamma \in \Gamma /S}\{1/n\sum _{t=1}^{n}m_{n,t}^{*}(\gamma )/E[m_{n,t}^{*}(\gamma )]\} \overset{p}{\rightarrow } 1\).
Lemma B.7
(UCLT) \(\{S_{n}^{-1}(\gamma )\sum _{t=1}^{n}(m_{n,t}^{*}(\gamma ) - E[m_{n,t}^{*}(\gamma )]) : \gamma \in \Gamma \} \Longrightarrow \{z(\gamma ) : \gamma \in \Gamma \}\), a scalar \((0,1)\)-Gaussian process on \(\mathcal C [\Gamma ]\) with covariance function \(E[z(\gamma _{1})z(\gamma _{2})]\) and a.s. bounded sample paths. If P2 also holds then \(\{\mathfrak S _{n}^{-1/2}(\gamma )\sum _{t=1}^{n}\{\mathcal M _{n,t}^{*}(\gamma ) - E[\mathcal M _{n,t}^{*}(\gamma )]\} : \gamma \in \Gamma \} \Longrightarrow \{\mathcal Z (\gamma ) : \gamma \in \Gamma \}\), an \(r + 1\) dimensional Gaussian process on \(\mathcal C [\Gamma ]\) with zero mean, covariance \(I_{r+1}\), and covariance function \(E[\mathcal Z (\gamma _{1})\mathcal Z (\gamma _{2})^{\prime }]\).
Proof of Lemma
We only prove the claims for \( m_{n,t}^{*}(\beta ,\gamma )\). In view of the \(\sigma (x_{t})\) -measurability of \(\mathcal P _{n,t}(\gamma )\) and \(\text{sup}_{\gamma \in \Gamma }E|\mathcal P _{n,t}(\gamma )| < \infty \) the proof extends to \( m_{n,t}^{\perp }(\beta ,\gamma )\) with few modifications. Under \(H_{0}\) the claim follows from trimming negligibility and Lebesgue’s dominated convergence: \(E[m_{n,t}^{*}(\gamma )] \rightarrow \) \(E[m_{t}(\gamma )] = 0\).
Under the alternative there are two cases: \(E|\epsilon _{t}| <\) \(\infty \), or \(E|\epsilon _{t}| = \infty \) such that \(E[\epsilon _{t}|x_{t}]\) may not exist.
Case 1 (\(E|\epsilon _{t}|<\infty \)). Property W1, compactness of \(\Gamma \), and boundedness of \(\psi \) imply \(F(\gamma ^{\prime }\psi _{t})\) is uniformly bounded and revealing: \(E[\epsilon _{t}F(\gamma ^{\prime }\psi _{t})] \ne 0\) for all \(\gamma \in \Gamma /S\) where \(S\) has Lebesgue measure zero. Now invoke boundedness of \( F(\gamma ^{\prime }\psi _{t})\) with Lebesgue’s dominated convergence theorem and negligibility of trimming to deduce \(|E[\epsilon _{t}(1 - I_{n,t}(\beta ^{0}))F(\gamma ^{\prime }\psi _{t})]| \rightarrow 0\), hence \(E[\epsilon _{t}I_{n,t}(\beta ^{0})F(\gamma ^{\prime }\psi _{t})] =\)\(E\left[ \epsilon _{t}F(\gamma ^{\prime }\psi _{t})\right] + o(1) \ne 0\) for all \(\gamma \in \Gamma /S\) and all \(n \ge N\) for sufficiently large \(N\).
Case 2 (\(E|\epsilon _{t}| = \infty \)). Under \( H_{1}\) since \(I_{n,t}(\beta ) \rightarrow 1\,a.s.\) and \(E|\epsilon _{t}| = \infty \), by the definition of conditional expectations there exists sufficiently large \(N\) such that \(\text{min}_{n\ge N}|E[\epsilon _{t}I_{n,t}(\beta ^{0})|x_{t}]| > 0\) with positive probability \(\forall n \ge N\). The claim therefore follows by Theorem 1 of Bierens and Ploberger (1997) and Theorem 2.3 of Stinchcombe and White (1998): \(\lim \text{inf}_{n\rightarrow \infty }|E[\epsilon _{t}I_{n,t}(\beta ^{0})F(\gamma ^{\prime }\psi _{t})]| > 0\) for all \(\gamma \in \Gamma /S\). \( \mathcal{QED} \).
Proof of Theorem
Define \(M_{n,t}^{*}(\beta ,\gamma ) := m_{n,t}^{*}(\beta ,\gamma ) - E[m_{n,t}^{*}(\beta ,\gamma )]\) and \(\hat{M}_{n,t}^{*}(\beta ,\gamma ) := \hat{m}_{n,t}^{*}(\beta ,\gamma ) - E[\hat{m}_{n,t}^{*}(\beta ,\gamma )]\). We first state some required properties. Under plug–in properties P1 or P2 \(\hat{ \beta }_{n} - \beta ^{0} = o_{p}\left( 1\right) \). Identification I1 imposes under \(H_{0}\)
which implies the following long-run variance relation uniformly on \(\Gamma \):
Uniform expansion Lemma B.3, coupled with Jacobian consistency Lemma B.4 and \(\hat{\beta }_{n} \overset{p}{\rightarrow } \beta ^{0}\) imply for any arbitrarily large finite \(\delta > 0\),
Finally, by uniform approximation Lemma B.2.a
and by Lemma B.5 we have uniform HAC consistency:
Claim i ( \(\hat{T}_{n}\left( \gamma \right) :\) Null \(H_{0}\)). Under fast plug-in case P1 we assume \( \text{sup}_{\gamma \in \Gamma }||V_{n}(\gamma )\tilde{V}_{n}^{-1}|| \rightarrow \) \(0\), hence
Since \(\delta > 0\) in (B.3) may be arbitrarily large, \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\gamma \in \Gamma }S_{n}(\gamma )>0\) by nondegeneracy F2, and Eqs. (B.1)–(B.6) are uniform properties, it follows uniformly on \(\Gamma \)
say. Now apply variance relation (B.2), UCLT Lemma B.7 and the mapping theorem to conclude \(E[\mathcal M _{n}^{2}\left( \gamma \right) ] \rightarrow 1\) and \(\{{\hat{\mathcal T }}_{n}\left( \gamma \right) : \gamma \in \Gamma \} \Longrightarrow \{z^{2}(\gamma ) : \gamma \in \Gamma \}\), where \(z(\gamma )\) is a \((0,1)\)-Gaussian process on \(\mathcal C [\Gamma ]\) with covariance function \(E[z(\gamma _{1})z(\gamma _{2})]\).
Under slow plug-in case P2 a similar argument applies in view of plug-in linearity and UCLT Lemma B.7. Since the steps follow conventional arguments we relegate the proof to Hill (2011c, Sect. C.2).
Claim ii (\(\hat{T}_{n}\left( \gamma \right) :\)Alternative \(H_{1}\)). Lemma 2.1 ensures \(\text{inf}_{n\ge N}\left|E[m_{n,t}^{*}(\gamma )]\right|> 0\) for some \(N \in \mathbb N \) and all \(\gamma \in \Gamma /S\) where \(S \subset \Gamma \) has Lebesgue measure zero. Choose any \(\gamma \in \Gamma /S\), assume \(n \ge N\) and write
In view of (B.5) and the Lemma B.1.a,b variance property \( n|E[m_{n,t}^{*}(\gamma )]|/ S_{n}(\gamma ) \rightarrow \infty \), the proof is complete if we show \(\mathcal M _{n}(\hat{\beta }_{n},\gamma ) := |n^{-1}\sum _{t=1}^{n}\hat{m}_{n,t}^{*}(\hat{\beta }_{n}, \gamma )|/|E[m_{n,t}^{*}(\gamma )]| \overset{p}{\rightarrow } 1\).
By (B.3), (B.4) and the triangle inequality \(\mathcal M _{n}(\hat{ \beta }_{n},\gamma )\) is bounded by
where \(\text{sup}_{\gamma \in \Gamma /S}\{n^{-1}\sum _{t=1}^{n}m_{n,t}^{*}(\gamma )/E[m_{n,t}^{*}(\gamma )]\} \overset{p}{\rightarrow } 1\) by Lemma B.6. Further, combine fast or slow plug-in P1 or P2, the construction of \( V_{n}\left( \gamma \right) \) and variance relation Lemma B.1.a,b to obtain
Therefore \(\mathcal M _{n}(\hat{\beta }_{n},\gamma ) \overset{p}{ \rightarrow } 1\).
Claim iii (\(\hat{T}_{n}^{\perp }\left( \gamma \right) \)). The argument simply mimics claims (\(i\)) and (\(ii\)) since under plug-in case P3 it follows \(\hat{S}_{n}^{\perp }(\hat{\beta } _{n},\gamma )^{-1}\sum _{t=1}^{n}\hat{m}_{n,t}^{\perp }(\hat{\beta } _{n},\gamma ) \overset{p}{\sim } S_{n}^{\perp }(\gamma )^{-1}\sum _{t=1}^{n}m_{n,t}^{\perp }(\gamma )\) by construction of the orthogonal equations (Wooldridge 1990), and straightforward generalizations of the supporting lemmas. \(\mathcal{QED} \).
The remaining proofs exploit the fact that for each \(z_{t} \in \{\epsilon _{t},g_{i,t}\}\) the product \(z_{t}F\left( \gamma ^{\prime }\psi _{t}\right) \) has the same tail decay rate as \(z_{t}\): by weight boundedness W1.b \(P(|z_{t}\text{sup}_{u\in \mathbb R }F(u)| > c) \ge P(|z_{t}F_{t}\left( \gamma \right) | > c) \ge P(|z_{t}\text{inf}_{u\in \mathbb R }F(u)| > c)\). Further, use \(I_{n,t} = I_{\epsilon ,n,t}I_{g,n,t}\), dominated convergence and each \(I_{z,n,t} \overset{a.s.}{\rightarrow } 1\) to deduce \(E[|z_{t}F(\gamma ^{\prime }\psi _{t})|^{r}I_{n,t}] = E[|z_{t}F(\gamma ^{\prime }\psi _{t})|^{r}I_{z,n,t}] \times (1 + o(1))\) for any \(r>0\). Hence higher moments of \(z_{t}F(\gamma ^{\prime }\psi _{t})I_{n,t}\) and \(z_{t}I_{z,n,t}\) are equivalent up to a constant scale.
Proof of Theorem
The claim under \(H_{1}\) follows from Theorem 2.2. We prove \(\tau _{n}(\alpha ) \overset{d}{ \rightarrow } (1 - \underline{\lambda })^{-1}\int _{ \underline{\lambda }}^{1}I(u(\lambda )<\alpha )d\lambda \) under \(H_{0}\) for plug-in case P1 since the remaining cases follow similarly. Drop \(\gamma \) and write \(\hat{m}_{n,t}^{*}(\hat{\beta }_{n},\lambda )\) and \(\hat{S} _{n}^{2}(\hat{\beta }_{n},\lambda )\) to express dependence on \(\lambda \in \Lambda := [\underline{\lambda },1]\). Define \(\hat{Z} _{n}(\lambda ) := \hat{S}_{n}^{-1}(\hat{\beta }_{n},\lambda )\sum _{t=1}^{n}\hat{m}_{n,t}^{*} (\hat{\beta }_{n},\lambda )\). We exploit weak convergence on a Polish space (see footnote 9): we write \(\{\hat{Z} _{n}(\lambda ) : \lambda \in \Lambda \} \Longrightarrow ^{*} \{z(\lambda ) : \lambda \in \Lambda \}\) on \(l_{\infty }(\Lambda )\), where \(\{z(\lambda ) : \lambda \in \Lambda \}\) is a Gaussian process with a version that has uniformly bounded and uniformly continuous sample paths with respect to \(||\cdot ||_{2}\), if \(\hat{Z}_{n}(\lambda )\) converges in f.d.d. and tightness applies: \(\text{lim} _{\delta \rightarrow 0}\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{||\lambda -\tilde{\lambda }||\le \delta }|\hat{Z}_{n}(\lambda ) - \hat{Z}_{n}(\tilde{\lambda })| > \varepsilon ) = 0\ \forall \varepsilon > 0\).
We need only prove \(\{\hat{Z}_{n}(\lambda ) : \lambda \in \Lambda \} \Longrightarrow ^{*} \{z(\lambda ) : \lambda \in \Lambda \}\) since the claim follows from multiple applications of the mapping theorem. Convergence in f.d.d. follows from \(\text{sup}_{\lambda \in \Lambda }|\hat{S}_{n}^{-1}(\hat{\beta }_{n},\lambda )\sum _{t=1}^{n}\hat{m} _{n,t}^{*}(\hat{\beta }_{n},\lambda ) - S_{n}^{-1}(\lambda )\sum _{t=1}^{n}m_{n,t}^{*}(\lambda )| \overset{p}{\rightarrow } 0\) by (B.3)–(B.5) under plug-in case P1, and the proof of UCLT Lemma B.7.
Consider tightness and notice by (B.3)–(B.6) and plug-in case P1
hence we need only consider \(\mathcal Z _{n}\left( \lambda \right) \) for tightness. By Lemma B.1.b and \(\inf \{\Lambda \} > 0\) it is easy to verify \(\text{inf}_{\lambda \in \Lambda }S_{n}^{2}(\lambda ) = n\sigma _{n}^{2} \) for some sequence \(\{\sigma _{n}^{2}\}\) that satisfies \(\lim \text{inf}_{n\rightarrow \infty }\sigma _{n}^{2} > 0\). Therefore
By subadditivity it suffices to prove each \(\text{lim}_{\delta \rightarrow 0}\lim\text{sup}_{n\rightarrow \infty }P(\text{sup}_{||\lambda -\tilde{\lambda }||\le \delta }\mathcal A _{i,n}(\lambda ,\tilde{\lambda }) > \varepsilon ) =\) \(0 \forall \varepsilon > 0\).
Consider \(\mathcal A _{1,n}(\lambda ,\tilde{\lambda })\) and note \( I_{n,t}(\lambda )\) can be approximated by a sequence of continuous, differentiable functions. Let \(\{\mathcal N _{n}\}\) be a sequence of positive numbers to be chosen below, and define a smoothed version of \(I_{n,t}(\lambda )\),
where \(\mathrm S (u)\) is a so-called “smudge” function used to blot out \( I_{n,t}(\varpi )\) when \(\varpi \) is outside the interval \((\lambda - 1/ \mathcal N _{n},\lambda + 1/\mathcal N _{n})\). The term \(\{\cdot \}\) after the second equality defines \(\mathrm S (u)\) on \([-1,1]\). The random variable \(\mathfrak I _{\mathcal N _{n},n,t}(\lambda )\) is \(\mathfrak I _{t}\) -measurable, uniformly bounded, continuous, and differentiable for each \( \mathcal N _{n}\), and since \(k_{n}(\lambda ) \ge k_{n}(\tilde{\lambda } ) \) for \(\lambda \ge \tilde{\lambda }\) then \(\mathfrak I _{\mathcal N _{n},n,t}(\lambda ) \le \mathfrak I _{\mathcal N _{n},n,t}(\tilde{ \lambda }) a.s.\) Cf. Phillips (1995).
Observe \(\mathcal A _{1,n}(\lambda ,\tilde{\lambda }) = \mathcal B _{1, \mathcal N _{n},n}(\lambda ,\tilde{\lambda }) + \mathcal B _{2,\mathcal N _{n},n}(\lambda ) + \mathcal B _{2,\mathcal N _{n},n}(\tilde{\lambda })\) where
Consider \(\mathcal B _{1,\mathcal N _{n},n}(\lambda ,\tilde{\lambda })\), define \(\mathcal D _{\mathcal N _{n},n,t}(\lambda ) := (\partial /\partial \lambda )\mathfrak I _{\mathcal N _{n},n,t}(\lambda )\), and let \( \{b_{n}(\lambda ,\iota )\}\) for infinitesimal \(\iota > 0\) be any sequence of positive numbers that satisfies \(P(|m_{t}| > b_{n}(\lambda ,\iota )) \rightarrow \lambda - \iota \in (0,1)\), hence \( \text{lim} _{n\rightarrow \infty }\text{sup}_{\lambda \in \Lambda }b_{n}(\lambda ,\iota ) < \infty \). By the mean value theorem \(\mathfrak I _{\mathcal N _{n},n,t}(\lambda )-\mathfrak I _{\mathcal N _{n},n,t}(\tilde{\lambda }) = \mathcal D _{\mathcal N _{n},n,t}(\lambda _{*})(\lambda -\tilde{\lambda } )\) for some \(\lambda _{*} \in \Lambda \), \(|\lambda - \lambda _{*}| \le |\lambda - \tilde{\lambda }|\). But since \( \text{sup}_{\lambda \in \Lambda }|I_{n,t}(\lambda ) - 1| \overset{a.s.}{ \rightarrow } 0\) it must be the case that \(\text{sup}_{\lambda \in \Lambda }| \mathcal D _{\mathcal N _{n},n,t}(\lambda )| \rightarrow 0\) a.s. as \( n \rightarrow \infty \) for any \(\mathcal N _{n} \rightarrow \infty \). Therefore, for \(N\) sufficiently large, all \(n \ge N\), any \(p > 0\) and some \(\{b_{n}(\lambda ,\iota )\}\) we have \(\text{sup}_{\lambda \in \Lambda }E|m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )|^{p} \le K\text{sup}_{\lambda \in \Lambda }E|m_{t}I(|m_{t}| \le b_{n}(\lambda ,\iota ) )|^{p}\le K\text{sup}_{\lambda \in \Lambda }b_{n}^{p}(\lambda ,\iota )\) which is bounded on \( \mathbb N \). This implies \(m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )\) is \(L_{p}\)-bounded for any \(p > 2\) uniformly on \(\Lambda \times \mathbb N \), and geometrically \(\beta \)-mixing under R4. In view of \(\lim \text{inf}_{n\rightarrow \infty }\sigma _{n}^{2} > 0\) we may therefore apply Lemma 3 in Doukhan et al. (1995) to obtain \(\text{sup}_{\lambda \in \Lambda }|n^{-1/2}\sigma _{n}^{-1}\sum _{t=1}^{n}m_{t}\mathcal D _{\mathcal N _{n},n,t}(\lambda )| = O_{p}(1)\). This suffices to deduce \(\text{lim} _{\delta \rightarrow 0}\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{||\lambda -\tilde{\lambda }||\le \delta }|\mathcal B _{1,\mathcal N _{n},n}(\lambda ,\tilde{\lambda } )| > \varepsilon )\) is bounded by
Further, since the rate \(\mathcal N _{n} \rightarrow \infty \) is arbitrary, we can always let \(\mathcal N _{n} \rightarrow \) \(\infty \) so fast that \(\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{\lambda \in \Lambda }|\mathcal B _{2,\mathcal N _{n},n}(\lambda )| > \varepsilon ) = 0\), cf. Phillips (1995). By subadditivity this proves \(\text{lim} _{\delta \rightarrow 0}\lim\text{sup} _{n\rightarrow \infty }P(\text{sup}_{||\lambda - \tilde{\lambda }||\le \delta }\mathcal A _{1,n}(\lambda ,\tilde{\lambda }) > \varepsilon ) = 0 \forall \varepsilon > 0\).
Now consider \(\mathcal A _{2,n}(\lambda ,\tilde{\lambda })\). By UCLT Lemma B.7 \(\text{sup}_{\lambda \in \Lambda }|S_{n}^{-1}(\lambda )\sum _{t=1}^{n}m_{t}I_{n,t} (\lambda )|=O_{p}(1)\) for any compact subset \(\Lambda \) of \((0,1]\). The proof is therefore complete if we show \( |S_{n}(\lambda )/S_{n}(\tilde{\lambda }) - 1| \le K|\lambda - \tilde{\lambda }|^{1/2}\). By Lemma B.1.b \(S_{n}^{2}(\lambda ) = \mathfrak L _{n}(\lambda )nE[m_{t}^{2}I_{n,t}(\lambda )]\). Compactness of \(\Lambda \subset (0,1]\) ensures \(\lim \text{inf}_{n\rightarrow \infty }\text{inf}_{\lambda \in \Lambda }\mathfrak L _{n}(\lambda )>0\) and \(\text{sup}_{\lambda \in \Lambda } \mathfrak L _{n}(\lambda ) = O(\ln (n))\), and by distribution continuity \(E[m_{t}^{2}I_{n,t}(\lambda )]\) is differentiable, hence \(|S_{n}(\lambda )/S_{n}(\tilde{\lambda }) - 1| \le K(\text{sup}_{\lambda \in \Lambda }\{|G_{n}(\lambda )|\}/E[m_{t}^{2}I_{n,t}(\lambda )])^{1/2} \times |\lambda - \tilde{\lambda }|^{1/2} =: \mathcal E _{n}|\lambda - \tilde{\lambda }|^{1/2}\) where \(G_{n}(\lambda ) := \left( \partial /\partial \lambda \right) E[m_{t}^{2}I_{n,t}(\lambda )]\). Since \(k_{n} \sim \lambda n/\ln (n)\) it is easy to verify \(\lim \text{sup}_{n\rightarrow \infty }\text{sup}_{\lambda \in \Lambda }\mathcal E _{n}< \infty \): if \( E[m_{t}^{2}] < \infty \) then the bound is trivial, and if \(E[m_{t}^{2}] = \infty \) then use \(c_{\epsilon ,n} = K(n/k_{n})^{1/\kappa } = K(\ln (n))^{1/\kappa }\lambda ^{-1/\kappa }\) and Karamata’s Theorem (Resnick 1987, Theorem 0.6). \(\mathcal{QED} \).
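The limit functional in this proof, \((1 - \underline{\lambda })^{-1}\int _{\underline{\lambda }}^{1}I(u(\lambda )<\alpha )d\lambda \), is the share of the interval \([\underline{\lambda },1]\) on which the p-value falls below \(\alpha \). A minimal Python sketch of a grid approximation follows. It assumes p-values have already been computed on an equally spaced grid of \(\lambda \) values, and it takes \(k_{n}(\lambda )\) to be the integer part of \(\lambda n/\ln (n)\), consistent with \(k_{n} \sim \lambda n/\ln (n)\) used above. The function names `trimming_fractile` and `occupation_time` are hypothetical and are not taken from the chapter.

```python
import math


def trimming_fractile(lam, n):
    """Illustrative fractile k_n(lambda) = floor(lambda * n / ln(n)), at least 1.

    Assumption: this matches k_n ~ lambda * n / ln(n) only up to rounding.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lambda must lie in (0, 1]")
    if n < 3:
        raise ValueError("n must be at least 3 so that ln(n) > 1")
    return max(1, int(lam * n / math.log(n)))


def occupation_time(p_values, alpha):
    """Share of grid points whose p-value is strictly below alpha.

    On an equally spaced grid over [lambda_low, 1] this average approximates
    (1 - lambda_low)^(-1) * integral of 1{p(lambda) < alpha} d(lambda).
    """
    if not p_values:
        raise ValueError("p_values must be non-empty")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


if __name__ == "__main__":
    # Hypothetical p-values on a four-point grid, for illustration only.
    print(trimming_fractile(0.5, 1000))
    print(occupation_time([0.01, 0.20, 0.03, 0.50], alpha=0.05))
```

Averaging the indicator over the grid, rather than picking a single \(\lambda \), is what makes the summary insensitive to any one trimming choice.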
Proof of Lemma 4.1
By Lemma B.7 in Hill (2011b) \( J_{n}(\gamma ) = -E[g_{t}F_{t}(\gamma )I_{n,t}] \times (1 + o(1))\) hence it suffices to bound \((E[g_{i,t}F_{t}\left( \gamma \right) I_{n,t}])^{2}/S_{n}^{2}(\gamma )\). The claim follows from Lemma B.1.b, and the following implication of Karamata’s theorem (e.g. Resnick 1987, Theorem 0.6): if any random variable \(w_{t}\) has tail \(P(|w_{t}| > w) = dw^{-\kappa }(1 + o(1))\), and \(w_{n,t}^{*} := w_{t}I(|w_{t}| \le c_{w,n})\), \(P(|w_{t}| > c_{w,n}) = k_{w,n}/n = o(1)\) and \(k_{w,n} \rightarrow \infty \), then \(E|w_{n,t}^{*}|^{p}\) is slowly varying if \(p = \kappa \), and \(E|w_{n,t}^{*}|^{p} \sim Kc_{w,n}^{p}(k_{w,n}/n) = K(n/k_{w,n})^{p/\kappa -1}\) if \(p > \kappa \). \(\mathcal{QED} \).
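The Karamata implication can be checked in closed form for an exact Pareto tail. The sketch below assumes \(P(w > x) = x^{-\kappa }\) for \(x \ge 1\) (so \(d = 1\) and the \(o(1)\) term vanishes), which is a special case chosen for illustration rather than the general setting of the lemma. In that case \(E[w^{p}I(w \le c)] = \kappa (c^{p-\kappa } - 1)/(p - \kappa )\) for \(p \ne \kappa \) and \(\kappa \ln (c)\) for \(p = \kappa \). The function names are hypothetical.

```python
import math


def truncated_moment(p, kappa, c):
    """E[w^p 1{w <= c}] when P(w > x) = x^(-kappa) for x >= 1 (exact Pareto)."""
    if c < 1.0:
        raise ValueError("threshold must be at least 1")
    if p == kappa:
        return kappa * math.log(c)  # logarithmic, hence slowly varying in c
    return kappa / (p - kappa) * (c ** (p - kappa) - 1.0)


def threshold(n, k, kappa):
    """Solve P(w > c) = k / n for c, which gives c = (n / k)^(1 / kappa)."""
    return (n / k) ** (1.0 / kappa)


def karamata_ratio(p, kappa, n, k):
    """Ratio of the truncated moment to c^p * (k / n); tends to kappa / (p - kappa)."""
    c = threshold(n, k, kappa)
    return truncated_moment(p, kappa, c) / (c ** p * (k / n))


if __name__ == "__main__":
    kappa, p = 1.5, 2.0
    for n, k in [(10**6, 100), (10**9, 1000), (10**12, 1000)]:
        print(n, k, karamata_ratio(p, kappa, n, k))
```

With \(\kappa = 1.5\) and \(p = 2\) the ratio approaches \(\kappa /(p - \kappa ) = 3\) as \(n/k\) grows, which is the constant \(K\) in \(E|w_{n,t}^{*}|^{p} \sim Kc_{w,n}^{p}(k_{w,n}/n)\) for this special case.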
Proof of Lemma 4.2
First some preliminaries. Integrability of \(\epsilon _{t}\) is assured by \(\kappa > 1\), and \(y_{t}\) has tail (11) with the same tail index \(\kappa \) (Brockwell and Cline 1985). Stationarity ensures \(\epsilon _{t}(\beta ) = \sum _{i=0}^{\infty }\psi _{i}(\beta )\epsilon _{t-i}\), where \(\text{sup}_{\beta \in \mathcal B }|\psi _{i}(\beta )| \le K\rho ^{i}\) for \(\rho \in (0,1)\), \(\psi _{0}(\beta ^{0}) = 1\) and \(\psi _{i}(\beta ^{0}) = 0\ \forall i \ge 1\). Since \(\epsilon _{t}\) is iid with tail (11) it is easy to show \(\epsilon _{t}(\beta )\) satisfies uniform power law property R3.b by exploiting convolution tail properties developed in Embrechts and Goldie (1980). Use (4) and (11) to deduce \(c_{\epsilon ,n} = K\left( n/k_{n}\right) ^{1/\kappa }\).
F2 follows from the stationary AR data generating process and distribution continuity. I1 holds since \(E[m_{n,t}^{*}(\gamma )] = 0\) by independence, symmetry, and symmetric trimming. R1 and R2 hold by construction; (11) and the stated error properties ensure R3; see Pham and Tran (1985) for R4.
Now P1–P3. OLS and LAD are \(n^{1/\kappa }\)-convergent if \(\kappa \in (1,2]\) (Davis et al. 1992); LTTS and GMTTM are \(n^{1/\kappa }/L(n)\)-convergent if \(\kappa \in (1,2]\) (Hill and Renault 2010; Hill 2011b; see footnote 10); and LWAD is \(n^{1/2}\)-convergent in all cases (Ling 2005). It remains to characterize \(V_{n}(\gamma )\). Each claim follows by application of Lemma 4.1. If \(\kappa > 2\) then \(V_{n}(\gamma ) \sim Kn\), so OLS, LTTS and GMTTM satisfy P2 [LAD and LWAD are not linear: see Davis et al. (1992)]. If \(\kappa \in (1,2)\) then \(V_{n}(\gamma ) \sim Kn\left( k_{n}/n\right) ^{2/\kappa -1} = o(n)\), while each \(\hat{\beta }_{n}\) satisfies \(\tilde{V}_{i,i,n}^{1/2}/n^{1/2} \rightarrow \infty \), hence P1 applies for any intermediate order \(\{k_{n}\}\). The case \(\kappa = 2\) is similar.
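The \(o(n)\) rate for \(\kappa \in (1,2)\) can be spelled out in one line. Since \(2/\kappa - 1 \in (0,1)\) and \(k_{n}/n \rightarrow 0\) for any intermediate order sequence,

\[ \frac{V_{n}(\gamma )}{n} \sim K\left( \frac{k_{n}}{n}\right) ^{2/\kappa -1} \rightarrow 0. \]

As an illustration only, take \(\kappa = 3/2\) and assume the fractile form \(k_{n} \sim \lambda n/\ln (n)\) for some \(\lambda \in (0,1]\). Then \(2/\kappa - 1 = 1/3\) and

\[ V_{n}(\gamma ) \sim Kn\left( \frac{\lambda }{\ln (n)}\right) ^{1/3}, \]

so \(V_{n}(\gamma )\) falls short of \(n\) only by a power of \(\ln (n)\).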
Finally, Lemma 4.1 can be shown to apply to \(V_{n}^{\bot }(\gamma )\) by exploiting the fact that \(\epsilon _{t}g_{i,t} = \epsilon _{t}y_{t-i}\) have the same tail index as \(\epsilon _{t}\) (Embrechts and Goldie 1980). The above arguments therefore extend to \(m_{n,t}^{\perp }(\beta ,\gamma )\) under P3. \(\mathcal{QED} \).
Proof of Lemma 4.3
The ARCH process \(\{y_{t}\}\) is stationary and geometrically \(\beta \)-mixing (Carrasco and Chen 2002). In view of re-centering after trimming and error independence, all conditions except P1–P3 hold by the arguments used to prove Lemma 4.2.
Consider P1–P3. Note \(\epsilon _{t} = u_{t}^{2} - 1\) is iid, it has tail index \(\kappa _{u}/2 \in (1,2]\) if \(E[u_{t}^{4}] = \infty \), and \((\partial /\partial \beta )\epsilon _{t}(\beta )|_{\beta ^{0}} = -u_{t}^{2}x_{t}/h_{t}^{2}\) is integrable. Further \(S_{n}^{2}(\gamma ) = nE[m_{n,t}^{*2}(\gamma )]\) by independence and re-centering. Thus \( V_{n}(\gamma ) \sim Kn\) if \(E[u_{t}^{4}] < \infty \), and otherwise apply Lemma 4.1 to deduce \(V_{n}(\gamma ) \sim Kn\left( k_{n}/n\right) ^{4/\kappa _{u}-1}\) if \(\kappa _{u} < 4\), and \(V_{n}(\gamma ) \sim n/L(n)\) if \(\kappa _{u} = 4\).
GMTTM with QML-type equations and QMTTL have a scale \(||\tilde{V}_{n}|| \sim n/L(n)\) if \(E[u_{t}^{4}] = \infty \), hence P1, otherwise \(|| \tilde{V}_{n}|| \sim Kn\) hence P2 (Hill and Renault 2010; Hill 2011b). Log-LAD is \(n^{1/2}\)-convergent if \(E[u_{t}^{2}] < \infty \), hence P1 if \(\kappa _{u} \le 4\), and if \(\kappa _{u} > 4\) then it does not satisfy P2 since it is not linear. QML is \(n^{1/2}\)-convergent if \( E[u_{t}^{4}] < \infty \) hence P2, and if \(E[u_{t}^{4}] = \infty \) then the rate is \(n^{1-2/\kappa _{u}}/L(n)\) when \(\kappa _{u} \in (2,4]\) (Hall and Yao 2003, Theorem 2.1). But if \(\kappa _{u} < 4\) then \( n(k_{n}/n)^{4/\kappa _{u}-1} = k_{n}^{4/\kappa _{u}-1}n^{2-4/\kappa _{u}} > n^{2-4/\kappa _{u}}/L(n)\) for any slowly varying \(L(n) \rightarrow \infty \) and intermediate order \(\{k_{n}\}\) hence QML does not satisfy P1 or P2. Analogous arguments extend to \(m_{n,t}^{\perp }(\gamma )\) under P3 by exploiting Lemma 4.1. \(\mathcal{QED} \).
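The exponent arithmetic behind the QML comparison is short. Writing \(a := 4/\kappa _{u} - 1\), which is positive when \(\kappa _{u} < 4\),

\[ n\left( \frac{k_{n}}{n}\right) ^{a} = k_{n}^{a}\,n^{1-a} = k_{n}^{4/\kappa _{u}-1}\,n^{2-4/\kappa _{u}}, \]

and therefore

\[ \frac{n(k_{n}/n)^{4/\kappa _{u}-1}}{n^{2-4/\kappa _{u}}/L(n)} = k_{n}^{4/\kappa _{u}-1}L(n) \rightarrow \infty \]

because \(k_{n} \rightarrow \infty \) and \(L(n) \rightarrow \infty \). For example, with the illustrative value \(\kappa _{u} = 3\) the left-hand side of the first display is \(k_{n}^{1/3}n^{2/3}\), which dominates \(n^{2/3}/L(n)\).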
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Hill, J. (2013). Heavy-Tail and Plug-In Robust Consistent Conditional Moment Tests of Functional Form. In: Chen, X., Swanson, N. (eds) Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1653-1_10
DOI: https://doi.org/10.1007/978-1-4614-1653-1_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1652-4
Online ISBN: 978-1-4614-1653-1
eBook Packages: Business and Economics; Economics and Finance (R0)