Abstract
Testing the equality of two curves is a common problem in functional data analysis. In this paper, we develop procedures for testing whether two curves measured with either homoscedastic or heteroscedastic errors are equal. The method applies to a general class of curves. Unlike existing tests, ours does not require repeated measurements to estimate the variance at each value of the explanatory variable; instead, it estimates the overall variances by pooling all of the data points. We derive the null distribution of the test statistic and develop an approximation formula for the p value when the heteroscedastic variances are either known or unknown. Simulations show that the procedure works well in finite samples, and comparisons with other test procedures are made on simulated data sets. The method is illustrated with our motivating example from an environmental study. An R package has been created to facilitate general use.
References
ATSDR: The nature and extent of lead poisoning in children in the United States: a report to Congress. Technical report, Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Public Health Service, Atlanta (1988)
Besse, P., Ramsay, J.O.: Principal components analysis of sampled functions. Psychometrika 51(2), 285–311 (1986). https://doi.org/10.1007/bf02293986
Chakravarti, I.M., Laha, R.G., Roy, J.: Handbook of Methods of Applied Statistics, Vol. I. John Wiley and Sons (1967)
Loader, C.R.: Bandwidth selection: classical or plug-in? Ann. Stat. 27(2), 415–438 (1999)
Cleveland, W.S., Devlin, S.J.: Locally weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc. 83(403), 596–610 (1988). https://doi.org/10.1080/01621459.1988.10478639
Dai, J., Sperlich, S.: Simple and effective boundary correction for kernel densities and regression with an application to the world income and Engel curve estimation. Comput. Stat. Data Anal. 54(11), 2487–2497 (2010)
Fan, J., Lin, S.: Test of significance when data are curves. J. Am. Stat. Assoc. 93, 1007–1021 (1998). https://doi.org/10.1080/01621459.1998.10473763
Hotelling, H.: Tubes and spheres in n-spaces, and a class of statistical problems. Am. J. Math. 61, 440–460 (1939)
James, G., Hastie, T.: Functional linear discriminant analysis for irregularly sampled curves. J. R. Stat. Soc. Ser. B 63(3), 533–550 (2001). https://doi.org/10.1111/1467-9868.00297
James, W., Stein, C.: Estimation with quadratic loss. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pp. 361–380. University of California Press (1961)
Johansen, S., Johnstone, I.: Hotelling’s theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis. Ann. Stat. 18, 652–684 (1990)
Kitska, D.J.: Simultaneous inference for functional linear models. Ph.D. thesis, Case Western Reserve University (2005)
Knowles, M., Siegmund, D.: On Hotelling’s approach to testing for a nonlinear parameter in regression. Int. Stat. Rev. 57(3), 205–220 (1989). https://doi.org/10.2307/1403794
Leurgans, S.E., Moyeed, R.A., Silverman, B.W.: Canonical correlation analysis when the data are curves. J. R. Stat. Soc. Ser. B 55(3), 725–740 (1993)
Loader, C.: Local Regression and Likelihood. Springer, New York (1999)
Naiman, D.Q.: Simultaneous confidence bounds in multiple regression using predictor variable constraints. J. Am. Stat. Assoc. 82, 214–219 (1987). https://doi.org/10.2307/2289156
Naiman, D.Q.: On volumes of tubular neighborhoods of spherical polyhedra and statistical inference. Ann. Stat. 18(2), 685–716 (1990)
Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961)
Ramsay, J.O., Dalzell, C.J.: Some tools for functional data analysis. J. R. Stat. Soc. Ser. B 53(3), 539–572 (1991). https://doi.org/10.2307/2345586
Robbins, N., Zhang, Z., Sun, J., Ketterer, M., Lalumandier, J., Shulze, R.: Childhood lead exposure and uptake in teeth in the Cleveland area during the era of leaded gasoline. Sci. Total Environ. 408(19), 4118–4127 (2010). https://doi.org/10.1016/j.scitotenv.2010.04.060
Shapiro, S.S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52(3–4), 591–611 (1965)
Sun, J., Loader, C.: Simultaneous confidence bands for linear regression and smoothing. Ann. Stat. 22, 1328–1345 (1994). https://doi.org/10.1214/aos/1176325631
Sun, J.: Tail probabilities of the maxima of Gaussian random fields. Ann. Probab. 21(1), 34–71 (1993). https://doi.org/10.1214/aop/1176989393
Sun, J.: Multiple comparisons for a large number of parameters. Biom. J. 43, 627–643 (2001). https://doi.org/10.1002/1521-4036(200109)43:53.3.CO;2-6
Weyl, H.: On the volume of tubes. Am. J. Math. 61(2), 461–472 (1939)
Wang, J.-L., Chiou, J.-M., Müller, H.-G.: Functional data analysis. Annu. Rev. Stat. Appl. 3, 257–295 (2016). https://doi.org/10.1146/annurev-statistics-041715-033624
Xintaras, C.: Impact of lead-contaminated soil on public health. Technical report, Agency for Toxic Substances and Disease Registry (1992). http://www.cdc.gov/search.do
Appendices
Appendix A. Proof of Lemma 4.1
Let \(X_1=\sum _{i=1}^{n_1}W_i^2,\) where \(W_i\sim _{ iid} N(0,1),\) and let \(X_2\) be the analogous, independent sum of \(n_2\) squared standard normal variables. Since \(EX_1=n_1\) and \(Var(X_1)=2n_1,\) the central limit theorem lets us approximate \(X_1 \) by \(X_1 {\mathop {=}\limits ^{d}} n_1+\sqrt{2n_1} Z_1 + o(\sqrt{n_1})\) and, similarly, \(X_2 \) by \( X_2{\mathop {=}\limits ^{d}} n_2+\sqrt{2n_2} Z_2 + o(\sqrt{n_2}), \) where \(Z_1\) and \(Z_2\) are independent standard normal random variables and the notation \({\mathop {=}\limits ^{d}}\) denotes equality in distribution. Therefore,
as \(n_i \rightarrow \infty \) for \(i=1,2.\) QED
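The approximation \(X_1 {\mathop {=}\limits ^{d}} n_1+\sqrt{2n_1} Z_1 + o(\sqrt{n_1})\) used above is the central limit theorem applied to a sum of squared standard normals. The simulation sketch below (Python standard library only; the sample sizes, replication count and seed are arbitrary choices, not taken from the paper) standardizes \(\chi^2_n\) draws as \((X-n)/\sqrt{2n}\) and shows the mean and standard deviation approaching 0 and 1, with the skewness \(\sqrt{8/n}\) shrinking as \(n\) grows.

```python
import math
import random
import statistics

def chi2_sum(n, rng):
    """X = W_1^2 + ... + W_n^2 with W_i i.i.d. N(0, 1), so X ~ chi^2_n."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def standardized_draws(n, reps=2000, seed=41):
    """Draws of (X - n) / sqrt(2n); approximately N(0, 1) for large n."""
    rng = random.Random(seed)
    return [(chi2_sum(n, rng) - n) / math.sqrt(2 * n) for _ in range(reps)]

def third_moment(z):
    """Sample E[z^3]; close to the skewness sqrt(8/n) since z has mean ~0, sd ~1."""
    return statistics.fmean(v ** 3 for v in z)

if __name__ == "__main__":
    for n in (5, 50, 500):
        z = standardized_draws(n)
        print(f"n={n:4d} mean={statistics.fmean(z):+.3f} "
              f"sd={statistics.pstdev(z):.3f} skew~{third_moment(z):.3f} "
              f"(theory {math.sqrt(8 / n):.3f})")
```

The residual skewness at small \(n\) is why the lemma is stated only as \(n_i \rightarrow \infty\).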
Appendix B. Proof of Lemma 4.2
Since \(\nu _i S_i \sim \chi ^2_{\nu _i}, \) we have \( ES_i=1\) and \(Var(S_i)=2/\nu _i \); moreover, by definition there exist i.i.d. standard normal random variables \(W_{i,k}, \) for \(k=1, 2, \dots , \nu _i, i=1,2,\) such that \(S_i= (1/\nu _i) {\sum _{k=1}^{\nu _i} W_{i,k}^2}. \) By the central limit theorem, \(S_i{\mathop {=}\limits ^{d}} 1+\sqrt{\frac{2}{\nu _i}} Z_i+o(\frac{1}{\sqrt{\nu _i}})\), where \(Z_i \sim N(0,1)\).
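Expanding \(Y=f(S_1, S_2)= (X_1+X_2 )/(X_1/S_1+X_2/S_2)\) to first order around (1, 1), and using \(\partial f/\partial S_i\) evaluated at (1, 1) \(=X_i/(X_1+X_2)\), we have:
\[ Y \approx 1+\frac{X_1(S_1-1)+X_2(S_2-1)}{X_1+X_2}. \]
It follows that \(EY\rightarrow 1.\) Conditioning on \(X_1, X_2\), and then replacing \(X_i\) by \(EX_i=n_i\), we have
\[ Var(Y) \approx \frac{2X_1^2/\nu _1+2X_2^2/\nu _2}{(X_1+X_2)^2} \approx \frac{2n_1^2\nu _2 +2n_2^2\nu _1}{(n_1+n_2)^2\nu _1\nu _2}. \]
Thus, matching \(Var{(\chi ^2_\nu /\nu )}=2/\nu \) with \(Var(Y)=(2n_1^2\nu _2 +2n_2^2\nu _1)/((n_1+n_2)^2\nu _1\nu _2)\) gives \(\nu =(n_1+n_2)^2\nu _1\nu _2/(n_1^2\nu _2+n_2^2\nu _1)\), the estimate of \(\nu \) in formula (4.3). QED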
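The matching step can be checked numerically. The sketch below (Python standard library; the function names, sample sizes and seeds are hypothetical illustrations) solves \(2/\nu = (2n_1^2\nu _2 +2n_2^2\nu _1)/((n_1+n_2)^2\nu _1\nu _2)\) for \(\nu \), which we assume agrees with formula (4.3), and compares \(2/\nu \) with the Monte Carlo variance of \(Y=(X_1+X_2)/(X_1/S_1+X_2/S_2)\) when \(X_i\sim \chi ^2_{n_i}\) and \(\nu _iS_i\sim \chi ^2_{\nu _i}\) are all independent.

```python
import random
import statistics

def effective_df(n1, n2, nu1, nu2):
    """nu solving 2/nu = (2 n1^2 nu2 + 2 n2^2 nu1) / ((n1 + n2)^2 nu1 nu2)."""
    return (n1 + n2) ** 2 * nu1 * nu2 / (n1 ** 2 * nu2 + n2 ** 2 * nu1)

def chi2(df, rng):
    # A chi-square(df) variate is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)

def simulate_y(n1, n2, nu1, nu2, reps=20000, seed=1):
    """Monte Carlo draws of Y = (X1 + X2) / (X1/S1 + X2/S2)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(reps):
        x1, x2 = chi2(n1, rng), chi2(n2, rng)
        s1, s2 = chi2(nu1, rng) / nu1, chi2(nu2, rng) / nu2
        ys.append((x1 + x2) / (x1 / s1 + x2 / s2))
    return ys

if __name__ == "__main__":
    n1, n2, nu1, nu2 = 60, 40, 55, 35
    nu = effective_df(n1, n2, nu1, nu2)
    ys = simulate_y(n1, n2, nu1, nu2)
    print(f"nu = {nu:.2f}")
    print(f"mean(Y) = {statistics.fmean(ys):.3f} (should be near 1)")
    print(f"var(Y) = {statistics.variance(ys):.4f} vs 2/nu = {2 / nu:.4f}")
```

In the balanced case \(n_1=n_2\), \(\nu _1=\nu _2\), the formula reduces to \(\nu = 2\nu _1\), i.e. the two variance estimates simply pool their degrees of freedom.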
Appendix C. Proof of Theorem 5.1
\(Z(t)\) in formula (5.6) can be written as \(Z(t)=\left\langle u_1(t), \; \xi _1 \right\rangle - \left\langle u_2(t),\; \xi _2 \right\rangle =\left\langle \mathcal {M}(t),\; \xi \right\rangle , \) where \(\mathcal {M}(t)= \left( u_1(t), -u_2(t)\right) ' \in \mathcal {S}^{n-1}\) and \(\xi = \left( \xi _1, \xi _2\right) ' = \left( \varepsilon _1/\sigma , \varepsilon _2/\sigma \right) '\in \mathcal {R}^n\) is standard multivariate normal. Conditioning on \(\Vert \xi \Vert ,\) the probability can be written as
\[ P\Big(\sup _{t\in \mathcal {T}}|Z(t)|\ge t_0\Big)=\int _0^\infty P\Big(\sup _{t\in \mathcal {T}}\big|\left\langle \mathcal {M}(t),\; \xi /\Vert \xi \Vert \right\rangle \big|\ge t_0/y \;\Big|\; \Vert \xi \Vert =y\Big)\, g(y,n)\,dy, \tag{10.1} \]
where \(g(y, n)\) is the probability density function (pdf) of the square root of a \(\chi ^2\) random variable with \(n\) degrees of freedom. Since \( U={\xi }/{\Vert \xi \Vert } \sim uniform (\mathcal {S}^{n-1})\) is independent of \({\Vert \xi \Vert }, \) we can drop the condition in the probability. Let \(T=\{x\in \mathcal {S}^{n-1}:\; \sup _{t\in \mathcal {T}}\left| \left\langle \mathcal {M}(t),\; x\right\rangle \right| \ge {t_0}/{y}\}\). Then \(T\) is the union of the tubes of radius \(r =\sqrt{2-2{t_0}/{y}}\) around the curves \(\mathcal {M}(t) \) and \(-\mathcal {M}(t) \) embedded in \(\mathcal {S}^{n-1}\) (see relation (4.4)). The probability inside the integral of (10.1) therefore equals \( {Vol(T)} /{Vol(\mathcal {S}^{n-1})}.\) Plugging in the tube formula (4.5) gives result (5.9); see also [22, Proposition 1, p. 1330]. Result (5.10) is obtained by replacing \(g(y, n)\), the pdf of \({\Vert \xi \Vert } =\sqrt{\varepsilon '\varepsilon / \sigma ^2}\), by the pdf of \( \Vert \hat{\xi }\Vert =\sqrt{\varepsilon '\varepsilon / \hat{\sigma }^2}, \) where \(\Vert \hat{\xi }\Vert ^2/n \sim F_{n, \nu }\). For details, see the same reference. QED
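The geometric reduction in this proof can be illustrated numerically. The sketch below does not reproduce formulas (4.5) or (5.9); instead it uses the familiar two-sided tube approximation for a unit-variance Gaussian process \(Z(t)=\langle \mathcal {M}(t), \xi \rangle \) with known \(\sigma \), namely \((\kappa _0/\pi )e^{-c^2/2}+2(1-\Phi (c))\), where \(\kappa _0\) is the length of the curve \(\mathcal {M}(t)\) on \(\mathcal {S}^{n-1}\) (cf. [22]). Everything else is hypothetical: \(\mathcal {M}(t)\) is built from two local-constant smoothers with a Gaussian kernel, and the designs, bandwidth, grid and helper names are illustrative choices, not the paper's.

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def unit_vector(t, x1, x2, h):
    """Hypothetical M(t) = (u1(t), -u2(t))' from two local-constant smoothers
    with a common error variance, rescaled so that ||M(t)|| = 1."""
    w1 = [gaussian_kernel((x - t) / h) for x in x1]
    w2 = [gaussian_kernel((x - t) / h) for x in x2]
    s1, s2 = sum(w1), sum(w2)
    v = [w / s1 for w in w1] + [-w / s2 for w in w2]  # minus sign: f1 - f2
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def kappa0(curve):
    """Length of the discretized curve t -> M(t) on the sphere (polygonal approximation)."""
    return sum(math.dist(a, b) for a, b in zip(curve, curve[1:]))

def tube_pvalue(c, k0):
    """(k0 / pi) exp(-c^2 / 2) + 2 (1 - Phi(c)), using 2 (1 - Phi(c)) = erfc(c / sqrt(2))."""
    return k0 / math.pi * math.exp(-c * c / 2) + math.erfc(c / math.sqrt(2))

def monte_carlo_pvalue(curve, c, reps=1000, seed=2019):
    """Estimate P(max over grid of |<M(t), xi>| >= c) with xi ~ N(0, I_n)."""
    rng = random.Random(seed)
    n = len(curve[0])
    hits = 0
    for _ in range(reps):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if max(abs(sum(m * z for m, z in zip(row, xi))) for row in curve) >= c:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    x1 = [(i + 0.5) / 30 for i in range(30)]   # assumed design of curve 1
    x2 = [(i + 0.25) / 30 for i in range(30)]  # assumed design of curve 2
    grid = [j / 50 for j in range(51)]
    curve = [unit_vector(t, x1, x2, h=0.1) for t in grid]
    k0 = kappa0(curve)
    print(f"kappa_0 ~ {k0:.2f}")
    print(f"tube approximation at c=3: {tube_pvalue(3.0, k0):.4f}")
    print(f"Monte Carlo on the grid:   {monte_carlo_pvalue(curve, 3.0):.4f}")
```

Because the Monte Carlo maximum is taken only over grid points, it tends to fall slightly below the continuous-curve probability that the tube formula targets.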
Appendix D. Proof of Theorem 5.2
Suppose \(\sigma _1^2\) and \(\sigma _2^2\) are known. As before, let \(\mathcal {M}(t)=\left( u_1(t), \;\; u_2(t)\right) ' \in \mathcal {S}^{n-1} \subseteq \mathcal {R}^n\) for \(n=n_1+n_2\) and \(t\in \mathcal {T}. \) Let \(\xi =\left( \xi _1, \; \; \xi _2 \right) '\) and \(U= { \xi }/{ \Vert \xi \Vert }. \) Then \(U\) is uniformly distributed over \(\mathcal {S}^{n-1}\) and independent of \(\Vert \xi \Vert \). Since \(\sigma _1\) and \(\sigma _2\) are known, \(\Vert \xi \Vert ^2\) follows a \(\chi ^2_n\) distribution.
Conditioning on the value of \(\Vert \xi \Vert \) and using the independence of \(\varepsilon _1\) and \(\varepsilon _2\), we have
where \(f_{\Vert \xi \Vert }(y)\) denotes the pdf of \(\Vert \xi \Vert ,\) whose square follows a \(\chi ^2_n\) distribution. The last equality holds because \(\xi /\Vert \xi \Vert \) and \(\Vert \xi \Vert \) are independent.
When \(t_0\) is large, the following tube formula (cf. Lemma 4.5) can be substituted into the last equation:
where \(\beta _{\{1/2, (n-1)/2\}}\) denotes a Beta random variable with parameters 1/2 and \((n-1)/2\). The factor 2 accounts for the two curves whose tubes satisfy the probability condition: \({\mathcal {M}}(t)\) and \(-{\mathcal {M}}(t).\) This gives formula (5.17).
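A Beta variable with these parameters has a simple geometric source: if \(U\) is uniform on \(\mathcal {S}^{n-1}\), then \(U_1^2\sim \beta _{\{1/2, (n-1)/2\}}\), so \(P(\beta _{\{1/2, (n-1)/2\}}\ge w^2)=P(|U_1|\ge w)\). The Monte Carlo sketch below (Python standard library; seeds and replication counts are arbitrary) checks this identity; it illustrates where such a term comes from and is not the tube formula itself.

```python
import math
import random

def uniform_on_sphere(n, rng):
    """A uniform point on S^{n-1}, obtained as xi / ||xi|| with xi ~ N(0, I_n)."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(z * z for z in xi))
    return [z / r for z in xi]

def tail_via_sphere(n, w, reps=20000, seed=7):
    """Monte Carlo estimate of P(|U_1| >= w) for U uniform on S^{n-1}."""
    rng = random.Random(seed)
    hits = sum(abs(uniform_on_sphere(n, rng)[0]) >= w for _ in range(reps))
    return hits / reps

def tail_via_beta(n, w, reps=20000, seed=11):
    """Monte Carlo estimate of P(beta_{1/2,(n-1)/2} >= w^2)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(0.5, (n - 1) / 2) >= w * w for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    for n, w in ((3, 0.5), (10, 0.6)):
        print(n, w, tail_via_sphere(n, w), tail_via_beta(n, w))
```

For \(n=3\) both probabilities equal \(1-w\) exactly, since \(U_1\) is then uniform on \([-1,1]\) (Archimedes' hat-box theorem), which makes a convenient sanity check.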
Now suppose that \(\sigma ^2_1\) and \(\sigma ^2_2\) are unknown but are estimated as in formula (5.5). Let
Then the requirements of Lemma 4.2 are satisfied, hence we have \(X\sim _{approx}\chi ^2_{\nu }/\nu , \) with degrees of freedom \(\nu \) estimated in formula (4.3). Therefore, \(Y= ({X_1+X_2 })/{X}\rightarrow (\chi ^2_n)/(\chi ^2_{\nu }/\nu )\sim nF_{n,\nu }. \)
Now let
Then
where \({Y}=:\Vert {\hat{\xi }} \Vert ^2\sim _{approx} \chi ^2_{\nu }/\nu , \) with \(Y\) as defined in (4.2).
To estimate \(\kappa _0, \) we treat the \(\sigma _i^2\) appearing in \({\hat{\mathcal {M}}(t) }\) as known values, so that the tube formula (4.5) can be used to approximate the probability inside the integral. After some calculation, we obtain result (5.18). QED
Appendix E. Software
Our R package curvetest is freely available at http://www.r-project.org/. This package tests the equality of curves as described in this paper. The main function curvetest has the following parameters:
- formula: Specifies the regression formula.
- data1: A data frame representing the first (discretized) curve.
- data2: A data frame representing the second curve. If it is NULL, the function tests \(f(t)=0.\) data1 and data2 must each have two columns, with the same column names, that can be referenced by formula.
- equal.var: Logical; specifies whether equal variances are assumed. Default = TRUE.
- plotit: Logical; specifies whether curvetest should draw the scatter plots and smoothed curves. Plotting is useful for selecting the bandwidth bw below. Default = FALSE.
- bw: Bandwidth (window width) used for both curves.
- alpha: Smoothing parameter. Default = 0.5.
- nn: Number of points at which the smoothed curves are evaluated. The points are equally spaced over the range covered by the two data sets. Default = 100.
- myx: x (or t) values at which to estimate the curves. Default = NULL, in which case nn equally spaced points are placed over the data range. If myx is non-NULL, the parameter nn is ignored.
- bcorrect: Method used for boundary correction. Default = ‘simple’. Other options include ’none’ (no correction).
- Conf.level: The \(\alpha \) value, i.e., the type I error level. Default = 0.05.
- kernel: Kernel function used for smoothing. Users can choose one of ‘Trio’, ‘Gaussian’, ‘Uniform’, ‘Triweight’, ‘Triangle’, ‘Epanechnikov’, ‘Quartic’. See their definitions in Table 2.
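For readers unfamiliar with the kernel and bandwidth options, the Python sketch below lists the standard textbook forms of the named kernels and a local-constant (Nadaraya–Watson) fit showing how a bandwidth such as bw scales the kernel. These are assumptions: Table 2 is not reproduced here and may scale the kernels differently, ‘Trio’ is omitted because its definition is not given in this section, and local_constant_fit is a hypothetical helper, not part of curvetest (whose own local fit may differ).

```python
import math

# Standard textbook kernel forms on [-1, 1] (Gaussian: whole real line).
# Assumed forms; the package's Table 2 may define or scale them differently.
KERNELS = {
    "Uniform": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "Triangle": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "Epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "Quartic": lambda u: 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "Triweight": lambda u: 35 / 32 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "Gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
}

def integrate(f, a, b, m=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def local_constant_fit(t, xs, ys, bw, kernel="Epanechnikov"):
    """Hypothetical helper: Nadaraya-Watson estimate at t with bandwidth bw."""
    K = KERNELS[kernel]
    w = [K((x - t) / bw) for x in xs]
    total = sum(w)
    if total == 0:
        raise ValueError("no observations within the bandwidth of t")
    return sum(wi * yi for wi, yi in zip(w, ys)) / total

if __name__ == "__main__":
    for name, K in KERNELS.items():
        lo, hi = (-8.0, 8.0) if name == "Gaussian" else (-1.0, 1.0)
        print(f"{name:12s} integral ~ {integrate(K, lo, hi):.6f}")
```

Each kernel integrates to one, so the choice among them mainly changes the shape of the weights, while bw controls how many observations receive nonzero weight.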
Usage:
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Yang, Y., Sun, J. (2019). Novel Test for the Equality of Continuous Curves with Homoscedastic or Heteroscedastic Measurement Errors. In: Liu, R., Tsong, Y. (eds) Pharmaceutical Statistics. MBSW 2016. Springer Proceedings in Mathematics & Statistics, vol 218. Springer, Cham. https://doi.org/10.1007/978-3-319-67386-8_20
DOI: https://doi.org/10.1007/978-3-319-67386-8_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67385-1
Online ISBN: 978-3-319-67386-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)