Novel Test for the Equality of Continuous Curves with Homoscedastic or Heteroscedastic Measurement Errors

  • Conference paper

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 218)

Abstract

Testing the equality of two curves is a common problem in functional data analysis. In this paper, we develop procedures for testing whether two curves measured with either homoscedastic or heteroscedastic errors are equal. The method is applicable to a general class of curves. Unlike existing tests, ours does not require repeated measurements to obtain the variance at each value of the explanatory variable. Instead, our test estimates the overall variances by pooling all of the data points. We derive the null distribution of the test statistic and develop an approximation formula for the p value when the heteroscedastic variances are either known or unknown. Simulations show that the procedure works well in finite samples, and comparisons with other test procedures are made on simulated data sets. An application to our motivating example from an environmental study is illustrated. An R package was created for ease of general application.

References

  1. ATSDR: The nature and extent of lead poisoning in children in the United States: a report to Congress. Technical report, Agency for Toxic Substances and Disease Registry, Atlanta: US Department of Health and Human Services, Public Health Service (1988)

  2. Besse, P., Ramsay, J.O.: Principal components analysis of sampled functions. Psychometrika 51(2), 285–311 (1986). https://doi.org/10.1007/bf02293986

  3. Chakravarti, I.M., Laha, R.G., Roy, J.: Handbook of Methods of Applied Statistics, Vol. I. John Wiley and Sons (1967)

  4. Loader, C.R.: Bandwidth selection: classical or plug-in? Ann. Stat. 27(2), 415–438 (1999)

  5. Cleveland, W., Devlin, S.: Locally weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc. 83(403), 596–610 (1988). https://doi.org/10.1080/01621459.1988.10478639

  6. Dai, J., Sperlich, S.: Simple and effective boundary correction for kernel densities and regression with an application to the world income and Engel curve estimation. Comput. Stat. Data Anal. 54(11), 2487–2497 (2010)

  7. Fan, J., Lin, S.: Test of significance when data are curves. J. Am. Stat. Assoc. 93, 1007–1021 (1998). https://doi.org/10.1080/01621459.1998.10473763

  8. Hotelling, H.: Tubes and spheres in n-spaces, and a class of statistical problems. Am. J. Math. 61, 440–460 (1939)

  9. James, G., Hastie, T.: Functional linear discriminant analysis for irregularly sampled curves. J. R. Stat. Soc. Ser. B 63(3), 533–550 (2001). https://doi.org/10.1111/1467-9868.00297

  10. James, W., Stein, C.: Estimation with quadratic loss. In: Proceedings of Fourth Berkeley Symposium on Mathematical Statistics and Probability Theory, University of California Press, pp. 361–380 (1961)

  11. Johansen, S., Johnstone, I.: Hotelling’s theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis. Ann. Stat. 18, 652–684 (1990)

  12. Kitska, D.J.: Simultaneous inference for functional linear models. Ph.D. thesis, Case Western Reserve University (2005)

  13. Knowles, M., Siegmund, D.: On Hotelling’s approach to testing for a nonlinear parameter in regression. Int. Stat. Rev. 57(3), 205–220 (1989). https://doi.org/10.2307/1403794

  14. Leurgans, S.E., Moyeed, R.A., Silverman, B.W.: Canonical correlation analysis when the data are curves. J. R. Stat. Soc. Ser. B 55(3), 725–740 (1993)

  15. Loader, C.: Local Regression and Likelihood. Springer, New York (1999)

  16. Naiman, D.Q.: Simultaneous confidence bounds in multiple regression using predictor variable constraints. J. Am. Stat. Assoc. 82, 214–219 (1987). https://doi.org/10.2307/2289156

  17. Naiman, D.Q.: On volumes of tubular neighborhoods of spherical polyhedra and statistical inference. Ann. Stat. 18(2), 685–716 (1990)

  18. Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961)

  19. Ramsay, J., Dalzell, C.: Some tools for functional data analysis. J. R. Stat. Soc. Ser. B 53(3), 539–572 (1991). https://doi.org/10.2307/2345586

  20. Robbins, N., Zhang, Z., Sun, J., Ketterer, M., Lalumandier, J., Shulze, R.: Childhood lead exposure and uptake in teeth in the Cleveland area during the era of leaded gasoline. Sci. Total Environ. 408(19), 4118–27 (2010). https://doi.org/10.1016/j.scitotenv.2010.04.060

  21. Shapiro, S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52(3 & 4), 591–611 (1965)

  22. Sun, J., Loader, C.: Simultaneous confidence bands for linear regression and smoothing. Ann. Stat. 22, 1328–1345 (1994). https://doi.org/10.1214/aos/1176325631

  23. Sun, J.: Tail probabilities of the maxima of Gaussian random fields. Ann. Probab. 21(1), 34–71 (1993). https://doi.org/10.1214/aop/1176989393

  24. Sun, J.: Multiple comparisons for a large number of parameters. Biom. J. 43, 627–643 (2001). https://doi.org/10.1002/1521-4036(200109)43:53.3.CO;2-6

  25. Weyl, H.: On the volume of tubes. Am. J. Math. 61(2), 461–472 (1939)

  26. Wang, J.-L., Chiou, J.-M., Müller, H.-G.: Functional data analysis. Ann. Rev. Stat. Appl. 3, 257–295 (2016). https://doi.org/10.1146/annurev-statistics-041715-033624

  27. Xintaras, C.: Impact of lead-contaminated soil on public health (1992). http://www.cdc.gov/search.do. (Technical report, Agency for Toxic Substances and Disease Registry)

Author information

Corresponding author

Correspondence to Zhongfa Zhang.

Appendix

Appendix A. Proof of Lemma 4.1

Let \(X_1=\sum _{i=1}^{n_1}W_i^2,\) where \(W_i\sim _{ iid} N(0,1),\) and define \(X_2\) analogously with \(n_2\) terms. Since \(EX_i=n_i\) and \(Var(X_i)=2n_i, \) the central limit theorem lets us approximate \(X_1 {\mathop {=}\limits ^{d}} n_1+\sqrt{2n_1} Z_1 + o(\sqrt{n_1})\) and \( X_2{\mathop {=}\limits ^{d}} n_2+\sqrt{2n_2} Z_2 + o(\sqrt{n_2}), \) where \(Z_1\) and \(Z_2\) are independent standard normal random variables and \({\mathop {=}\limits ^{d}}\) denotes equality in distribution. Therefore,

$$\begin{aligned} G_i=&( {\frac{X_i }{X_1+X_2 }} )/ ({ \frac{n_i}{n_1+n_2}}) \\ {\mathop {=}\limits ^{d}}&\frac{n_i+\sqrt{2n_i} Z_i + o(\sqrt{n_i})}{n_1+n_2 +\sqrt{2n_1} Z_1 +\sqrt{2n_2} Z_2 + o(\sqrt{n_2})+ o(\sqrt{n_1})}/({ \frac{n_i}{n_1+n_2}}) \\ =&\frac{ 1+\frac{\sqrt{2n_i} Z_i + o(\sqrt{n_i})}{n_i}}{1 +\frac{\sqrt{2n_1} Z_1 +\sqrt{2n_2} Z_2 + o(\sqrt{n_2})+ o(\sqrt{n_1})}{n_1+n_2}} {\mathop {\sim }\limits ^{d}} _{approx} 1, \end{aligned}$$

as \(n_i \rightarrow \infty \) for \(i=1,2.\) QED
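The concentration of \(G_1\) around 1 can be checked numerically. The sketch below is only an illustration, not part of the original proof: it assumes \(X_1\) and \(X_2\) are independent chi-square variables (so that \(X_1/(X_1+X_2)\) is Beta distributed), and the function names `exact_sd_of_G1` and `simulate_G1` are hypothetical.

```python
import math
import random


def exact_sd_of_G1(n1, n2):
    """Exact standard deviation of G_1 = (X_1/(X_1+X_2)) / (n_1/(n_1+n_2)).

    Assumes X_1 and X_2 are independent chi-square variables, so that
    X_1/(X_1+X_2) ~ Beta(a, b) with a = n1/2, b = n2/2. Then E[G_1] = 1
    and Var(G_1) = b / (a * (a + b + 1)).
    """
    a, b = n1 / 2.0, n2 / 2.0
    return math.sqrt(b / (a * (a + b + 1.0)))


def simulate_G1(n1, n2, reps=2000, seed=1):
    """Monte Carlo draws of G_1, using chi-square(n) = Gamma(n/2, scale=2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x1 = rng.gammavariate(n1 / 2.0, 2.0)
        x2 = rng.gammavariate(n2 / 2.0, 2.0)
        draws.append((x1 / (x1 + x2)) / (n1 / (n1 + n2)))
    return draws


if __name__ == "__main__":
    # The spread of G_1 around 1 shrinks as the sample sizes grow.
    for n1, n2 in [(20, 30), (200, 300), (2000, 3000)]:
        g = simulate_G1(n1, n2)
        mean = sum(g) / len(g)
        print(n1, n2, round(mean, 4), round(exact_sd_of_G1(n1, n2), 4))
```

Under these assumptions the mean of \(G_1\) is exactly 1 and its standard deviation decays at rate \(1/\sqrt{n_1}\), which is the sense in which \(G_i\) is approximately 1 for large \(n_i\).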

Appendix B. Proof of Lemma 4.2

Since \(\nu _i S_i \sim \chi ^2_{\nu _i}, \) we have \( ES_i=1\) and \(Var(S_i)=2/\nu _i \), and by definition there exist i.i.d. standard normal random variables \(W_{i,k}, \) \(k=1, 2, \dots , \nu _i,\; i=1,2,\) such that \(S_i= (1/\nu _i) {\sum _{k=1}^{\nu _i} W_{i,k}^2}. \) Large-sample theory then gives \(S_i{\mathop {=}\limits ^{d}} 1+\sqrt{\frac{2}{\nu _i}} Z_i+o(\frac{1}{\sqrt{\nu _i}})\), where \(Z_i \sim N(0,1)\).

Expanding \(Y=f(S_1, S_2)= (X_1+X_2 )/(X_1/S_1+X_2/S_2)\) around (1, 1),  we have:

$$\begin{aligned} Y=f(S_1, S_2)=&1+\sqrt{ \frac{2}{\nu _1} }Z_1 \frac{ X_1}{X_1+X_2 }+ \sqrt{\frac{2}{\nu _2} }Z_2 \frac{ X_2}{X_1+X_2 }+ o(\frac{1}{\sqrt{\nu _1}}+\frac{1}{\sqrt{\nu _2}}). \end{aligned}$$

It is easy to find that \(EY\rightarrow 1.\) Conditioning on \(X_1, X_2\), we have

$$\begin{aligned} Var(Y) =&Var(E(Y|X_1, X_2)) +E (Var(Y|X_1, X_2)) \\ =&E (Var(Y|X_1, X_2)) +o(\frac{1}{\sqrt{\nu _1}}+\frac{1}{\sqrt{\nu _2}})\\ \approx&E \left\{ Var( \sqrt{\frac{2}{\nu _1}}Z_1 \frac{ X_1}{X_1+X_2 } + \sqrt{\frac{2}{\nu _2}}Z_2 \frac{ X_2}{X_1+X_2 } |X_1, X_2)\right\} \\ =&E \left\{ \frac{2}{\nu _1} (\frac{ X_1}{X_1+X_2 })^2 + \frac{2}{\nu _2} (\frac{ X_2}{X_1+X_2 })^2 \right\} \\ =&\frac{2}{\nu _1} E (\frac{ X_1}{X_1+X_2 })^2 + \frac{2}{\nu _2} E(\frac{ X_2}{X_1+X_2 })^2 \\ \approx&\frac{2}{\nu _1} (\frac{n_1}{n_1+n_2})^2 + \frac{2}{\nu _2} (\frac{ n_2}{ n_1+n_2})^2 \qquad \qquad \qquad {\text {(by Lemma (4.1))}} \\ =&\frac{2n_1^2\nu _2 +2n_2^2\nu _1}{(n_1+n_2)^2\nu _1\nu _2}. \end{aligned}$$

Thus, comparing \(Var{(\chi ^2_\nu /\nu )}=2/\nu \) with \(Var(Y)=(2n_1^2\nu _2 +2n_2^2\nu _1)/((n_1+n_2)^2\nu _1\nu _2)\), we obtain the estimate of \(\nu \) given in formula (4.3). QED
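Equating the two variances and solving for \(\nu \) gives \(\nu =(n_1+n_2)^2\nu _1\nu _2/(n_1^2\nu _2+n_2^2\nu _1)\). The following sketch evaluates this matching step; the function names `variance_of_Y` and `effective_df` are hypothetical, and whether the result coincides exactly with formula (4.3) should be checked against the main text.

```python
def variance_of_Y(n1, n2, nu1, nu2):
    """Approximate Var(Y) from the last line of the derivation above."""
    return (2.0 * n1**2 * nu2 + 2.0 * n2**2 * nu1) / ((n1 + n2) ** 2 * nu1 * nu2)


def effective_df(n1, n2, nu1, nu2):
    """Degrees of freedom nu solving 2/nu = Var(Y)."""
    return 2.0 / variance_of_Y(n1, n2, nu1, nu2)


if __name__ == "__main__":
    # Equal sample sizes and equal component degrees of freedom.
    print(effective_df(10, 10, 5, 5))
    # Equal sample sizes only: nu reduces to 4*nu1*nu2/(nu1+nu2).
    print(effective_df(50, 50, 20, 30))
```

With \(n_1=n_2\) the expression reduces to \(4\nu _1\nu _2/(\nu _1+\nu _2)\), and with \(\nu _1=\nu _2\) as well it equals \(\nu _1+\nu _2\), as one would expect when pooling two equally informative variance estimates.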

Appendix C. Proof of Theorem 5.1

\(Z(t)\) in formula (5.6) can be written as \(Z(t)=\left\langle u_1(t), \; \xi _1 \right\rangle - \left\langle u_2(t),\; \xi _2 \right\rangle =\left\langle \mathcal {M}(t),\; \xi \right\rangle , \) where \(\mathcal {M}(t)= \left( u_1(t), -u_2(t)\right) ' \in \mathcal {S}^{n-1}\) and \(\xi = \left( \xi _1, \xi _2\right) ' = \left( \varepsilon _1/\sigma , \varepsilon _2/\sigma \right) '\in \mathcal {R}^n\) is standard multivariate normal. Conditioning on \(\Vert \xi \Vert ,\) the probability can be written as

$$\begin{aligned} Pr(T\ge t_0) =&Pr(\sup _{t\in \mathcal {T}} |\left\langle \mathcal {M}(t),\;\xi \right\rangle |\ge t_0) \nonumber \\ =&\int _{t_0}^{\infty } Pr(\sup _{t\in \mathcal {T}}\left| \left\langle \mathcal {M}(t),\; \frac{\xi }{\Vert \xi \Vert }\right\rangle \right| \ge \frac{t_0}{y}\; \;|\;\; \Vert \xi \Vert =y)g(y, n)dy \end{aligned}$$
(10.1)

where \(g(y, n)\) is the probability density function (pdf) of the square root of a \(\chi ^2\) random variable with n degrees of freedom. Since \( U={\xi }/{\Vert \xi \Vert } \sim uniform (\mathcal {S}^{n-1})\) is independent of \({\Vert \xi \Vert }, \) we can drop the conditioning in the probability. Let \(T=\{x\in \mathcal {S}^{n-1}:\; \sup _{t\in \mathcal {T}}\left| \langle \mathcal {M}(t),\; x\rangle \right| \ge {t_0}/ {y}\}\). This set consists of tubes around the curves \(\mathcal {M}(t) \) and \(-\mathcal {M}(t) \) embedded in \(\mathcal {S}^{n-1}\), with radius \(r =\sqrt{2-2{t_0}/{y}}\) (see relation (4.4)). The probability inside the integral of (10.1) therefore equals \( {Vol(T)} /{Vol(\mathcal {S}^{n-1})}.\) Plugging in the tube formula (4.5) gives result (5.9); see also [22, Proposition 1, p. 1330]. Result (5.10) is obtained by replacing \(g(y, n)\), the pdf of \({\Vert \xi \Vert } =\sqrt{\varepsilon '\varepsilon / \sigma ^2},\) by the pdf of \( \Vert \hat{\xi }\Vert =\sqrt{\varepsilon '\varepsilon / \hat{\sigma }^2}, \) where \(\Vert \hat{\xi }\Vert ^2/n \sim F_{n, \nu }\). For details, see the above citation. QED
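To see how the integration over \(y\) in (10.1) behaves, the sketch below integrates one typical tube-formula term, \((1-t_0^2/y^2)^{n/2-1}\), against the density \(g(y, n)\) and compares the result with \(e^{-t_0^2/2}\), which follows from the substitution \(u=y^2-t_0^2\). This is an illustration of the integration step only, under the assumption \(n\ge 2\); it does not reproduce results (5.9) or (5.10), and all function names are hypothetical.

```python
import math


def chi_pdf(y, n):
    """Density g(y, n) of the square root of a chi-square(n) variable."""
    if y <= 0.0:
        return 0.0
    log_g = ((n - 1) * math.log(y) - 0.5 * y * y
             - (0.5 * n - 1.0) * math.log(2.0) - math.lgamma(0.5 * n))
    return math.exp(log_g)


def simpson(f, a, b, m=20000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def leading_term_integral(t0, n):
    """Integrate (1 - t0^2/y^2)^(n/2 - 1) * g(y, n) over y >= t0 (n >= 2)."""
    # The chi density is numerically negligible beyond this upper limit.
    upper = max(t0, math.sqrt(n)) + 15.0

    def integrand(y):
        return (1.0 - (t0 / y) ** 2) ** (0.5 * n - 1.0) * chi_pdf(y, n)

    return simpson(integrand, t0, upper)


if __name__ == "__main__":
    for t0, n in [(3.0, 20), (2.5, 7)]:
        print(t0, n, leading_term_integral(t0, n), math.exp(-0.5 * t0 * t0))
```

The agreement of the two printed columns reflects the fact that, once \(\Vert \xi \Vert \) is integrated out, this term no longer depends on \(n\).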

Appendix D. Proof of Theorem 5.2

Suppose \(\sigma _1^2\) and \(\sigma _2^2\) are known. As before, let \(\mathcal {M}(t)=\left( u_1(t), \; -u_2(t)\right) ' \in \mathcal {S}^{n-1} \subseteq \mathcal {R}^n\) for \(n=n_1+n_2,\; t\in \mathcal {T}. \) Let \(\xi =\left( \xi _1, \; \xi _2 \right) '\) and \(U= { \xi }/{ \Vert \xi \Vert }. \) Then U is uniformly distributed over \(\mathcal {S}^{n-1}\) and independent of \(\Vert \xi \Vert \). Since \(\sigma _1\) and \(\sigma _2\) are known, \(\Vert \xi \Vert ^2\) follows a \(\chi ^2_n\) distribution.

Conditioning on the value of \(\Vert \xi \Vert \) and using the fact that \(\varepsilon _1\) and \(\varepsilon _2\) are independent, we have

$$\begin{aligned} Pr(T\ge t_0)=&Pr(\sup _{t\in {\mathcal {T}}}|Z(t)|\ge t_0 ) =Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), \xi \rangle | \ge t_0)\\ =&Pr(\sup _{t\in {\mathcal {T}}}\left| \langle {\mathcal {M}}(t), \frac{\xi }{ \Vert \xi \Vert } \rangle \right| \ge \frac{t_0 }{ \Vert \xi \Vert })\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{y} \mid \Vert \xi \Vert =y) f_{\Vert \xi \Vert }(y)dy\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{ y}) f_{\Vert \xi \Vert }(y)dy, \end{aligned}$$

where \(f_{\Vert \xi \Vert }(y)\) denotes the pdf of \(\Vert \xi \Vert ,\) whose square follows a \(\chi ^2_n\) distribution. The last equality holds because \(\xi /\Vert \xi \Vert \) and \(\Vert \xi \Vert \) are independent.

When \(t_0\) is large, the following tube formula [cf. Lemma 4.5] can be plugged into the last equation:

$$Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), U \rangle | \ge c)\approx 2\left\{ \frac{\kappa _0}{2\pi } (1-c^2)^{n/2-1}+\frac{E}{4} Pr(\beta _{\{1/2, (n-1)/2\}}\ge c^2)\right\} ,$$

where \(\beta _{\{1/2, (n-1)/2\}}\) denotes a Beta random variable with parameters \(1/2\) and \((n-1)/2\). The factor 2 corresponds to the two curves satisfying the probability condition: one is \({\mathcal {M}}(t)\), the other \(-{\mathcal {M}}(t).\) This gives formula (5.17).
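As a rough numerical illustration, the tube approximation above can be evaluated directly once \(\kappa _0\) and \(E\) are available. The sketch below takes both constants as inputs (they are defined in the main text, not in this appendix), assumes \(n\ge 3\), and computes the Beta tail through the identity \(Pr(\beta \ge c^2)=2\int _c^1(1-u^2)^{(n-3)/2}du/B(1/2,(n-1)/2)\), obtained by the substitution \(x=u^2\). The function names `beta_tail` and `tube_probability` are hypothetical, and capping the result at 1 is an added convenience rather than part of the formula.

```python
import math


def beta_tail(c, n, m=2000):
    """Pr(Beta(1/2, (n-1)/2) >= c^2) for 0 <= c <= 1 and n >= 3.

    Evaluates 2 * int_c^1 (1 - u^2)^((n-3)/2) du / B(1/2, (n-1)/2)
    with the composite Simpson rule on m (even) subintervals.
    """
    log_beta = (math.lgamma(0.5) + math.lgamma(0.5 * (n - 1))
                - math.lgamma(0.5 * n))
    power = 0.5 * (n - 3)

    def f(u):
        return max(0.0, 1.0 - u * u) ** power

    h = (1.0 - c) / m
    total = f(c) + f(1.0)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(c + k * h)
    integral = total * h / 3.0
    return 2.0 * integral / math.exp(log_beta)


def tube_probability(c, n, kappa0, E):
    """Two-sided tube approximation as displayed above, capped at 1."""
    first = kappa0 / (2.0 * math.pi) * (1.0 - c * c) ** (0.5 * n - 1.0)
    second = E / 4.0 * beta_tail(c, n)
    return min(1.0, 2.0 * (first + second))


if __name__ == "__main__":
    # kappa0 = 5.0 and E = 2 are arbitrary illustrative values.
    for c in (0.6, 0.7, 0.8, 0.9):
        print(c, tube_probability(c, n=10, kappa0=5.0, E=2))
```

For \(n=3\) the Beta tail reduces to \(1-c\), which gives a quick sanity check of the numerical integration.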

Now suppose that \(\sigma ^2_i,\; i=1,2,\) are unknown and are estimated as in formula (5.5). Let

$$\begin{aligned}&X_i= \frac{\varepsilon '_i\varepsilon _i }{\sigma _i^2},&S_i= \frac{\hat{\sigma }^2_i }{\sigma ^2_i},\; \;\;\;\;\; i=1,2, \end{aligned}$$
(10.2)
$$\begin{aligned}&Y= \frac{X_1}{S_1} +\frac{X_2}{S_2},&X=\frac{X_1+X_2 }{ X_1/S_1+X_2/S_2 }=\frac{X_1+X_2 }{Y}. \end{aligned}$$
(10.3)

Then the requirements of Lemma 4.2 are satisfied, hence we have \(X\sim _{approx}\chi ^2_{\nu }/\nu , \) with degrees of freedom \(\nu \) estimated in formula (4.3). Therefore, \(Y= ({X_1+X_2 })/{X}\rightarrow (\chi ^2_n)/(\chi ^2_{\nu }/\nu )\sim nF_{n,\nu }. \)

Now let

$$\begin{aligned}&{\hat{\mathcal {M}}(t) }=\frac{(\hat{\sigma }_{1}{\mathbf {l}}_1(t) , -\hat{\sigma }_{2}{\mathbf {l}}_2(t) )}{\sqrt{\hat{\sigma }_{1}^2 \left\| {\mathbf {l}}_1(t) \right\| ^2+ \hat{\sigma }_{2}^2 \left\| {\mathbf {l}}_2(t) \right\| ^2}} \in \mathcal {S}^{n-1},&U=\frac{\hat{\xi }}{ \Vert {\hat{\xi }} \Vert }, \;\;\;\;\; \\&~~~~~~\hat{\xi }= (\frac{\varepsilon _1}{\hat{\sigma }_{1}}, \frac{\varepsilon _2}{\hat{\sigma }_{2}})\in \mathcal {R}^n,&\hat{Z}(t)=\langle {\hat{\mathcal {M}}(t) }, \hat{\xi }\rangle . \end{aligned}$$

Then

$$\begin{aligned} Pr(T\ge t_0)=&Pr(\sup _{t\in {\mathcal {T}}}|\hat{Z}(t)|\ge t_0 ) =Pr(\sup _{t\in {\mathcal {T}}}|\langle {\hat{\mathcal {M}}(t) },\hat{\xi }\rangle | \ge t_0)\\ =&Pr(\sup _{t\in {\mathcal {T}}}\left| \langle {\hat{\mathcal {M}}(t) }, \frac{\hat{\xi }}{\Vert {\hat{\xi }} \Vert } \rangle \right| \ge \frac{t_0 }{\Vert {\hat{\xi }} \Vert })\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle \hat{\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{y} \mid \Vert {\hat{\xi }} \Vert =y) f_{\Vert {\hat{\xi }} \Vert }(y)dy\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle \hat{\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{ y}) f_{\Vert {\hat{\xi }} \Vert }(y)dy, \end{aligned}$$

where \(\Vert {\hat{\xi }} \Vert ^2 = Y \sim _{approx} nF_{n,\nu }, \) with Y as defined in (10.3).

To estimate \(\kappa _0, \) we treat the \(\hat{\sigma }_i^2\) appearing in \({\hat{\mathcal {M}}(t) }\) as known values, so that the tube formula (4.5) can be used to approximate the probability inside the integral. After some calculation, we get result (5.18). QED

Appendix E. Software

Our R package curvetest is freely available at http://www.r-project.org/. This package tests the equality of curves as described in this paper. The main function curvetest has the following parameters:  

formula::

specifies the regression formula.

data1::

a data frame representing the first (discretized) curve.

data2::

a data frame representing the second curve. If it is NULL, then the test is of \(f(t)=0.\) data1 and data2 must each have two columns with the same column names, which can be retrieved from the formula.

equal.var::

logical value; specifies whether equal variances are assumed. Default=TRUE.

plotit::

logical value; specifies whether curvetest should generate the scatter plots and smoothed curves. Plotting is useful for selecting the window width bw below. Default=F.

bw::

Window bandwidth for both curves.

alpha::

Smoothing parameter. Default=0.5.

nn::

number of points used to smooth the curves. The points are equally spaced over the domain covered by the two data sets. Default=100.

myx::

x (or t) values at which to estimate the curves. Default=NULL, which places the number of points specified by nn across the data range. If myx is non-null, parameter nn is ignored.

bcorrect::

method to use for boundary correction. Default=‘simple’. The other option is ‘none’ (no correction).

Conf.level::

the \(\alpha \) value for the type I error level. Default=.05.

kernel::

kernel function to use for smoothing. Users can choose one of ‘Trio’, ‘Gaussian’, ‘Uniform’, ‘Triweight’, ‘Triangle’, ‘Epanechnikov’, ‘Quartic’. See Table 2 for their definitions.
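For orientation, the common textbook forms of six of the kernels named above can be sketched as follows. This is not the package's code: the scaling used by curvetest may differ from these standard definitions, the ‘Trio’ option is omitted because its definition is not given here, and the names `KERNELS` and `integrate` are hypothetical.

```python
import math


def _on_unit_interval(u):
    """Indicator of |u| <= 1, the support of the compact kernels."""
    return 1.0 if abs(u) <= 1.0 else 0.0


# Standard textbook kernels, each normalised to integrate to 1.
KERNELS = {
    "Uniform": lambda u: 0.5 * _on_unit_interval(u),
    "Triangle": lambda u: (1.0 - abs(u)) * _on_unit_interval(u),
    "Epanechnikov": lambda u: 0.75 * (1.0 - u * u) * _on_unit_interval(u),
    "Quartic": lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2 * _on_unit_interval(u),
    "Triweight": lambda u: 35.0 / 32.0 * (1.0 - u * u) ** 3 * _on_unit_interval(u),
    "Gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
}


def integrate(f, a, b, m=20000):
    """Midpoint rule with m subintervals."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))


if __name__ == "__main__":
    for name, kernel in KERNELS.items():
        lo, hi = (-8.0, 8.0) if name == "Gaussian" else (-1.0, 1.0)
        print(name, round(integrate(kernel, lo, hi), 6))
```

Each kernel is a symmetric density, so the printed integrals should all be close to 1.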

 

Table 2 List of kernel functions

Usage:

figure a

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Yang, Y., Sun, J. (2019). Novel Test for the Equality of Continuous Curves with Homoscedastic or Heteroscedastic Measurement Errors. In: Liu, R., Tsong, Y. (eds) Pharmaceutical Statistics. MBSW 2016. Springer Proceedings in Mathematics & Statistics, vol 218. Springer, Cham. https://doi.org/10.1007/978-3-319-67386-8_20
