Novel Test for the Equality of Continuous Curves with Homoscedastic or Heteroscedastic Measurement Errors

Zhang, Zhongfa; Yang, Yarong; Sun, Jiayang

doi:10.1007/978-3-319-67386-8_20

Conference paper
First Online: 13 June 2019

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 218)

Abstract

Testing equality of two curves occurs often in functional data analysis. In this paper, we develop procedures for testing if two curves measured with either homoscedastic or heteroscedastic errors are equal. The method is applicable to a general class of curves. Compared with existing tests, ours does not require repeated measurements to obtain the variances at each of the explanatory values. Instead, our test calculates the overall variances by pooling all of the data points. The null distribution of the test statistic is derived and an approximation formula to calculate the p value is developed when the heteroscedastic variances are either known or unknown. Simulations are conducted to show that this procedure works well in the finite sample situation. Comparisons with other test procedures are made based on simulated data sets. Applications to our motivating example from an environmental study will be illustrated. An R package was created for ease of general applications.

References

1. ATSDR: The nature and extent of lead poisoning in children in the United States: a report to Congress. Technical report, Agency for Toxic Substances and Disease Registry, Atlanta: US Department of Health and Human Services, Public Health Service (1988)
2. Besse, P., Ramsay, J.O.: Principal components analysis of sampled functions. Psychometrika 51(2), 285–311 (1986). https://doi.org/10.1007/bf02293986
3. Chakravarti, I.M., Laha, R.G., Roy, J.: Handbook of Methods of Applied Statistics, Vol. I. John Wiley and Sons (1967)
4. Loader, C.R.: Bandwidth selection: classical or plug-in? Ann. Stat. 27(2), 415–438 (1999)
5. Cleveland, W., Devlin, S.: Locally weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc. 83(403), 596–610 (1988). https://doi.org/10.1080/01621459.1988.10478639
6. Dai, J., Sperlich, S.: Simple and effective boundary correction for kernel densities and regression with an application to the world income and Engel curve estimation. Comput. Stat. Data Anal. 54(11), 2487–2497 (2010)
7. Fan, J., Lin, S.: Test of significance when data are curves. J. Am. Stat. Assoc. 93, 1007–1021 (1998). https://doi.org/10.1080/01621459.1998.10473763
8. Hotelling, H.: Tubes and spheres in n-spaces, and a class of statistical problems. Am. J. Math. 61, 440–460 (1939)
9. James, G., Hastie, T.: Functional linear discriminant analysis for irregularly sampled curves. J. R. Stat. Soc. Ser. B 63(3), 533–550 (2001). https://doi.org/10.1111/1467-9868.00297
10. James, W., Stein, C.: Estimation with quadratic loss. In: Proceedings of Fourth Berkeley Symposium on Mathematical Statistics and Probability Theory, University of California Press, pp. 361–380 (1961)
11. Johansen, S., Johnstone, I.: Hotelling’s theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis. Ann. Stat. 18, 652–684 (1990)
12. Kitska, D.J.: Simultaneous inference for functional linear models. Ph.D. thesis, Case Western Reserve University (2005)
13. Knowles, M., Siegmund, D.: On Hotelling’s approach to testing for a nonlinear parameter in regression. Int. Stat. Rev. 57(3), 205–220 (1989). https://doi.org/10.2307/1403794
14. Leurgans, S.E., Moyeed, R.A., Silverman, B.W.: Canonical correlation analysis when the data are curves. J. R. Stat. Soc. Ser. B 55(3), 725–740 (1993)
15. Loader, C.: Local Regression and Likelihood. Springer, New York (1999)
16. Naiman, D.Q.: Simultaneous confidence bounds in multiple regression using predictor variable constraints. J. Am. Stat. Assoc. 82, 214–219 (1987). https://doi.org/10.2307/2289156
17. Naiman, D.Q.: On volumes of tubular neighborhoods of spherical polyhedra and statistical inference. Ann. Stat. 18(2), 685–716 (1990)
18. Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961)
19. Ramsay, J., Dalzell, C.: Some tools for functional data analysis. J. R. Stat. Soc. Ser. B 53(3), 539–572 (1991). https://doi.org/10.2307/2345586
20. Robbins, N., Zhang, Z., Sun, J., Ketterer, M., Lalumandier, J., Shulze, R.: Childhood lead exposure and uptake in teeth in the Cleveland area during the era of leaded gasoline. Sci. Total Environ. 408(19), 4118–4127 (2010). https://doi.org/10.1016/j.scitotenv.2010.04.060
21. Shapiro, S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52(3–4), 591–611 (1965)
22. Sun, J., Loader, C.: Simultaneous confidence bands for linear regression and smoothing. Ann. Stat. 22, 1328–1345 (1994). https://doi.org/10.1214/aos/1176325631
23. Sun, J.: Tail probabilities of the maxima of Gaussian random fields. Ann. Probab. 21(1), 34–71 (1993). https://doi.org/10.1214/aop/1176989393
24. Sun, J.: Multiple comparisons for a large number of parameters. Biom. J. 43, 627–643 (2001). https://doi.org/10.1002/1521-4036(200109)43:53.3.CO;2-6
25. Weyl, H.: On the volume of tubes. Am. J. Math. 61(2), 461–472 (1939)
26. Wang, J.-L., Chiou, J.-M., Müller, H.-G.: Functional data analysis. Ann. Rev. Stat. Appl. 3, 257–295 (2016). https://doi.org/10.1146/annurev-statistics-041715-033624
27. Xintaras, C.: Impact of lead-contaminated soil on public health (1992). http://www.cdc.gov/search.do. (Technical report, Agency for Toxic Substances and Disease Registry)

Author information

Authors and Affiliations

Department of Statistics, Case Western Reserve University, Cleveland, OH, 44106, USA
Zhongfa Zhang & Jiayang Sun
Department of Statistics, North Dakota State University, Fargo, ND, 58104, USA
Yarong Yang

Corresponding author

Correspondence to Zhongfa Zhang.

Editor information

Editors and Affiliations

Statistical Innovation and Consultation group, Takeda Pharmaceuticals, Cambridge, MA, USA
Ray Liu
Division of Biometrics VI, CDER, U.S. Food and Drug Administration, Silver Spring, MD, USA
Yi Tsong

Appendices

Appendix A. Proof of Lemma 4.1

Let $X_1=\sum _{i=1}^{n_1}W_i^2,$ where $W_i\sim _{ iid} N(0,1),$ and let $X_2$ be defined in the same way from $n_2$ further independent standard normal variables. Since $EX_1=n_1$ and $Var(X_1)=2n_1, $ the central limit theorem lets us approximate $X_1 $ by $X_1 {\mathop {=}\limits ^{d}} n_1+\sqrt{2n_1} Z_1 + o(\sqrt{n_1})$ and $X_2 $ by $ X_2{\mathop {=}\limits ^{d}} n_2+\sqrt{2n_2} Z_2 + o(\sqrt{n_2}), $ where $Z_1$ and $Z_2$ are independent standard normal random variables and the notation ${\mathop {=}\limits ^{d}}$ denotes equality in distribution. Therefore,

$$\begin{aligned} G_i=&( {\frac{X_i }{X_1+X_2 }} )/ ({ \frac{n_i}{n_1+n_2}}) \\ {\mathop {=}\limits ^{d}}&\frac{n_i+\sqrt{2n_i} Z_i + o(\sqrt{n_i})}{n_1+n_2 +\sqrt{2n_1} Z_1 +\sqrt{2n_2} Z_2 + o(\sqrt{n_2})+ o(\sqrt{n_1})}/({ \frac{n_i}{n_1+n_2}}) \\ =&\frac{ 1+\frac{\sqrt{2n_i} Z_i + o(\sqrt{n_i})}{n_i}}{1 +\frac{\sqrt{2n_1} Z_1 +\sqrt{2n_2} Z_2 + o(\sqrt{n_2})+ o(\sqrt{n_1})}{n_1+n_2}} {\mathop {\sim }\limits ^{d}} _{approx} 1, \end{aligned}$$

as $n_i \rightarrow \infty $ for $i=1,2.$ QED
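The concentration of $G_i$ around 1 can be checked numerically. The following is a minimal sketch, assuming (as in the proof) that $X_1$ and $X_2$ are independent chi-square variables with $n_1$ and $n_2$ degrees of freedom; the function name `simulate_g1`, the sample sizes and the seed are illustrative choices, not part of the paper.

```python
import random
import statistics


def simulate_g1(n1, n2, reps=5_000, seed=0):
    """Return the Monte Carlo mean and standard deviation of G_1."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # A chi-square variable with k degrees of freedom is Gamma(shape=k/2, scale=2).
        x1 = rng.gammavariate(n1 / 2.0, 2.0)
        x2 = rng.gammavariate(n2 / 2.0, 2.0)
        # G_1 = (X_1 / (X_1 + X_2)) / (n_1 / (n_1 + n_2))
        values.append((x1 / (x1 + x2)) / (n1 / (n1 + n2)))
    return statistics.mean(values), statistics.stdev(values)


for n1, n2 in [(10, 15), (100, 150), (1000, 1500)]:
    mean, sd = simulate_g1(n1, n2)
    print(f"n1={n1:5d} n2={n2:5d}  mean(G_1)={mean:.4f}  sd(G_1)={sd:.4f}")
```

The mean should stay close to 1 throughout, while the spread shrinks as $n_1$ and $n_2$ grow, which is the sense in which $G_i$ is approximately 1.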

Appendix B. Proof of Lemma 4.2

Since $\nu _i S_i \sim \chi ^2_{\nu _i}, $ we have $ ES_i=1$ and $Var(S_i)=2/\nu _i $, and there exist i.i.d. standard normal random variables $W_{i,k}, $ for $k=1, 2, \dots , \nu _i, i=1,2,$ such that $S_i= (1/\nu _i) {\sum _{k=1}^{\nu _i} W_{i,k}^2} $ by definition. Large-sample theory then gives $S_i{\mathop {=}\limits ^{d}} 1+\sqrt{\frac{2}{\nu _i}} Z_i+o(\frac{1}{\sqrt{\nu _i}})$, where $Z_i \sim N(0,1)$.

Expanding $Y=f(S_1, S_2)= (X_1+X_2 )/(X_1/S_1+X_2/S_2)$ around (1, 1), we have:

$$\begin{aligned} Y=f(S_1, S_2)=&1+\sqrt{ \frac{2}{\nu _1} }Z_1 \frac{ X_1}{X_1+X_2 }+ \sqrt{\frac{2}{\nu _2} }Z_2 \frac{ X_2}{X_1+X_2 }+ o(\frac{1}{\sqrt{\nu _1}}+\frac{1}{\sqrt{\nu _2}}). \end{aligned}$$

It is easy to find that $EY\rightarrow 1.$ Conditioning on $X_1, X_2$, we have

$$\begin{aligned} Var(Y) =&Var(E(Y|X_1, X_2)) +E (Var(Y|X_1, X_2)) \\ =&E (Var(Y|X_1, X_2)) +o(\frac{1}{\sqrt{\nu _1}}+\frac{1}{\sqrt{\nu _2}})\\ \approx&E \left\{ Var( \sqrt{\frac{2}{\nu _1}}Z_1 \frac{ X_1}{X_1+X_2 } + \sqrt{\frac{2}{\nu _2}}Z_2 \frac{ X_2}{X_1+X_2 } |X_1, X_2)\right\} \\ =&E \left\{ \frac{2}{\nu _1} (\frac{ X_1}{X_1+X_2 })^2 + \frac{2}{\nu _2} (\frac{ X_2}{X_1+X_2 })^2 \right\} \\ =&\frac{2}{\nu _1} E (\frac{ X_1}{X_1+X_2 })^2 + \frac{2}{\nu _2} E(\frac{ X_2}{X_1+X_2 })^2 \\ \approx&\frac{2}{\nu _1} (\frac{n_1}{n_1+n_2})^2 + \frac{2}{\nu _2} (\frac{ n_2}{ n_1+n_2})^2 \qquad \qquad \qquad {\text {(by Lemma (4.1))}} \\ =&\frac{2n_1^2\nu _2 +2n_2^2\nu _1}{(n_1+n_2)^2\nu _1\nu _2}. \end{aligned}$$

Thus, by matching $Var{(\chi ^2_\nu /\nu )}=2/\nu $ with $Var(Y)=(2n_1^2\nu _2 +2n_2^2\nu _1)/((n_1+n_2)^2\nu _1\nu _2)$, we obtain the estimate of $\nu $ given in formula (4.3). QED
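Solving $2/\nu = Var(Y)$ from the last display gives $\nu = (n_1+n_2)^2\nu _1\nu _2/(n_1^2\nu _2+n_2^2\nu _1)$, which is presumably the content of formula (4.3). The sketch below compares $2/\nu $ with a simulated $Var(Y)$. It assumes that $X_i\sim \chi ^2_{n_i}$ and $\nu _iS_i\sim \chi ^2_{\nu _i}$ are all independent (Lemma 4.2 itself is not reproduced here), and the names `effective_df` and `simulate_var_y` are hypothetical.

```python
import random
import statistics


def effective_df(n1, n2, nu1, nu2):
    """Degrees of freedom nu solving 2/nu = Var(Y) in the derivation above."""
    return (n1 + n2) ** 2 * nu1 * nu2 / (n1 ** 2 * nu2 + n2 ** 2 * nu1)


def chi2(df, rng):
    # A chi-square variable with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)


def simulate_var_y(n1, n2, nu1, nu2, reps=50_000, seed=1):
    """Monte Carlo variance of Y = (X1 + X2) / (X1/S1 + X2/S2)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(reps):
        x1, x2 = chi2(n1, rng), chi2(n2, rng)
        s1, s2 = chi2(nu1, rng) / nu1, chi2(nu2, rng) / nu2
        ys.append((x1 + x2) / (x1 / s1 + x2 / s2))
    return statistics.variance(ys)


# Illustrative values only; they are not taken from the paper.
n1, n2, nu1, nu2 = 200, 300, 150, 250
nu = effective_df(n1, n2, nu1, nu2)
var_mc = simulate_var_y(n1, n2, nu1, nu2)
print(f"nu = {nu:.2f}, 2/nu = {2 / nu:.5f}, simulated Var(Y) = {var_mc:.5f}")
```

Because the derivation is a first-order expansion, the two variances should agree only approximately, with the agreement improving as the degrees of freedom grow.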

Appendix C. Proof of Theorem 5.1

$Z(t)$ in formula (5.6) can be written as $Z(t)=\left\langle u_1(t), \; \xi _1 \right\rangle - \left\langle u_2(t),\; \xi _2 \right\rangle =\left\langle \mathcal {M}(t),\; \xi \right\rangle , $ where $\mathcal {M}(t)= \left( u_1(t), -u_2(t)\right) ' \in \mathcal {S}^{n-1}$ and $\xi = \left( \xi _1, \xi _2\right) ' = \left( \varepsilon _1/\sigma , \varepsilon _2/\sigma \right) '\in \mathcal {R}^n$ is standard multivariate normal. Conditioning on $\Vert \xi \Vert ,$ the probability can be written as

$$\begin{aligned} Pr(T\ge t_0) =&Pr(\sup _{t\in \mathcal {T}} |\left\langle \mathcal {M}(t),\;\xi \right\rangle |\ge t_0) \\ =&\int _{t_0}^{\infty } Pr(\sup _{t\in \mathcal {T}}\left| \left\langle \mathcal {M}(t),\; \frac{\xi }{\Vert \xi \Vert }\right\rangle \right| \ge \frac{t_0}{y}\; \;|\;\; \Vert \xi \Vert =y)g(y, n)dy, \end{aligned} \tag{10.1}$$

where $g(y, n)$ is the probability density function (pdf) of the square root of a $\chi ^2$ random variable with $n$ degrees of freedom. Since $ U={\xi }/{\Vert \xi \Vert } \sim uniform (\mathcal {S}^{n-1})$ is independent of ${\Vert \xi \Vert }, $ we can drop the condition in the probability. Letting $T=\{x\in \mathcal {S}^{n-1}:\; \sup _{t\in \mathcal {T}}\left| \left\langle \mathcal {M}(t),\; x\right\rangle \right| \ge ( {t_0}/ {y})\}$, we obtain tubes around the curves $\mathcal {M}(t) $ and $-\mathcal {M}(t) $ embedded in $\mathcal {S}^{n-1}$, with radius $r =\sqrt{2-2{t_0}/{y}}$ (see relation (4.4)). The probability inside the integral of (10.1) is then $ {Vol(T)} /{Vol(\mathcal {S}^{n-1})}.$ Plugging in the tube formula (4.5) gives result (5.9); see also [22, Proposition 1, p. 1330]. Result (5.10) is obtained by replacing $g(y, n)$, the pdf of ${\Vert \xi \Vert } =\sqrt{\varepsilon '\varepsilon / \sigma ^2},$ by the pdf of $ \Vert \hat{\xi }\Vert =\sqrt{\varepsilon '\varepsilon / \hat{\sigma }^2}, $ where $\Vert \hat{\xi }\Vert ^2/n \sim F_{n, \nu }$. For details, see the above citation. QED
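The tube radius used in this proof follows from an elementary identity for unit vectors. The short derivation below is presumably what relation (4.4) records; the symbol $w={t_0}/{y}$ is introduced here only for brevity. For $x, \mathcal {M}(t)\in \mathcal {S}^{n-1}$,

$$\begin{aligned} \Vert x-\mathcal {M}(t)\Vert ^2 =&\Vert x\Vert ^2+\Vert \mathcal {M}(t)\Vert ^2-2\left\langle \mathcal {M}(t),\; x\right\rangle = 2-2\left\langle \mathcal {M}(t),\; x\right\rangle , \\ \left\langle \mathcal {M}(t),\; x\right\rangle \ge w \iff&\Vert x-\mathcal {M}(t)\Vert \le \sqrt{2-2w}=r. \end{aligned}$$

Applying the same identity to $-\mathcal {M}(t)$ covers the case $\left\langle \mathcal {M}(t),\; x\right\rangle \le -w$, which is why the event with the absolute value corresponds to two tubes of the same radius.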

Appendix D. Proof of Theorem 5.2

Suppose $\sigma _1^2$ and $\sigma _2^2$ are known. Let $\mathcal {M}(t)=\left( u_1(t), \;\; u_2(t)\right) ' \in \mathcal {S}^{n-1} \subseteq \mathcal {R}^n$ for $n=n_1+n_2,\; t\in \mathcal {T}, $ as before. Let $\xi =\left( \xi _1, \; \; \xi _2 \right) '$ and $U= { \xi }/{ \Vert \xi \Vert }. $ Then U is uniformly distributed (over $\mathcal {S}^{n-1}$), independent of $\Vert \xi \Vert $. Since $\sigma _1$ and $\sigma _2$ are known, $\Vert \xi \Vert ^2$ will follow a $\chi ^2_n$ distribution.

Conditioning on the value of $\Vert \xi \Vert $ and using the fact that $\varepsilon _1$ and $\varepsilon _2$ are independent, we have

$$\begin{aligned} Pr(T>t_0)=&Pr(\sup _{t\in {\mathcal {T}}}|Z(t)|\ge t_0 ) =Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), \xi \rangle | \ge t_0)\\ =&Pr(\sup _{t\in {\mathcal {T}}}\left| \langle {\mathcal {M}}(t), \frac{\xi }{ \Vert \xi \Vert } \rangle \right| \ge \frac{t_0 }{ \Vert \xi \Vert })\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{y} \mid \Vert \xi \Vert =y) f_{\Vert \xi \Vert }(y)dy\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{ y}) f_{\Vert \xi \Vert }(y)dy, \end{aligned}$$

where $f_{\Vert \xi \Vert }(y)$ denotes the pdf of $\Vert \xi \Vert ,$ whose square follows a $\chi ^2_n$ distribution. The last equality holds because $\xi /\Vert \xi \Vert $ and $\Vert \xi \Vert $ are independent.

When $t_0$ is large, the following tube formula [cf. Lemma 4.5] can be plugged into the last equation:

$$Pr(\sup _{t\in {\mathcal {T}}}|\langle {\mathcal {M}}(t), U \rangle | \ge c)\approx 2\left\{ \frac{\kappa _0}{2\pi } (1-c^2)^{n/2-1}+\frac{E}{4} Pr(\beta _{\{1/2, (n-1)/2\}}\ge c^2)\right\} ,$$

where $\beta _{\{1/2, (n-1)/2\}}$ denotes a Beta random variable with parameters $1/2$ and $(n-1)/2$. The factor 2 accounts for the two curves that satisfy the probability condition: one is ${\mathcal {M}}(t)$, the other is $-{\mathcal {M}}(t).$ This gives formula (5.17).
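The known-variance calculation can be sketched numerically: integrate the tube formula over $c=t_0/y$ against the pdf of $\Vert \xi \Vert $ and compare the result with a direct simulation of $\sup _t|\langle {\mathcal {M}}(t), \xi \rangle |$. The example rests on several assumptions that are not from the paper: ${\mathcal {M}}(t)$ is taken to be a great-circle arc $(\cos t, \sin t, 0, \dots , 0)$ for $t\in [0, \kappa _0]$ rather than the paper's $u_i(t)$, $\kappa _0$ is read as the length of the curve and $E$ as its number of end points (2 here), and all function names are hypothetical.

```python
import math
import random


def beta_tail(x0, a, b, steps=2_000):
    """P(Beta(a, b) >= x0) by the midpoint rule, for 0 < x0 < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    h = (1.0 - x0) / steps
    total = 0.0
    for k in range(steps):
        x = x0 + (k + 0.5) * h
        total += x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)
    return norm * h * total


def tube_prob(c, n, kappa0, n_ends=2):
    """Two-sided tube formula for P(sup_t |<M(t), U>| >= c), capped at 1."""
    if c >= 1.0:
        return 0.0
    one_sided = (kappa0 / (2.0 * math.pi) * (1.0 - c * c) ** (n / 2.0 - 1.0)
                 + n_ends / 4.0 * beta_tail(c * c, 0.5, (n - 1) / 2.0))
    return min(1.0, 2.0 * one_sided)


def chi_pdf(y, n):
    """Density of ||xi|| when ||xi||^2 is chi-square with n degrees of freedom."""
    return 2.0 ** (1.0 - n / 2.0) * y ** (n - 1) * math.exp(-0.5 * y * y) / math.gamma(n / 2.0)


def p_value(t0, n, kappa0, upper=12.0, steps=400):
    """Midpoint-rule integral of tube_prob(t0 / y) * chi_pdf(y) over y in [t0, upper]."""
    h = (upper - t0) / steps
    total = 0.0
    for k in range(steps):
        y = t0 + (k + 0.5) * h
        total += tube_prob(t0 / y, n, kappa0) * chi_pdf(y, n)
    return h * total


def sup_abs_inner(v, kappa0):
    """sup over t in [0, kappa0] of |v[0] cos t + v[1] sin t|, for kappa0 < pi."""
    rho = math.hypot(v[0], v[1])
    # |rho * cos(t - phi)| attains its maxima at t = phi (mod pi).
    phi = math.atan2(v[1], v[0]) % math.pi
    if phi <= kappa0:
        return rho
    # Otherwise the supremum is attained at one of the two end points of the arc.
    return max(abs(v[0]), abs(v[0] * math.cos(kappa0) + v[1] * math.sin(kappa0)))


def monte_carlo_p(t0, n, kappa0, reps=100_000, seed=3):
    """Direct simulation of P(sup_t |<M(t), xi>| >= t0) with xi ~ N(0, I_n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if sup_abs_inner(xi, kappa0) >= t0:
            hits += 1
    return hits / reps


# Illustrative settings only.
n, kappa0, t0 = 6, 1.0, 2.5
p_tube = p_value(t0, n, kappa0)
p_mc = monte_carlo_p(t0, n, kappa0)
print(f"tube-formula p-value = {p_tube:.4f}, Monte Carlo p-value = {p_mc:.4f}")
```

For a geodesic arc the tube volume is exact as long as the tubes around ${\mathcal {M}}(t)$ and $-{\mathcal {M}}(t)$ do not overlap, so the two numbers should agree up to simulation error. For a general curve the formula is only an approximation for large $t_0$.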

Suppose now that the $\sigma ^2_i,\; i=1,2,$ are unknown but are estimated as in formula (5.5). Let

$$\begin{aligned} X_i= \frac{\varepsilon '_i\varepsilon _i }{\sigma _i^2}, \qquad S_i= \frac{\hat{\sigma }^2_i }{\sigma ^2_i},\qquad i=1,2, \end{aligned} \tag{10.2}$$

$$\begin{aligned} Y= \frac{X_1}{S_1} +\frac{X_2}{S_2}, \qquad X=\frac{X_1+X_2 }{ X_1/S_1+X_2/S_2 }=\frac{X_1+X_2 }{Y}. \end{aligned} \tag{10.3}$$

Then the requirements of Lemma 4.2 are satisfied, hence we have $X\sim _{approx}\chi ^2_{\nu }/\nu , $ with degrees of freedom $\nu $ estimated in formula (4.3). Therefore, $Y= ({X_1+X_2 })/{X}\rightarrow (\chi ^2_n)/(\chi ^2_{\nu }/\nu )\sim nF_{n,\nu }. $

Now let

$$\begin{aligned} {\hat{\mathcal {M}}(t) }&=\frac{(\hat{\sigma }_{1}{\mathbf {l}}_1(t) , -\hat{\sigma }_{2}{\mathbf {l}}_2(t) )}{\sqrt{\hat{\sigma }_{1}^2 \left\| {\mathbf {l}}_1(t) \right\| ^2+ \hat{\sigma }_{2}^2 \left\| {\mathbf {l}}_2(t) \right\| ^2}} \in \mathcal {S}^{n-1},&U&=\frac{\hat{\xi }}{ \Vert {\hat{\xi }} \Vert }, \\ \hat{\xi }&= \left( \frac{\varepsilon _1}{\hat{\sigma }_{1}}, \frac{\varepsilon _2}{\hat{\sigma }_{2}}\right) \in \mathcal {R}^n,&\hat{Z}(t)&=\langle {\hat{\mathcal {M}}(t) }, \hat{\xi }\rangle . \end{aligned}$$

Then

$$\begin{aligned} Pr(T>t_0)=&Pr(\sup _{t\in {\mathcal {T}}}|\hat{Z}(t)|\ge t_0 ) =Pr(\sup _{t\in {\mathcal {T}}}|\langle {\hat{\mathcal {M}}(t) },\hat{\xi }\rangle | \ge t_0)\\ =&Pr(\sup _{t\in {\mathcal {T}}}\left| \langle {\hat{\mathcal {M}}(t) }, \frac{\hat{\xi }}{\Vert {\hat{\xi }} \Vert } \rangle \right| \ge \frac{t_0 }{\Vert {\hat{\xi }} \Vert })\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle \hat{\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{y} \mid \Vert {\hat{\xi }} \Vert =y) f_{\Vert {\hat{\xi }} \Vert }(y)dy\\ =&\int _{y\ge t_0}Pr(\sup _{t\in {\mathcal {T}}}|\langle \hat{\mathcal {M}}(t), U \rangle | \ge \frac{t_0 }{ y}) f_{\Vert {\hat{\xi }} \Vert }(y)dy, \end{aligned}$$

where $\Vert {\hat{\xi }} \Vert ^2 = Y\sim _{approx} nF_{n,\nu }, $ with $Y$ as defined in (10.3).

Treating the $\sigma _i^2$ in ${\hat{\mathcal {M}}(t) }$ as known values in order to estimate $\kappa _0, $ we can use the tube formula (4.5) to estimate the probability inside the integral. After some calculation, we get result (5.18). QED

Appendix E. Software

Our R package curvetest is freely available at http://www.r-project.org/. This package tests the equality of curves as described in this paper. The main function curvetest has the following parameters:

formula: specifies the regression formula.
data1: a data frame representing the first (discretized) curve.
data2: a data frame representing the second curve. If it is NULL, the test is of $f(t)=0.$ data1 and data2 must have two columns with the same column names, which can be retrieved by calls on the formula.
equal.var: logical value specifying whether equal variances are assumed. Default=TRUE.
plotit: logical value specifying whether curvetest should generate the scatter plots and smoothing curves. Plotting is useful for selecting the window width bw below. Default=F.
bw: window bandwidth for both curves.
alpha: smoothing parameter. Default=0.5.
nn: number of points used to smooth the curves. The points are equally spaced over the domain covered by the two data sets. Default=100.
myx: x (or t) values at which to estimate the curves. Default=NULL, which places the number of points specified by nn in the data range. If myx is non-null, parameter nn is ignored.
bcorrect: method to use for boundary correction. Default=‘simple’. Other options: ‘none’ (no correction).
Conf.level: the $\alpha $ value for the type I error level. Default=.05.
kernel: kernel function used for smoothing. Users can choose one of ‘Trio’, ‘Gaussian’, ‘Uniform’, ‘Triweight’, ‘Triangle’, ‘Epanechnikov’, ‘Quartic’. See their definitions in Table 2.
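For orientation, the textbook forms of six of these kernels are sketched below in Python. The R package is not called here, and the scalings are the standard ones, which are only assumed to match the package's Table 2. ‘Trio’ is omitted because its definition is not given in the text, and the names `KERNELS` and `integrate` are hypothetical.

```python
import math

# Standard kernel densities on the real line; u is the scaled distance (t - t_i) / bw.
KERNELS = {
    "Gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "Uniform": lambda u: 0.5 if abs(u) <= 1.0 else 0.0,
    "Triangle": lambda u: max(0.0, 1.0 - abs(u)),
    "Epanechnikov": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0,
    "Quartic": lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0,
    "Triweight": lambda u: 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0,
}


def integrate(kernel, lower=-8.0, upper=8.0, steps=16_000):
    """Midpoint-rule integral of a kernel, used to check that it integrates to one."""
    h = (upper - lower) / steps
    return h * sum(kernel(lower + (k + 0.5) * h) for k in range(steps))


for name, kernel in KERNELS.items():
    print(f"{name:13s} integral = {integrate(kernel):.4f}")
```

Each kernel is a symmetric density, so the printed integrals should all be close to 1.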

Table 2 List of kernel functions

Cite this paper

Zhang, Z., Yang, Y., Sun, J. (2019). Novel Test for the Equality of Continuous Curves with Homoscedastic or Heteroscedastic Measurement Errors. In: Liu, R., Tsong, Y. (eds) Pharmaceutical Statistics. MBSW 2016. Springer Proceedings in Mathematics & Statistics, vol 218. Springer, Cham. https://doi.org/10.1007/978-3-319-67386-8_20

DOI: https://doi.org/10.1007/978-3-319-67386-8_20
Published: 13 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67385-1
Online ISBN: 978-3-319-67386-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
