Oracally efficient estimation for dense functional data with holiday effects

Abstract

The functional data analysis literature has largely overlooked data whose mean exhibits spikes, such as a salesperson's weekly sporting goods sales, which spike around holidays. For such functional data, two-step estimation procedures are formulated for the population mean function and the holiday effect parameters, which correspond, respectively, to the population sales curve and the spikes in sales during holiday periods. The estimators are based on spline smoothing of individual trajectories using non-holiday observations, and are shown to be oracally efficient in the sense that both the mean function and the holiday effects are estimated as efficiently as if all individual trajectories were known a priori. Consequently, an asymptotic simultaneous confidence band is established for the mean function, together with confidence intervals for the holiday effects. Two-sample extensions are also formulated, and simulation experiments provide strong evidence corroborating the asymptotic theory. Application to sporting goods sales data has led to a number of new discoveries.
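
The two-step idea described above can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' actual procedure: it assumes an additive model in which each trajectory equals a smooth mean plus a dummy-variable shift at holiday points plus individual variation and noise, and it swaps in a truncated-power cubic regression spline (which spans the same function space as cubic B-splines with the same knots). The function names, the knot count, and the simulated data are all hypothetical.

```python
import numpy as np


def spline_basis(x, knots, degree=3):
    """Truncated-power basis for a regression spline of the given degree."""
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def two_step_estimate(Y, x, holiday_idx, n_knots=5):
    """Hypothetical two-step estimator (an assumption, not the paper's code).

    Y           : (n, N) array, n trajectories observed at N common points x
    holiday_idx : indices of the observation points that are holidays
    Returns (mean_hat, effects_hat): the estimated mean curve at x and a
    dict mapping each holiday index to its estimated effect.
    """
    N = x.size
    is_holiday = np.zeros(N, dtype=bool)
    is_holiday[list(holiday_idx)] = True

    # Equally spaced interior knots (an assumed, not data-driven, choice).
    knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
    B = spline_basis(x, knots)

    # Step 1: least-squares spline fit for every trajectory,
    # using only the non-holiday observations.
    coef, *_ = np.linalg.lstsq(B[~is_holiday], Y[:, ~is_holiday].T, rcond=None)
    fitted = (B @ coef).T  # shape (n, N), smooth curve per trajectory

    # Step 2: average the smoothed trajectories for the mean function, and
    # average the holiday residuals for each holiday effect.
    mean_hat = fitted.mean(axis=0)
    effects_hat = {
        int(j): float((Y[:, j] - fitted[:, j]).mean()) for j in holiday_idx
    }
    return mean_hat, effects_hat


# Simulated example (all values below are made up for illustration).
rng = np.random.default_rng(0)
n, N = 200, 104
x = np.arange(1, N + 1) / N
true_mean = 5 + np.sin(2 * np.pi * x)
true_effects = {25: 2.0, 51: 3.0, 77: 1.5}

# Smooth mean + individual random variation + measurement noise.
Y = true_mean + rng.normal(size=(n, 1)) * np.cos(np.pi * x)
Y = Y + 0.3 * rng.normal(size=(n, N))
for j, beta in true_effects.items():
    Y[:, j] += beta  # holiday spike added as a dummy-variable shift

mean_hat, effects_hat = two_step_estimate(Y, x, list(true_effects))
```

In this sketch the holiday points are excluded from the smoothing step so that the spikes do not distort the fitted curves; each holiday effect is then read off as the average gap between the raw holiday observations and the smoothed trajectories at those points.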

Acknowledgements

This research was supported in part by National Natural Science Foundation of China Awards 11371272 and 11771240, and the Tsinghua University Center for Data-Centric Management in the Department of Industrial Engineering. Part of the research was carried out when the first author was a visitor at the Department of Statistics, Texas A&M University. The first author thanks the China Scholarship Council (CSC) for providing financial support to visit Texas A&M University. The helpful comments from Editor-in-Chief Lola Ugarte, an Associate Editor and two Reviewers are gratefully acknowledged.

Author information

Correspondence to Lijian Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Li Cai, Lisha Li: Co-first authors.

About this article

Cite this article

Cai, L., Li, L., Huang, S. et al. Oracally efficient estimation for dense functional data with holiday effects. TEST 29, 282–306 (2020). https://doi.org/10.1007/s11749-019-00655-5

Keywords

  • B-spline
  • Dummy variables
  • Functional data
  • Holiday effects
  • Oracle efficiency
  • Simultaneous confidence band

Mathematics Subject Classification

  • 62M10
  • 62G08
  • 62P20