Abstract
Estimating the mean function and constructing a simultaneous confidence band for it in the presence of outliers is an important problem in functional data analysis. In this paper, we propose a robust estimator and a robust simultaneous confidence band for the mean function of functional data using M-estimation and B-splines. The robust simultaneous confidence band is also extended to the difference of the mean functions of two populations. Further, the asymptotic properties of the M-based mean function estimator, namely consistency and asymptotic normality, are studied. The performance and robustness of the proposed methods are demonstrated with an extensive simulation study and two real data examples.
Additional information
Cao’s research is supported in part by the Simons Foundation under Grant #354917 and the National Science Foundation under Grant DMS 1736470. We thank the Associate Editor and two anonymous referees for their helpful and constructive comments, which led to significant improvements in this paper.
Appendix: Variance of pseudo-data
To evaluate the efficiency of the pseudo-data method, we compare the sample variance of the pseudo-data with the true variance of the uncontaminated model defined in Sect. 3.1. To further emphasize the influence of outliers on the variance calculation, we also compare these results with the sample variance of the outlier-contaminated dataset, using the least-squares method as the estimator for the mean function. We generate a functional dataset from the model in Sect. 3.1 with sample size \(n = 200\), and with \(s_l = 5\) and \(s_u = 7\) for the peak outlier case. The simulation is repeated 500 times. The results are presented in Fig. 4. We present only the results for peak outliers, as all other cases give similar results. The variance of the pseudo-data is very close to the true sample variance computed from the clean dataset, while the non-robust estimate of the variance of the contaminated dataset is strongly affected by the outlier curves.
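The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. Only \(n = 200\), \(s_l = 5\) and \(s_u = 7\) come from the text. The data-generating model (sine mean, random level shift, Gaussian noise), the 10% contamination rate, the location of the peak, the pointwise Huber M-estimator, and the pseudo-data formula (M-estimate plus scale times \(\psi\)(residual) divided by the empirical mean of \(\psi'\)) are all assumptions made for illustration, since the model of Sect. 3.1 is not reproduced here.

```python
import numpy as np

def huber_location(y, c=1.345, n_iter=50, tol=1e-8):
    """Pointwise Huber M-estimate of location (axis 0) via IRLS, with MAD scale."""
    mu = np.median(y, axis=0)
    scale = 1.4826 * np.median(np.abs(y - mu), axis=0)
    for _ in range(n_iter):
        r = np.abs(y - mu) / scale
        w = np.minimum(1.0, c / np.maximum(r, 1e-12))  # Huber weights
        mu_new = np.sum(w * y, axis=0) / np.sum(w, axis=0)
        converged = np.max(np.abs(mu_new - mu)) < tol
        mu = mu_new
        if converged:
            break
    return mu, scale

def pseudo_data(y, c=1.345):
    """ASSUMED pseudo-data: mu + scale * psi(r) / mean(psi'(r)), psi = Huber."""
    mu, scale = huber_location(y, c)
    r = (y - mu) / scale
    psi = np.clip(r, -c, c)
    slope = np.mean(np.abs(r) <= c, axis=0)  # empirical mean of psi'
    return mu + scale * psi / slope

def simulate(n=200, n_grid=100, s_l=5.0, s_u=7.0, frac=0.1, seed=0):
    """ASSUMED toy model; only n, s_l and s_u are taken from the text."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    mean = np.sin(2.0 * np.pi * t)
    clean = mean + rng.normal(size=(n, 1)) + 0.5 * rng.normal(size=(n, n_grid))
    peak = (t >= 0.4) & (t <= 0.6)                 # hypothetical peak interval
    idx = rng.choice(n, size=int(frac * n), replace=False)
    contaminated = clean.copy()
    amp = rng.uniform(s_l, s_u, size=(idx.size, 1))  # peak heights in [s_l, s_u]
    contaminated[np.ix_(idx, peak)] += amp
    return clean, contaminated, peak

clean, contaminated, peak = simulate()
v_clean = clean.var(axis=0, ddof=1)                 # reference: clean data
v_cont = contaminated.var(axis=0, ddof=1)           # non-robust variance
v_pseudo = pseudo_data(contaminated).var(axis=0, ddof=1)

# Mean absolute deviation from the clean-data variance inside the peak region
err_cont = np.mean(np.abs(v_cont - v_clean)[peak])
err_pseudo = np.mean(np.abs(v_pseudo - v_clean)[peak])
print(f"contaminated: {err_cont:.3f}, pseudo-data: {err_pseudo:.3f}")
```

In this toy setting the pseudo-data variance should sit much closer to the clean-data variance than the non-robust variance does inside the contaminated interval. How close depends on the assumed contamination rate and tuning constant, so the numbers are not comparable to Fig. 4.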
About this article
Cite this article
Lima, I.R., Cao, G. & Billor, N. M-based simultaneous inference for the mean function of functional data. Ann Inst Stat Math 71, 577–598 (2019). https://doi.org/10.1007/s10463-018-0656-y
DOI: https://doi.org/10.1007/s10463-018-0656-y