Statistical Papers, Volume 60, Issue 6, pp 1803–1826

# Two stage smoothing in additive models with missing covariates

Regular Article

## Abstract

This paper considers sparse additive models with missing covariates, where the missingness mechanism is assumed to be missing at random. The additive components are estimated by a two-stage method. In the first stage, a penalized weighted least squares method is used. The weight is the inverse of the selection probability, that is, the probability that the covariates are observed. As the penalty, we use the adaptive group lasso to distinguish the zero components from the nonzero ones. Thus the penalty captures the sparse structure and the weight reflects the missingness structure. The estimator obtained from the penalized weighted least squares method is called the first stage estimator (FSE). We show the sparsity and consistency properties of the FSE. However, the asymptotic distribution of the FSE of the nonzero components is difficult to derive and is not obtained. Therefore, in the second stage, for each nonzero component we apply the penalized spline method for univariate regression to the residuals obtained from the FSE of the other components. The asymptotic normality of this second stage estimator is shown. To confirm the performance of the proposed estimator, simulation studies and a real data application are presented.
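The inverse probability weighting idea in the first stage can be illustrated with a minimal sketch. It is not the paper's estimator: it omits the spline basis expansion and the adaptive group lasso penalty, and it assumes the selection probabilities are already known or estimated. The function name `ipw_weighted_least_squares` and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np


def ipw_weighted_least_squares(X, y, observed, selection_prob):
    """Weighted least squares on complete cases with weights 1 / pi_i.

    X              : (n, p) design matrix; rows with missing covariates may hold NaN
    y              : (n,) response, assumed always observed
    observed       : (n,) boolean indicator, True if the covariates are observed
    selection_prob : (n,) probability that the covariates are observed (assumed given)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    pi = np.asarray(selection_prob, dtype=float)

    # Only complete cases enter the fit; each is weighted by 1 / pi_i.
    weights = 1.0 / pi[observed]
    X_complete = X[observed]
    y_complete = y[observed]

    # Minimizing sum_i w_i (y_i - x_i' b)^2 is ordinary least squares
    # after scaling each row by sqrt(w_i).
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(
        X_complete * root_w[:, None], y_complete * root_w, rcond=None
    )
    return coef
```

In this sketch, complete cases that had a low probability of being observed receive larger weights, which is how the weighting compensates for observations lost under a missing-at-random mechanism. A penalized version would add a group penalty on the coefficient blocks of each additive component to the same weighted criterion.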

## Keywords

Adaptive group lasso; Additive model; Inverse probability weighting; Missing at random; Penalized splines
