Statistical Papers, Volume 60, Issue 6, pp 1803–1826

Two stage smoothing in additive models with missing covariates

  • Takuma Yoshida
Regular Article


This paper considers sparse additive models with missing covariates, where the missing mechanism is assumed to be missing at random. The additive components are estimated by a two stage method. In the first stage, we use the penalized weighted least squares method. The weight is the inverse of the selection probability, that is, the probability that the covariates are observed. As the penalty, we use the adaptive group lasso to distinguish the zero components from the nonzero ones. Thus, the penalty captures the sparse structure and the weight reflects the missing structure. The estimator obtained from the penalized weighted least squares method is called the first stage estimator (FSE). We show that the FSE has the sparsity and consistency properties. However, the asymptotic distribution of the FSE of the nonzero components is difficult to derive. Therefore, in the second stage, for each nonzero component we apply univariate penalized spline regression to the residuals obtained by subtracting the FSEs of the other components from the response. We show the asymptotic normality of this second stage estimator. Simulation studies and a real data application are conducted to examine the performance of the proposed estimator.
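The two stage procedure described in the abstract can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the selection probabilities `pi` are taken as known (in practice they would be estimated), a cubic truncated power basis stands in for the paper's spline basis, the adaptive weights come from a lightly ridged weighted least squares fit, the second stage reuses the inverse probability weights, and the tuning parameters `lam` and `lam2` are fixed by hand rather than chosen by a data-driven criterion. The functions `tp_basis`, `make_knots`, `stage_one`, `stage_two`, `two_stage` and `simulate` are hypothetical names introduced for illustration.

```python
import numpy as np

# Hypothetical sketch of the two stage idea, not the paper's code. Assumptions:
# pi is known, a cubic truncated power basis is used, adaptive weights come from
# a lightly ridged initial fit, and stage two reuses the IPW weights.


def tp_basis(x, knots):
    """Cubic truncated power basis [x, x^2, x^3, (x - k)_+^3], no intercept column."""
    trunc = [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack([x, x**2, x**3] + trunc)


def make_knots(x, n_knots):
    """Interior knots at equally spaced sample quantiles of x."""
    return np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])


def stage_one(X, y, w, lam, n_knots=5, ridge=1e-3, n_iter=200):
    """First stage estimator: IPW least squares plus an adaptive group lasso.

    X, y, w hold complete cases only (w = 1/pi). Minimizes
    (1/2m) sum_i w_i (y_i - mu - sum_j Z_j[i] @ beta_j)^2 + lam * sum_j a_j ||beta_j||.
    """
    m, p = X.shape
    mu = np.sum(w * y) / np.sum(w)  # weighted intercept
    Z = []
    for j in range(p):
        B = tp_basis(X[:, j], make_knots(X[:, j], n_knots))
        Bc = B - (w @ B) / np.sum(w)  # weighted centering separates the intercept
        Q, _ = np.linalg.qr(np.sqrt(w)[:, None] * Bc)
        Z.append(np.sqrt(m) * Q / np.sqrt(w)[:, None])  # now (1/m) Z'WZ = I
    K = Z[0].shape[1]
    yc = y - mu
    # Adaptive weights a_j = 1 / ||initial beta_j|| from a lightly ridged fit.
    Zall = np.hstack(Z)
    A = Zall.T @ (w[:, None] * Zall) / m + ridge * np.eye(p * K)
    b0 = np.linalg.solve(A, Zall.T @ (w * yc) / m).reshape(p, K)
    a = 1.0 / np.maximum(np.linalg.norm(b0, axis=1), 1e-8)
    # Block coordinate descent: each group update is exact group soft-thresholding.
    beta = np.zeros((p, K))
    r = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += Z[j] @ beta[j]  # residual with group j removed
            s = Z[j].T @ (w * r) / m
            norm_s = np.linalg.norm(s)
            shrink = max(0.0, 1.0 - lam * a[j] / norm_s) if norm_s > 0 else 0.0
            beta[j] = shrink * s
            r -= Z[j] @ beta[j]
    return mu, Z, beta


def stage_two(x, r, w, lam2, n_knots=8):
    """Second stage: weighted penalized cubic spline of residuals r on x.

    Only the truncated power coefficients receive the ridge penalty lam2.
    """
    knots = make_knots(x, n_knots)

    def design(t):
        return np.column_stack([np.ones_like(t), tp_basis(t, knots)])

    C = design(x)
    D = np.diag([0.0] * 4 + [1.0] * n_knots)
    # Penalized weighted least squares written as an augmented lstsq problem.
    aug = np.vstack([np.sqrt(w)[:, None] * C, np.sqrt(lam2) * D])
    rhs = np.concatenate([np.sqrt(w) * r, np.zeros(C.shape[1])])
    theta = np.linalg.lstsq(aug, rhs, rcond=None)[0]

    def fhat(t):
        return design(t) @ theta

    return fhat


def two_stage(X, y, pi, lam, lam2=1e-4):
    """NaN in X marks a missing covariate; pi[i] is the (assumed known) P(row i observed)."""
    cc = ~np.isnan(X).any(axis=1)  # delta_i = 1: all covariates observed
    X_cc, y_cc, w = X[cc], y[cc], 1.0 / pi[cc]  # inverse probability weights
    mu, Z, beta = stage_one(X_cc, y_cc, w, lam)
    selected = [j for j in range(X.shape[1]) if np.linalg.norm(beta[j]) > 0]
    fits = {}
    for j in selected:
        others = sum((Z[k] @ beta[k] for k in selected if k != j), np.zeros(len(y_cc)))
        fits[j] = stage_two(X_cc[:, j], y_cc - mu - others, w, lam2)  # FSE residuals
    return selected, fits


def simulate(n=600, p=5, seed=0):
    """Toy MAR design: x1 is missing with a probability that depends on the observed y."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, (n, p))
    y = np.sin(2 * np.pi * X[:, 0]) + 8 * (X[:, 1] - 0.5) ** 2 - 2 / 3
    y = y + rng.normal(0.0, 0.3, n)
    pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * y)))
    X[rng.uniform(size=n) >= pi, 0] = np.nan
    return X, y, pi


if __name__ == "__main__":
    X, y, pi = simulate()
    selected, fits = two_stage(X, y, pi, lam=0.02)
    print("selected components:", selected)
```

Orthonormalizing each group in the weighted inner product turns every block update into an exact group soft-thresholding step, so a component is dropped only when its whole coefficient vector is thresholded to zero. The second stage then needs just one weighted, ridge-type spline solve per selected component; in this toy design only the first two covariates carry signal.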


Keywords: Adaptive group lasso; Additive model; Inverse probability weighting; Missing at random; Penalized splines



The author wishes to thank the Editor, Associate Editor and two anonymous referees for their valuable comments. The research of the author was partially supported by KAKENHI 26730019.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Kagoshima University, Kagoshima, Japan
