Lifetime Data Analysis, Volume 23, Issue 2, pp 305–338

Variable selection in discrete survival models including heterogeneity

  • Andreas Groll
  • Gerhard Tutz
Article

Abstract

Several variable selection procedures are available for continuous time-to-event data. However, if time is measured on a discrete scale, so that many ties occur, models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling while explicitly modeling the heterogeneity in the population. The method combines ridge-type and lasso-type penalties that are tailored to the case of discrete survival. Its performance is studied in simulation studies and in an application to the birth of the first child.
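
As a minimal illustration of why discrete time changes the modeling problem, the sketch below shows the standard person-period expansion, in which each subject contributes one binary row per period at risk, so that a discrete hazard model can be fitted as a binary regression to which penalties may then be applied. This is a common textbook device and not necessarily the authors' implementation. The function name `to_person_period`, the toy data, and the convention that a censored subject is known to be event-free through its last observed period are all assumptions made for illustration.

```python
from typing import List, Tuple


def to_person_period(time: int, event: bool) -> List[Tuple[int, int]]:
    """Expand one subject into (period, y) rows.

    `time` is the last observed discrete period (1-based).
    y is 1 only in the period where the event occurs; a censored
    subject (event=False) contributes only zeros (assumed convention).
    """
    if time < 1:
        raise ValueError("time must be a positive integer")
    # Periods survived without an event before the last observed one.
    rows = [(t, 0) for t in range(1, time)]
    # Last observed period: event indicator decides the response.
    rows.append((time, 1 if event else 0))
    return rows


# Hypothetical toy data: (observed period, event indicator).
subjects = [(3, True), (2, False)]
long_format = [to_person_period(t, e) for t, e in subjects]
```

Each subject's rows share the same subject identity, which is where a subject-specific random effect for heterogeneity would enter in a mixed model.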

Keywords

Variable selection · Discrete survival · Heterogeneity · Lasso

Notes

Acknowledgments

This article uses data from the German family panel pairfam, coordinated by Josef Brüderl, Johannes Huinink, Bernhard Nauck, and Sabine Walper. Pairfam is funded as a long-term project by the German Research Foundation (DFG). We are also grateful to Jasmin Abedieh for providing the specific discrete survival data, which were constructed from the pairfam data and were part of her master's thesis.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Ludwig-Maximilians-Universität München, Munich, Germany
  2. Ludwig-Maximilians-Universität München, Munich, Germany
