Lifetime Data Analysis, 14:496

Bayesian variable selection for the Cox regression model with missing covariates

  • Joseph G. Ibrahim
  • Ming-Hui Chen
  • Sungduk Kim


In this paper, we develop Bayesian methodology and computational algorithms for variable subset selection in Cox proportional hazards models with missing covariate data. A new joint semi-conjugate prior for the piecewise exponential model is proposed in the presence of missing covariates and its properties are examined. The covariates are assumed to be missing at random (MAR). Under this new prior, a version of the Deviance Information Criterion (DIC) is proposed for Bayesian variable subset selection in the presence of missing covariates. Monte Carlo methods are developed for computing the DICs for all possible subset models in the model space. A Bone Marrow Transplant (BMT) dataset is used to illustrate the proposed methodology.
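As a rough illustration of the quantities named in the abstract, the sketch below evaluates a piecewise exponential log-likelihood and the standard DIC, computed as the deviance at the posterior mean plus twice the effective number of parameters, from posterior draws. It assumes fully observed covariates, so it is not the missing-covariate version of the DIC proposed in the paper. The function names (`exposure`, `log_lik`, `dic`), the array layout, and the cut points are hypothetical choices made only for this example.

```python
import numpy as np


def exposure(times, cuts):
    """Time each subject is at risk in each interval (cuts[j], cuts[j+1]]."""
    lower, upper = cuts[:-1], cuts[1:]
    t = np.asarray(times, dtype=float)[:, None]
    return np.clip(np.minimum(t, upper) - lower, 0.0, None)


def log_lik(beta, lam, times, events, X, cuts):
    """Piecewise exponential log-likelihood with hazard lam[j] * exp(x'beta).

    Assumes complete covariate data and that cuts[-1] is at least max(times).
    """
    times = np.asarray(times, dtype=float)
    eta = X @ beta                                   # linear predictor, shape (n,)
    cum_haz = np.exp(eta) * (exposure(times, cuts) @ lam)
    # Index of the interval that contains each observed time.
    j = np.clip(np.searchsorted(cuts, times, side="left") - 1, 0, len(lam) - 1)
    return np.sum(events * (np.log(lam[j]) + eta) - cum_haz)


def dic(beta_draws, lam_draws, times, events, X, cuts):
    """Standard DIC = D(theta_bar) + 2 * p_D, with p_D = mean(D) - D(theta_bar)."""
    dev = np.array([
        -2.0 * log_lik(b, l, times, events, X, cuts)
        for b, l in zip(beta_draws, lam_draws)
    ])
    dev_at_mean = -2.0 * log_lik(
        beta_draws.mean(axis=0), lam_draws.mean(axis=0), times, events, X, cuts
    )
    p_d = dev.mean() - dev_at_mean
    return dev_at_mean + 2.0 * p_d, p_d
```

Here `beta_draws` and `lam_draws` are assumed to be arrays with one posterior draw per row. Comparing subset models would then amount to calling `dic` once per candidate covariate subset and preferring smaller values.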


  • Conjugate prior
  • Deviance information criterion
  • Missing at random
  • Proportional hazards models



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Joseph G. Ibrahim (1)
  • Ming-Hui Chen (2)
  • Sungduk Kim (3)

  1. Department of Biostatistics, University of North Carolina, Chapel Hill, USA
  2. Department of Statistics, University of Connecticut, Storrs, USA
  3. Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, NIH, Rockville, USA
