Gene Selection and Survival Prediction Under Dependent Censoring

  • Takeshi Emura
  • Yi-Hau Chen
Part of the SpringerBriefs in Statistics book series (BRIEFSSTATIST)


Univariate selection based on the Cox model is routinely used in biomedical research to select genes that are predictive of survival. However, this conventional approach relies on the independent censoring assumption, which is unrealistic in many biomedical applications. We introduce an alternative approach that selects genes by using copulas to account for the effect of dependent censoring. We also introduce a method for constructing a predictor of patient survival from the selected genes. Using non-small-cell lung cancer data, we demonstrate the copula-based procedure for selecting genes, developing a predictor, and validating the predictor. We provide detailed instructions for implementing the proposed statistical methods and for reproducing the real data analyses with the compound.Cox R package.
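The predictor-building and validation steps named in the abstract can be illustrated with a minimal sketch. It is not the chapter's copula-based procedure and does not use the compound.Cox package. It assumes that per-gene coefficients have already been estimated, forms a compound covariate as their weighted sum, and scores it with a simplified Harrell's C-index that skips pairs with tied times. The function names, coefficients, and patient data are hypothetical.

```python
from itertools import combinations


def compound_covariate(coefs, expression):
    """Weighted sum of one patient's gene expressions.

    `coefs` are per-gene regression coefficients (hypothetical values here).
    """
    return sum(b * x for b, x in zip(coefs, expression))


def c_index(times, events, scores):
    """Simplified Harrell's C-index.

    A pair is usable when the shorter observed time is an event (not censored).
    A usable pair is concordant when the patient with the shorter time has the
    higher risk score. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter observed time
        if times[i] == times[j] or not events[i]:
            continue
        usable += 1
        if scores[i] > scores[j]:
            concordant += 1.0
        elif scores[i] == scores[j]:
            concordant += 0.5  # tied scores count as half-concordant
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical coefficients for two selected genes and four patients.
coefs = [0.5, -0.3]
expressions = [[2.0, 1.0], [1.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
times = [2, 4, 6, 3]    # observed follow-up times
events = [1, 1, 0, 1]   # 1 = death observed, 0 = censored

scores = [compound_covariate(coefs, x) for x in expressions]
c = c_index(times, events, scores)
print(round(c, 4))  # 5 of the 6 usable pairs are concordant
```

Under dependent censoring, the chapter replaces the conventional univariate Cox step with a copula-based one. The sketch only shows how selected genes combine into a single score and how that score can be checked against observed survival.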


Keywords: Clayton’s copula; Competing risk; Compound covariate; Copula-graphic estimator; Cox regression; C-index; Gene expression; Overall survival; Univariate selection



Copyright information

© The Author(s) 2018

Authors and Affiliations

  • Takeshi Emura (1)
  • Yi-Hau Chen (2)
  1. Graduate Institute of Statistics, National Central University, Taoyuan, Taiwan
  2. Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
