Model-free feature screening for high-dimensional survival data

Science China Mathematics

Abstract

As scientific data sets grow rapidly in size across disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring. The approach has several advantages. It is model-free under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables. Under mild conditions that do not require the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultrahigh dimensions. In particular, we assume that the response and the censoring variable are conditionally independent given each covariate, instead of assuming that the censoring variable is independent of the response and the covariates. We also propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors or the response. The proposed methods require no complicated numerical optimization and are fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively across various configurations. We illustrate the methods with an analysis of a genetic data set.
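To make the general screening workflow concrete, the following is a minimal sketch of marginal feature screening with right-censored responses: compute one utility per covariate, rank the covariates, and keep the top d. The abstract does not state the paper's screening statistic, so the utility used here (a Kendall-type concordance over comparable pairs) is a stand-in chosen only for illustration and is not the proposed method. The function names `marginal_utility` and `screen`, and the default d = floor(n / log n), a common convention in the sure independence screening literature, are likewise assumptions.

```python
import numpy as np


def marginal_utility(x, time, event):
    """Stand-in utility: absolute Kendall-type concordance between one
    covariate and the observed time, using only comparable pairs.

    A pair (i, j) is comparable when subject i has an observed event
    (event[i] == 1) and time[i] < time[j]. This is NOT the statistic
    proposed in the paper; it only illustrates the workflow.
    """
    earlier = time[:, None] < time[None, :]        # time_i < time_j
    comparable = earlier & (event[:, None] == 1)   # event of i is observed
    n_pairs = comparable.sum()
    if n_pairs == 0:
        return 0.0
    signs = np.sign(x[:, None] - x[None, :])       # sign(x_i - x_j)
    return abs(signs[comparable].sum()) / n_pairs


def screen(X, time, event, d=None):
    """Rank covariates by marginal utility and keep the top d indices."""
    n, p = X.shape
    if d is None:
        d = int(np.floor(n / np.log(n)))           # assumed default size
    d = min(d, p)
    utilities = np.array(
        [marginal_utility(X[:, j], time, event) for j in range(p)]
    )
    order = np.argsort(-utilities, kind="stable")  # largest utility first
    return order[:d], utilities


if __name__ == "__main__":
    # Simulated example: only the first covariate affects survival.
    rng = np.random.default_rng(0)
    n, p = 100, 500
    X = rng.standard_normal((n, p))
    latent = np.exp(X[:, 0] + rng.standard_normal(n))
    censor = rng.exponential(scale=5.0, size=n)
    time = np.minimum(latent, censor)              # observed time
    event = (latent <= censor).astype(int)         # 1 = event observed
    selected, utilities = screen(X, time, event)
    print("first selected indices:", selected[:10])
```

No numerical optimization is involved: each utility is a closed-form pairwise count, so the cost is one pass over the covariates, which matches the abstract's remark that such screening is fast and easy to implement.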



Acknowledgements

This work was supported by the Research Grant Council of Hong Kong (Grant Nos. 509413 and 14311916), Direct Grants for Research of The Chinese University of Hong Kong (Grant Nos. 3132754 and 4053235), the Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB201024), the Key Science Fund Project of Jiangxi Province Education Department (Grant No. GJJ150439), the National Natural Science Foundation of China (Grant Nos. 11461029, 11601197 and 61562030) and the Canadian Institutes of Health Research (Grant No. 145546). The authors are grateful to the two reviewers for their insightful comments, which led to substantial improvements in the paper. The authors are also thankful to Professor Liping Zhu for his constructive comments.

Author information

Corresponding author

Correspondence to Meiling Hao.

About this article

Cite this article

Lin, Y., Liu, X. & Hao, M. Model-free feature screening for high-dimensional survival data. Sci. China Math. 61, 1617–1636 (2018). https://doi.org/10.1007/s11425-016-9116-6

  • DOI: https://doi.org/10.1007/s11425-016-9116-6
