Science China Mathematics, Volume 61, Issue 9, pp 1617–1636

Model-free feature screening for high-dimensional survival data

  • Yuanyuan Lin
  • Xianhui Liu
  • Meiling Hao
Articles

Abstract

With the rapid growth in the size of scientific data across disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring. The approach has several advantages. It is model-free under a general model framework, and hence avoids the difficulty of specifying an actual model form with a huge number of candidate variables. Under mild conditions, and without requiring the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimension. In particular, we impose a conditional independence assumption between the response and the censoring variable given each covariate, instead of assuming that the censoring variable is independent of the response and the covariates. Moreover, we propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors or the response. The newly proposed methods require no complicated numerical optimization and are fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively under various configurations. The application is illustrated with an analysis of a genetic data set.
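The abstract does not state the screening statistic itself, so the following is only a generic sketch of the marginal screening workflow it refers to: compute a utility for each covariate separately, rank the covariates, and keep the top few. As a stand-in utility it assumes the absolute Spearman rank correlation between a covariate and the observed (possibly censored) time, which ignores the censoring indicator and is not the statistic proposed in the paper. The names `ranks`, `rank_correlation`, `screen` and `simulate`, and the toy data-generating model, are hypothetical.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def rank_correlation(x, y):
    """Spearman rank correlation, i.e. Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def screen(columns, observed_time, keep):
    """Rank covariates by a marginal utility and keep the top `keep` indices.

    Assumption: the utility is |Spearman correlation| with the observed time,
    a stand-in that does not adjust for censoring.
    """
    utilities = [abs(rank_correlation(col, observed_time)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: utilities[j], reverse=True)
    return order[:keep], utilities


def simulate(n, p, seed):
    """Toy data: only covariates 0 and 1 affect the survival time."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    observed = []
    for i in range(n):
        log_t = 2.0 * columns[0][i] - 2.0 * columns[1][i] + 0.5 * rng.gauss(0.0, 1.0)
        survival = math.exp(log_t)
        censoring = rng.expovariate(1.0 / 5.0)  # independent censoring, mean 5
        observed.append(min(survival, censoring))
    return columns, observed
```

For example, `screen(*simulate(300, 50, 1), keep=5)` returns five covariate indices together with all fifty utilities. No optimization is involved, only one pass over the covariates, which is the sense in which marginal screening is cheap to compute.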

Keywords

feature screening, random censoring, robustness, sure independence screening, ultra-high dimension

MSC(2010)

35J60 35J70 

Notes

Acknowledgements

This work was supported by the Research Grants Council of Hong Kong (Grant Nos. 509413 and 14311916), Direct Grants for Research of The Chinese University of Hong Kong (Grant Nos. 3132754 and 4053235), the Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB201024), the Key Science Fund Project of Jiangxi Province Education Department (Grant No. GJJ150439), the National Natural Science Foundation of China (Grant Nos. 11461029, 11601197 and 61562030) and the Canadian Institutes of Health Research (Grant No. 145546). The authors are grateful to the two reviewers for their insightful comments that led to substantial improvements in the paper. The authors are also thankful to Professor Liping Zhu for his constructive comments.

Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
  2. School of Statistics and Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang, China
  3. The Princess Margaret Cancer Center, University Health Network, Toronto, Canada
