Abstract
With the rapid growth in the size of scientific data across disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring. The approach has several advantages. First, it is model-free under a general model framework, and hence avoids the complication of specifying an actual model form with a huge number of candidate variables. Second, under mild conditions that do not require the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimensions. In particular, we impose a conditional independence assumption between the response and the censoring variable given each covariate, instead of assuming that the censoring variable is independent of the response and the covariates. Moreover, we propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors and the response. The proposed methods do not require any complicated numerical optimization, and they are fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively under various configurations. An application is illustrated with an analysis of a genetic data set.
Acknowledgements
This work was supported by the Research Grants Council of Hong Kong (Grant Nos. 509413 and 14311916), Direct Grants for Research of The Chinese University of Hong Kong (Grant Nos. 3132754 and 4053235), the Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB201024), the Key Science Fund Project of Jiangxi Province Education Department (Grant No. GJJ150439), the National Natural Science Foundation of China (Grant Nos. 11461029, 11601197 and 61562030) and the Canadian Institutes of Health Research (Grant No. 145546). The authors are grateful to the two reviewers for their insightful comments, which led to substantial improvements in the paper. The authors are also thankful to Professor Liping Zhu for his constructive comments.
Cite this article
Lin, Y., Liu, X. & Hao, M. Model-free feature screening for high-dimensional survival data. Sci. China Math. 61, 1617–1636 (2018). https://doi.org/10.1007/s11425-016-9116-6
DOI: https://doi.org/10.1007/s11425-016-9116-6