Abstract
For complete ultrahigh-dimensional data, sure independent screening methods can effectively reduce the dimensionality while retaining all the active variables with high probability. However, few screening methods have been developed for ultrahigh-dimensional survival data subject to censoring. We propose a censored cumulative residual independent screening method that is model-free and enjoys the sure independent screening property. Active variables tend to be ranked above the inactive ones in terms of their association with the survival times. Compared with several existing methods, our model-free screening method works well with general survival models; it is invariant to monotone transformations of the response and requires substantially weaker moment conditions. Numerical studies demonstrate the usefulness of the censored cumulative residual independent screening method, and the new approach is illustrated with a gene expression data set.
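The screening workflow described above — compute a marginal utility for every covariate, rank the covariates by it, and retain the top few — can be sketched generically. The utility below (absolute correlation with log observed time among uncensored subjects) is a hypothetical placeholder for illustration only, not the censored cumulative residual statistic proposed in the paper:

```python
import math
import random

def marginal_utility(x, time, delta):
    """Toy marginal utility: |correlation| between a covariate and log observed
    time among uncensored subjects (hypothetical stand-in for the censored
    cumulative residual statistic)."""
    xs = [xi for xi, d in zip(x, delta) if d == 1]
    ys = [math.log(ti) for ti, d in zip(time, delta) if d == 1]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return abs(cov / (sx * sy))

def screen(X, time, delta, d):
    """Rank the p covariates by marginal utility and retain the top d."""
    ranked = sorted(range(len(X)),
                    key=lambda k: -marginal_utility(X[k], time, delta))
    return set(ranked[:d])

# Simulated example: p = 50 covariates, only covariate 0 affects survival.
random.seed(1)
n, p = 200, 50
X = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
T = [random.expovariate(math.exp(2.0 * X[0][i])) for i in range(n)]  # true model
C = [random.expovariate(0.25) for _ in range(n)]                     # censoring
time = [min(t, c) for t, c in zip(T, C)]
delta = [1 if t <= c else 0 for t, c in zip(T, C)]
selected = screen(X, time, delta, d=10)   # covariate 0 should be retained
```

The naive correlation utility used here is not robust to censoring; the paper's statistic is designed precisely to handle censored responses in a model-free way.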
References
Bitouzé D, Laurent B, Massart P (1999) A Dvoretzky–Kiefer–Wolfowitz type inequality for the Kaplan–Meier estimator. Annales de l’Institut Henri Poincaré (B) Probab Stat 35:735–763
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351
Cook AJ, Gold DR, Li Y (2007) Spatial cluster detection for censored outcome data. Biometrics 63:540–549
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70:849–911
Fan J, Song R (2010) Sure independence screening in generalized linear models with NP-dimensionality. Ann Stat 38:3567–3604
Fan J, Samworth R, Wu Y (2009) Ultrahigh dimensional feature selection: beyond the linear model. J Mach Learn Res 10:2013–2038
Fan J, Feng Y, Wu Y (2010) High-dimensional variable selection for Cox’s proportional hazards model. In: Borrowing strength: theory powering applications—a Festschrift for Lawrence D. Brown, Institute of Mathematical Statistics 6:70–86
Fan J, Feng Y, Song R (2011) Nonparametric independence screening in sparse ultra-high-dimensional additive models. J Am Stat Assoc 106:544–557
Gorst-Rasmussen A, Scheike T (2013) Independent screening for single-index hazard rate models with ultrahigh dimensional features. J R Stat Soc Ser B 75:217–245
Hoeffding W (1948) A non-parametric test of independence. Ann Math Stat 19:546–557
Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80:557–572
Li R, Zhong W, Zhu L (2012) Feature screening via distance correlation learning. J Am Stat Assoc 107:1129–1139
Rosenwald A, Wright G, Wiestner A, Chan WC et al (2003) The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell 3:185–197
Serfling RJ (1980) Approximation theorems of mathematical statistics. Wiley, New York
Song R, Lu W, Ma S, Jeng XJ (2014) Censored rank independence screening for high-dimensional survival data. Biometrika 101:799–814
Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58:267–288
Tibshirani R (2009) Univariate shrinkage in the Cox model for high dimensional data. Stat Appl Genet Mol Biol 8:1–18
Wu Y, Yin G (2015) Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 102:65–76
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhao SD, Li Y (2012) Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal 105:397–411
Zhu LP, Li L, Li R, Zhu LX (2011) Model-free feature screening for ultrahigh dimensional data. J Am Stat Assoc 106:1464–1475
Zou H (2006) The adaptive Lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
We would like to thank the editor, the associate editor, and the referees for their insightful comments, which greatly improved this work. This research was supported in part by grants (11371299, 11571263, 11671311) from the National Natural Science Foundation of China and a grant (17125814) from the Research Grants Council of Hong Kong.
Appendix: Theoretical Proofs
Proof of Theorem 1
Let
and define
Straightforward calculations entail that
where
\({{\mathcal {O}}}_{ik}=(X_i,\Delta _i,Z_{ik})\), and the definitions of kernels \(h_{1}({{\mathcal {O}}}_{ik};{{\mathcal {O}}}_{jk};G,H)\) and \(h_{2}({{\mathcal {O}}}_{ik};{{\mathcal {O}}}_{jk};{{\mathcal {O}}}_{lk};G,H)\) in the U-statistics are clear from the context. Likewise, we have
where \(\widehat{D}_{ks}, s=1,2,\) are obtained by replacing G and H in \(\widetilde{D}_{ks}\) with \(\widehat{G}_n\) and \(\widehat{H}_n\) respectively.
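Replacing G and H by the Kaplan–Meier estimates \(\widehat{G}_n\) and \(\widehat{H}_n\) is justified because the Kaplan–Meier estimator converges uniformly at an exponential rate, via the Dvoretzky–Kiefer–Wolfowitz-type inequality of Bitouzé et al. (1999) invoked below. A minimal simulation of this sup-norm convergence over a truncated range, mirroring the truncation at \(\tau\) in condition C1 (the exponential-survival setup is illustrative only):

```python
import math
import random

def kaplan_meier(time, delta):
    """Kaplan-Meier estimate of S(t) = P(T > t): returns the event times and
    the estimated survival probability just after each event."""
    order = sorted(range(len(time)), key=lambda i: time[i])
    at_risk, s = len(time), 1.0
    points, surv = [], []
    for i in order:
        if delta[i] == 1:              # event: multiply in the KM factor
            s *= 1.0 - 1.0 / at_risk
            points.append(time[i])
            surv.append(s)
        at_risk -= 1                   # subject leaves the risk set either way
    return points, surv

def sup_error(n, tau=2.0, seed=7):
    """Sup-norm distance between the KM estimate and the true survival
    function exp(-t), over event times up to the truncation point tau."""
    random.seed(seed)
    T = [random.expovariate(1.0) for _ in range(n)]   # true S(t) = exp(-t)
    C = [random.expovariate(0.5) for _ in range(n)]   # independent censoring
    obs = [min(t, c) for t, c in zip(T, C)]
    delta = [1 if t <= c else 0 for t, c in zip(T, C)]
    pts, surv = kaplan_meier(obs, delta)
    return max(abs(s - math.exp(-t)) for t, s in zip(pts, surv) if t <= tau)

err_small, err_large = sup_error(50), sup_error(5000)   # error shrinks with n
```

The exponential tail of this sup-norm error is exactly what allows the plug-in terms \(\widehat{D}_{ks}-\widetilde{D}_{ks}\) to be controlled uniformly over all \(p_n\) covariates in the argument that follows.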
First, we derive the exponential tail probability bound of \(P\big (\big |\Vert \widehat{d_k}\Vert _n^2-\Vert \widetilde{d_k}\Vert _n^2\big |\ge \upsilon n^{-\alpha }\big )\) for any positive constants \(\upsilon \) and \(\alpha \in [0, 1/2)\). Consider \(P(|\widehat{D}_{k1}-\widetilde{D}_{k1}|\ge \upsilon n^{-\alpha }/2)\) and note that
By condition C1 and the boundedness of the indicator function, there exists a constant \(c_1\) such that
Setting \(c_2=\min \{G(\tau ),H(\tau )\}\), we immediately have
where \(c_3=c_1/c_2\).
Using a similar argument, along with some tedious calculations, we also have
where \(c_{4}\) is a constant. It follows from (A.3) and Theorem 1 of Bitouzé et al. (1999) that
where \(\mu _1\) is a constant. Similarly, we also have
where \(\mu _2\) is a constant. Combining (A.1), (A.2), (A.4) and (A.5), we have
Second, we derive the exponential tail probability bound of \(P\big (\big |\Vert \widetilde{d_k}\Vert _n^2-\Vert d_k\Vert _n^2\big |\ge \upsilon n^{-\alpha }\big )\) for any positive constants \(\upsilon \) and \(\alpha \in [0, 1/2)\).
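The exponential tail bound developed in this second step is the classical Chernoff argument for an average of \(m\) i.i.d. bounded terms; a generic sketch, stated with the constant \(c_5\) satisfying \(P(|h_2|<c_5)=1\) and \(m=[n/3]\) as they appear later in the proof:

```latex
% For W = m^{-1}\sum_{i=1}^m W_i with W_i i.i.d. and |W_i| \le c_5 a.s.:
\begin{align*}
P(W - EW \ge \epsilon)
  &\le e^{-\xi\epsilon}\, E\bigl[e^{\xi(W - EW)}\bigr]
       && \text{(Markov inequality)}\\
  &=   e^{-\xi\epsilon}\,\Bigl(E\bigl[e^{(\xi/m)(W_1 - EW_1)}\bigr]\Bigr)^{m}
       && \text{(independence)}\\
  &\le \exp\Bigl(-\xi\epsilon + \tfrac{\xi^{2}c_5^{2}}{2m}\Bigr)
       && \text{(Hoeffding's lemma)}\\
  &=   \exp\Bigl(-\tfrac{\epsilon^{2}m}{2c_5^{2}}\Bigr)
       && \text{at the minimizer } \xi = \epsilon m/c_5^{2}.
\end{align*}
```

The choice \(\xi =\epsilon m/c_{5}^2\) made below is exactly this minimizer, and the resulting exponent \(\epsilon^2 m/(2c_5^2)\) is the source of the rate obtained at the end of this step.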
Note that \(\Vert d_k\Vert _n^2=E\{h_{2}({{\mathcal {O}}}_{ik};{{\mathcal {O}}}_{jk};{{\mathcal {O}}}_{lk};G,H)\}=E(\widetilde{D}_{k2})\). Employing the Markov inequality, we obtain that, for any \(\epsilon >0\) and \(\xi >0\),
Serfling (1980, Section 5.1.6) showed that any U-statistic can be represented as an average of averages of i.i.d. random variables. We can rewrite
where \(\Sigma _{n!}\) denotes the summation over all possible permutations of \((1,\ldots ,n)\), and each \(D_{2}({{\mathcal {O}}}_{1k};\cdots ;{{\mathcal {O}}}_{nk};G,H)\) is an average of \(m\equiv [n/3]\) i.i.d. random variables. Denote \(\psi (\xi )=E[\exp \{\xi h_{2}({{\mathcal {O}}}_{ik};{{\mathcal {O}}}_{jk};{{\mathcal {O}}}_{lk};G,H)\}]\). Jensen’s inequality yields that
As a result,
Under condition C1, there exists a positive constant \(c_{5}\) such that \(P(|h_2|<c_{5})=1\). It follows from Lemma 1 in Li et al. (2012) that
which immediately entails that
by choosing \(\xi =\epsilon m/c_{5}^2\). It further follows from the symmetry of the U-statistic that
Using a similar argument, we also have
where \(c_{6}\) is a positive constant such that \(P(|h_1|< c_{6})=1\) and \(m^{*}=[n/2]\). Obviously, under condition C1, there exist constants \(c_{7}\) and \(c_{8}\) such that \(0\le \Vert d_k\Vert _n^2=E(\widetilde{D}_{k2})\le E|\widetilde{D}_{k2}|\le c_{7}\) and \(0\le E(\widetilde{D}_{k1})\le E|\widetilde{D}_{k1}|\le c_{8}\) for any \(1\le k \le p_n\). Taking \(\epsilon =\upsilon n^{-\alpha }\) and n large enough such that \((3n-2)n^{-2}E(\widetilde{D}_{k2})<\upsilon n^{-\alpha }\) and \((n-1)n^{-2}E(\widetilde{D}_{k1})<\upsilon n^{-\alpha }\), we have
by noting that \(m^*\ge m\ge n/4\), where \(c_{9}=1/(8c_{6}^2)\) and \(c_{10}=1/(8c_{5}^2)\). It follows from (A.6) and (A.7) that
where \(\eta =\min \{2c_{4}^{-2}\upsilon ^{2},c_{10}\upsilon ^2\}\). Immediately, we have
which proves the first part of Theorem 1 by taking \(c=8\upsilon \).
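For completeness, the uniform statement in this first part follows from the componentwise bound (A.8) via a union bound over the \(p_n\) covariates; schematically (a sketch consistent with the rate \(n^{1-2\alpha}\) implied by the constants above):

```latex
\begin{align*}
P\Bigl(\max_{1\le k\le p_n}\bigl|\Vert \widehat{d_k}\Vert _n^2
       -\Vert d_k\Vert _n^2\bigr|\ge c\,n^{-\alpha}\Bigr)
 &\le \sum_{k=1}^{p_n}
      P\bigl(\bigl|\Vert \widehat{d_k}\Vert _n^2
      -\Vert d_k\Vert _n^2\bigr|\ge c\,n^{-\alpha}\bigr)\\
 &=   O\bigl(p_n \exp(-\eta\, n^{1-2\alpha})\bigr),
\end{align*}
```

with \(\eta\) as defined above, so the bound vanishes whenever \(\log p_n = o(n^{1-2\alpha})\).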
If \(\mathcal {A}\nsubseteq \widehat{\mathcal {A}}\), then there must exist some \(k\in \mathcal {A}\) such that \(\Vert \widehat{d_k}\Vert _n^2< cn^{-\alpha }\). It follows from condition C2 that \(|\Vert \widehat{d_k}\Vert _n^2-\Vert d_k\Vert _n^2|>cn^{-\alpha }\) for some \(k\in \mathcal {A}\), which implies that \(\{\mathcal {A}\nsubseteq \widehat{\mathcal {A}}\}\subseteq \{|\Vert \widehat{d_k}\Vert _n^2-\Vert d_k\Vert _n^2|>cn^{-\alpha }\) for some \(k\in \mathcal {A}\}\). As a result, \(\{\max _{k\in \mathcal {A}}|\Vert \widehat{d_k}\Vert _n^2-\Vert d_k\Vert _n^2|\le cn^{-\alpha }\}\subseteq \{\mathcal {A}\subseteq \widehat{\mathcal {A}}\}\). Using (A.8), we have
where \(a_n=|\mathcal {A}|\). Thus, the proof of Theorem 1 is completed. \(\square \)
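The averages-of-averages representation cited from Serfling (1980, Section 5.1.6) can be checked numerically: averaging, over all permutations of the sample, the mean of the kernel evaluated on \(m=[n/3]\) disjoint triples reproduces the U-statistic exactly. A small sketch with a hypothetical bounded symmetric kernel (not the paper's \(h_2\)):

```python
import itertools
import math
import random

def u_statistic(data, h, r=3):
    """Third-order U-statistic: average of h over all r-subsets of the sample."""
    combos = list(itertools.combinations(data, r))
    return sum(h(*c) for c in combos) / len(combos)

def block_average(perm, h, r=3):
    """Average of h over the m = [n/r] disjoint blocks of one permutation;
    each block average is a mean of m i.i.d. terms."""
    m = len(perm) // r
    return sum(h(*perm[r * i: r * i + r]) for i in range(m)) / m

h = lambda x, y, z: min(x, y, z)      # hypothetical bounded symmetric kernel
random.seed(0)
data = tuple(random.random() for _ in range(6))   # small n so n! is feasible
U = u_statistic(data, h)
perms = list(itertools.permutations(data))
perm_avg = sum(block_average(p, h) for p in perms) / math.factorial(6)
# perm_avg coincides with U: averaging the block means over all permutations
# symmetrizes the kernel back to the U-statistic.
```

Because each block average is a mean of i.i.d. bounded terms, the Chernoff bound for i.i.d. averages transfers to the U-statistic by Jensen's inequality, which is precisely how the proof above obtains its exponential rate.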
Proof of Theorem 2
Under assumption (i), we rewrite
If \(k\notin \mathcal {A}\), then assumption (ii) implies that
for any t and z. As a result, \(\Vert d_k\Vert _n^2=E\{d_k(X,Z_k)^2\}=0\). It follows from condition C2 that \(\max _{k\notin \mathcal {A}}\Vert d_k\Vert _n^2 < \min _{k\in \mathcal {A}}\Vert d_k\Vert _n^2\). On the other hand, \(\Vert d_k\Vert _n^2=0\) directly implies that \(k\notin \mathcal {A}\) under condition C2. Thus, the first part of Theorem 2 is proved.
Under condition C2 and assumptions (i) and (ii), coupled with (A.9), we have
which completes the proof of Theorem 2. \(\square \)
Zhang, J., Yin, G., Liu, Y. et al. Censored cumulative residual independent screening for ultrahigh-dimensional survival data. Lifetime Data Anal 24, 273–292 (2018). https://doi.org/10.1007/s10985-017-9395-2