Abstract
This paper considers feature screening and variable selection for ultrahigh-dimensional covariates. The new feature screening procedure is based on a conditional expectation that is used to determine whether an explanatory variable contributes to the response variable, without requiring a specific parametric form for the underlying data model. The authors estimate the marginal conditional expectation via a kernel regression estimator. The proposed method is shown to possess the sure screening property. The authors also propose an iterative kernel-estimator algorithm to reduce the ultrahigh dimensionality to an appropriate scale. Simulation results and a real data analysis demonstrate that the proposed method works well and outperforms competing methods.
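As a rough illustration of marginal screening with kernel regression, the sketch below fits a Nadaraya-Watson estimate of E[Y | X_j] for each covariate separately and ranks covariates by the sample variance of the fitted curve (a flat fit suggests no marginal contribution). This is a minimal sketch under assumed choices — a Gaussian kernel, a fixed bandwidth `h`, and the variance-of-fitted-values utility — not the paper's exact screening statistic or bandwidth selector.

```python
import numpy as np

def nw_fit(x, y, h):
    # Nadaraya-Watson estimate of E[Y | X = x_i] at the sample points,
    # using a Gaussian kernel with bandwidth h.
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d**2)
    return (w @ y) / w.sum(axis=1)

def screen(X, y, d, h=0.5):
    # Rank covariates by the variance of the fitted marginal regression:
    # a covariate unrelated to y yields a nearly flat fit and a small statistic.
    stats = np.array([np.var(nw_fit(X[:, j], y, h)) for j in range(X.shape[1])])
    return np.argsort(stats)[::-1][:d]  # indices of the top-d covariates
```

For example, on simulated data where only the first covariate drives the response, `screen(X, y, 5)` should place index 0 among the selected features. An iterative version, as in the paper, would alternate screening with refitting on the residuals to recover covariates masked by stronger ones.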
The research was supported in part by the National Natural Science Foundation of China under Grant Nos. 11571112, 11501372, 11571148, 11471160, Doctoral Fund of Ministry of Education of China under Grant No. 20130076110004, Program of Shanghai Subject Chief Scientist under Grant No. 14XD1401600, and the 111 Project of China under Grant No. B14019.
This paper was recommended for publication by Editor SUN Liuquan.
Zhang, J., Zhang, R. & Zhang, J. Feature Screening for Nonparametric and Semiparametric Models with Ultrahigh-Dimensional Covariates. J Syst Sci Complex 31, 1350–1361 (2018). https://doi.org/10.1007/s11424-017-6310-6