Variable Selection and Feature Screening

  • Wanjun Liu
  • Runze Li
Part of the Advanced Studies in Theoretical and Applied Econometrics book series (ASTA, volume 52)


This chapter provides a selective review of feature screening methods for ultra-high dimensional data. The main idea of feature screening is to reduce the ultra-high dimensionality of the feature space to a moderate size quickly and efficiently, while retaining all the important features in the reduced feature space. The requirement that all important features be retained, with probability tending to one as the sample size grows, is referred to as the sure screening property. After feature screening, more sophisticated methods can be applied to the reduced feature space for further analysis, such as parameter estimation and statistical inference. This chapter focuses only on the feature screening stage. From the perspective of data type, we review feature screening methods for independent and identically distributed data, longitudinal data, and survival data. From the perspective of modeling, we review methods for various models, including the linear model, the generalized linear model, the additive model, the varying-coefficient model, and the Cox model. We also cover some model-free feature screening procedures.
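To make "reducing the feature space to a moderate size" concrete, here is a minimal sketch of one well-known screening rule for the linear model: sure independence screening (SIS) in the spirit of Fan and Lv (2008). Each feature is ranked by the absolute sample correlation between that feature and the response, and only the top d features are kept, with d = ⌊n / log n⌋ as the default (one choice suggested by Fan and Lv). The function name `sis_screen`, the default d, and the simulated data are illustrative assumptions, not code from this chapter; the sketch also assumes that no column of X is constant.

```python
import numpy as np


def sis_screen(X, y, d=None):
    """Marginal-correlation screening (illustrative SIS-style sketch).

    X : (n, p) array of features, assumed to have no constant columns.
    y : (n,) response vector.
    d : number of features to keep; defaults to floor(n / log n).
    Returns the indices of the kept features, ordered by decreasing utility.
    """
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    d = min(d, p)

    # Standardize so that Xs.T @ ys / n equals the sample Pearson correlation.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()

    # Marginal utility of feature j: |corr(X_j, y)|.
    utility = np.abs(Xs.T @ ys) / n

    # Keep the d features with the largest marginal utilities.
    return np.argsort(utility)[::-1][:d]


# Hypothetical example: p = 5000 features, n = 200 observations,
# and only features 0, 1 and 2 enter the true linear model.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(200)
kept = sis_screen(X, y)
print(len(kept), {0, 1, 2} <= set(kept.tolist()))
```

Here the 5000 candidate features are cut to about 37 in a single pass of p correlations, and the three active features should land in the retained set. Screening is only the first stage: a more refined method, such as a penalized regression, would then be fit on the retained columns. Ranking by marginal correlation can miss a feature that is jointly important but marginally uncorrelated with the response, which is one motivation for the iterative and model-free variants the chapter reviews.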



This work was supported by an NSF grant, DMS 1820702, and by NIDA, NIH grant P50 DA039838. The content is solely the responsibility of the authors and does not necessarily represent the official views of NSF, NIH, or NIDA.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Statistics, The Pennsylvania State University, State College, USA
  2. Department of Statistics and The Methodology Center, The Pennsylvania State University, State College, USA
