A note on quantile feature screening via distance correlation

Regular Article

Abstract

In this paper, we propose a new feature screening procedure based on a robust quantile version of distance correlation, which has several desirable properties. First, it is particularly useful for data exhibiting heterogeneity, which is very common in high-dimensional data. Second, it is robust to model misspecification and behaves reliably when some of the features contain outliers or follow heavy-tailed distributions. Under very mild conditions, we establish its sure screening property. In practice, the same index set is often found to be adequate at different quantile levels, so we further present a composite robust quantile version of distance correlation for feature screening. Simulation studies are carried out to examine the performance of the proposed procedures, which we also illustrate with a real data example.
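
The abstract does not state the exact definition of the quantile distance correlation, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes the quantile version is formed by replacing the response with the indicator that it falls at or below its sample quantile at level tau, scoring each feature by its sample distance correlation with that indicator, and (for the composite version) averaging the scores over a grid of quantile levels. The names `distance_correlation`, `quantile_dc_screen`, `taus` and `d` are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-centre each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom <= 0:  # one sample is constant
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def quantile_dc_screen(X, y, taus=(0.5,), d=10):
    """Rank features by distance correlation with 1{y <= q_tau}.

    ASSUMPTION: the indicator transform and the plain average over `taus`
    are illustrative choices, not taken from the paper.
    Returns the indices of the d top-ranked features and all scores.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.zeros(X.shape[1])
    for tau in taus:
        z = (y <= np.quantile(y, tau)).astype(float)
        scores += np.array(
            [distance_correlation(X[:, j], z) for j in range(X.shape[1])]
        )
    scores /= len(taus)
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    # Heavy-tailed noise; only feature 0 is active in this toy example.
    y = 3.0 * X[:, 0] + rng.standard_t(df=2, size=200)
    keep, _ = quantile_dc_screen(X, y, taus=(0.25, 0.5, 0.75), d=5)
    print(sorted(keep.tolist()))
```

Because the response enters only through an indicator, extreme response values cannot dominate the scores in this sketch. The features are used on their raw scale here, so robustness to outliers in the features would need an extra step that the sketch does not attempt.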

Keywords

Heterogeneous data; Independence quantile screening; Sure screening property

Notes

Acknowledgements

Chen’s research was supported by the National Natural Science Foundation of China (11501573, 11326184, 11201484 and 61402534) and the Natural Science Foundation of Shandong Province of China (ZR2015AL014).

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. School of Statistics, Qufu Normal University, Qufu, China
  2. College of Science, China University of Petroleum, Qingdao, China