
Applied Intelligence, Volume 49, Issue 2, pp 592–604

E-ENDPP: a safe feature selection rule for speeding up Elastic Net

  • Yitian Xu (Email author)
  • Ying Tian
  • Xianli Pan
  • Hongmei Wang

Abstract

Lasso is a popular regression model that performs automatic variable selection and continuous shrinkage simultaneously. The Elastic Net is a corrective extension of Lasso that selects groups of correlated variables, and it is particularly useful when the number of features p is much larger than the number of observations n. However, training the Elastic Net efficiently on high-dimensional data remains a challenge. In this paper, we therefore propose a new safe screening rule, E-ENDPP, for the Elastic Net problem, which identifies inactive features prior to training. These inactive features or predictors can then be removed to reduce the size of the problem and accelerate training. Since E-ENDPP is derived from the optimality conditions of the model, it is theoretically guaranteed to yield solutions identical to those of the original model. Simulation studies and real data examples show that the proposed E-ENDPP substantially accelerates the training of the Elastic Net without affecting its accuracy.
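For readers unfamiliar with safe screening, the Python sketch below illustrates the generic screen-then-fit workflow that rules such as E-ENDPP build on, using scikit-learn's ElasticNet. The correlation-based test and the threshold in it are hypothetical placeholders for illustration only; they are not the E-ENDPP rule, which is derived from the model's optimality conditions and is provably safe.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    # Toy data with p >> n, the setting the paper targets.
    rng = np.random.default_rng(0)
    n, p = 50, 500
    X = rng.standard_normal((n, p))
    y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(n)

    # Screening step (illustrative only): features weakly correlated with the
    # response are flagged as likely inactive and discarded before training.
    # E-ENDPP instead uses a test derived from optimality conditions, which
    # guarantees that only truly inactive features are removed.
    scores = np.abs(X.T @ y) / np.linalg.norm(X, axis=0)
    keep = scores >= 0.5 * scores.max()      # hypothetical threshold
    print(f"kept {keep.sum()} of {p} features")

    # Fit the Elastic Net on the reduced problem.
    model = ElasticNet(alpha=0.1, l1_ratio=0.5)
    model.fit(X[:, keep], y)

    # Map the reduced solution back; screened-out features get coefficient zero.
    coef_full = np.zeros(p)
    coef_full[keep] = model.coef_

Because the reduced design matrix has far fewer columns, the solver works on a much smaller problem, which is the source of the speed-up reported in the paper.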

Keywords

Elastic Net · Lasso · Screening rule · Feature selection

Notes

Acknowledgments

The authors gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation. This work was supported in part by the Beijing Natural Science Foundation (No. 4172035) and National Natural Science Foundation of China (No. 11671010).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Yitian Xu¹ (Email author)
  • Ying Tian¹
  • Xianli Pan¹
  • Hongmei Wang²
  1. College of Science, China Agricultural University, Beijing, China
  2. College of Information and Electrical Engineering, China Agricultural University, Beijing, China
