Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion

Abstract

Feature selection is frequently used as a preprocessing step in data mining and is attracting growing attention due to the increasing amounts of data emerging from different domains. High data dimensionality increases noise and thus the error of learning algorithms. Filter methods for feature selection are especially fast and therefore useful for high-dimensional datasets. Existing methods focus on producing feature subsets that improve predictive performance, but they often suffer from instability. Instance-based filters, for example, are considered among the most effective methods; they rank features based on instance neighborhoods. However, as the feature weights fluctuate with the instances, small changes in the training data result in a different selected subset of features. On the other hand, some other filters generate stable results but lead to modest predictive performance. The absence of a trade-off between stability and classification accuracy decreases the reliability of feature selection results. To deal with this issue, we propose filter methods that improve the stability of feature selection while preserving optimal predictive accuracy and without increasing the complexity of the feature selection algorithms. The proposed approaches first use the strength of instance learning to identify initial sets of relevant features, and then exploit aggregation techniques to increase the stability of the final set in a second stage. Two classification algorithms are used to evaluate the predictive performance of our proposed instance-based filters against state-of-the-art algorithms. The obtained results show the efficiency of our methods in improving both classification accuracy and feature selection stability for high-dimensional datasets.
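The two ingredients the abstract names, instance-based feature weighting and a measure of selection stability across perturbed training sets, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: `relief_weights` is a basic Relief-style scorer in the spirit of Kira and Rendell, and `kuncheva_stability` is the Kuncheva consistency index often used to quantify how much selected subsets agree; all function names and parameters here are hypothetical.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Relief-style instance-based feature weighting (sketch).

    For each randomly sampled instance, weights grow for features that
    differ from the nearest miss (closest opposite-class instance) and
    shrink for features that differ from the nearest hit (closest
    same-class instance), so discriminative features accumulate weight.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        same = (y == y[i])
        dist = np.abs(X - X[i]).sum(axis=1)   # L1 distance to every instance
        dist[i] = np.inf                      # exclude the instance itself
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

def kuncheva_stability(subsets, d):
    """Average pairwise Kuncheva consistency index over feature subsets.

    Each subset holds k feature indices chosen from d features; the index
    corrects the raw overlap r for the overlap expected by chance (k*k/d)
    and equals 1 when all subsets are identical. Assumes k < d.
    """
    k = len(subsets[0])
    total, pairs = 0.0, 0
    for a in range(len(subsets)):
        for b in range(a + 1, len(subsets)):
            r = len(set(subsets[a]) & set(subsets[b]))
            total += (r - k * k / d) / (k - k * k / d)
            pairs += 1
    return total / pairs
```

A stability-oriented pipeline in the paper's spirit would run the weighting on several resamples of the training data, keep the top-k features of each run, and fuse the resulting subsets, reporting `kuncheva_stability` of the subsets alongside classification accuracy.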

Author information

Corresponding author

Correspondence to Afef Ben Brahim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Cite this article

Ben Brahim, A. Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion. Neural Comput & Applic 33, 1221–1232 (2021). https://doi.org/10.1007/s00521-020-04971-y

Keywords

  • Feature selection
  • High dimensionality
  • Instance-based learning
  • Stability