Stability of Feature Selection Methods: A Study of Metrics Across Different Gene Expression Datasets

  • Zahra Mungloo-Dilmohamud
  • Yasmina Jaufeerally-Fakim
  • Carlos Peña-Reyes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12108)


Analysis of gene expression data often requires selecting a subset of genes (features), and many feature selection (FS) methods have been devised for this purpose. However, FS methods often generate different lists of features for the same dataset, and users then have to choose which list to use. One approach to support this choice is to apply stability metrics to the generated lists and to select lists on that basis. The aim of this study is to investigate the behavior of stability metrics applied to feature subsets generated by FS methods. The experiments in this work explore a range of gene expression datasets, FS methods, and expected numbers of features in order to compare several stability metrics. The stability metrics have been used to compare five feature selection methods (SVM, SAM, ReliefF, RFE + RF and LIMMA) on gene expression datasets from the EBI repository. Results show that the studied stability metrics display a high degree of variability. The reason for this is not yet clear and is under further investigation. The final objective of the research, namely to define how to select an FS method, is ongoing work whose partial findings are reported herein.
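To make the idea of applying a stability metric to generated feature lists concrete, the following is a minimal sketch in Python. It assumes the average pairwise Jaccard similarity as the metric, which is a commonly used set-similarity measure and not necessarily one of the metrics examined in this study. The function names and the gene lists are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def average_pairwise_stability(feature_lists):
    """Mean Jaccard similarity over all pairs of feature subsets."""
    pairs = list(combinations(feature_lists, 2))
    if not pairs:
        raise ValueError("need at least two feature lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene lists, e.g. from repeated runs of one FS method
# on resampled versions of the same dataset (illustrative values only).
runs = [
    ["TP53", "BRCA1", "EGFR", "MYC"],
    ["TP53", "BRCA1", "EGFR", "KRAS"],
    ["TP53", "BRCA1", "PTEN", "MYC"],
]
stability = average_pairwise_stability(runs)
print(f"average pairwise Jaccard stability: {stability:.3f}")
```

A value close to 1 would indicate that the runs select nearly the same genes, while a value close to 0 would indicate little overlap. Other metrics may rank the same lists differently, which is the kind of behavior the study sets out to examine.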


Stability, Stability metrics, FS methods, Gene expression data



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Mauritius, Reduit, Mauritius
  2. School of Business and Engineering Vaud (HEIG-VD), Swiss Institute of Bioinformatics (SIB), CI4CB, Computational Intelligence for Computational Biology Group, University of Applied Sciences Western Switzerland (HES-SO), Yverdon-les-Bains, Switzerland
