A Feature Selection Method Based on Feature Correlation Networks

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10563)

Abstract

Feature selection is an important data preprocessing step in data mining and machine learning tasks, especially for high-dimensional data. In this paper we present a novel feature selection method based on complex weighted networks that describe the strongest correlations among features. The method relies on community detection techniques to identify cohesive groups of features. From each identified community, a subset of features exhibiting a strong association with the class feature is selected, taking into account the size of the community and the connections within it. The proposed method is evaluated on a high-dimensional dataset containing signaling protein features related to the diagnosis of Alzheimer’s disease. We compared the performance of seven widely used classifiers trained in three settings: without feature selection, with correlation-based feature selection by a state-of-the-art method provided by the WEKA tool, and with feature selection by four variants of our method, each determined by a different community detection technique. The results of the evaluation indicate that our method improves the classification accuracy of several classification models while drastically reducing the dimensionality of the dataset. Additionally, one variant of our method outperforms the correlation-based feature selection method implemented in WEKA.
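
The pipeline outlined in the abstract (correlation network, feature groups, per-group selection by association with the class) can be illustrated with a rough sketch. This is not the authors' implementation, and several simplifications are assumptions made here: edge weights are absolute Pearson correlations kept above a fixed threshold, connected components stand in for the community detection techniques used in the paper, and a fixed number of features is taken from each group instead of a number that depends on community size and internal connections. All function and parameter names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_network(features, threshold):
    """Weighted network: nodes are features, edges join strongly correlated pairs."""
    names = list(features)
    adj = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = abs(pearson(features[a], features[b]))
            if weight >= threshold:  # keep only the strongest correlations
                adj[a][b] = weight
                adj[b][a] = weight
    return adj


def connected_components(adj):
    """Assumed stand-in for community detection: groups of mutually reachable nodes."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(group)
    return groups


def select_features(features, target, threshold=0.9, per_group=1):
    """From each group, keep the features most correlated with the class feature."""
    adj = correlation_network(features, threshold)
    selected = []
    for group in connected_components(adj):
        ranked = sorted(
            group,
            key=lambda name: abs(pearson(features[name], target)),
            reverse=True,
        )
        selected.extend(ranked[:per_group])
    return selected


# Toy data (made up for illustration): "a" and "b" are nearly redundant.
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "c": [1, 3, 2, 4, 3, 5],
}
target = [0, 0, 0, 1, 1, 1]

print(select_features(features, target))  # ['a', 'c']
```

In this toy example the redundant pair "a" and "b" ends up in one group and only the member more strongly associated with the class is retained, which is the dimensionality-reduction effect the abstract describes. A closer approximation of the paper's setup would replace `connected_components` with a proper community detection routine.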

Acknowledgments

This work is supported by the bilateral project “Intelligent computer techniques for improving medical detection, analysis and explanation of human cognition and behavior disorders” between the Ministry of Education, Science and Technological Development of the Republic of Serbia and the Slovenian Research Agency. M. Savić, V. Kurbalija and M. Ivanović also thank the Ministry of Education, Science and Technological Development of the Republic of Serbia for additional support through project no. OI174023, “Intelligent techniques and their integration into wide-spectrum decision support.”

Author information

Corresponding author

Correspondence to Miloš Savić.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Savić, M., Kurbalija, V., Ivanović, M., Bosnić, Z. (2017). A Feature Selection Method Based on Feature Correlation Networks. In: Ouhammou, Y., Ivanovic, M., Abelló, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science, vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_19

  • DOI: https://doi.org/10.1007/978-3-319-66854-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66853-6

  • Online ISBN: 978-3-319-66854-3

  • eBook Packages: Computer Science, Computer Science (R0)
