Feature selection based on community detection in feature correlation networks
Feature selection is an important data preprocessing step in data mining and machine learning tasks, especially for high-dimensional data. In this paper, we propose a novel feature selection method based on feature correlation networks, i.e., complex weighted networks describing the strongest correlations among features in a dataset. The method uses community detection techniques to identify cohesive groups of features in feature correlation networks. A subset of features exhibiting a strong association with the class variable is then selected according to the identified community structure, taking into account the size of feature communities and the connections within them. The proposed method is experimentally evaluated on a high-dimensional dataset containing signaling protein features related to the diagnosis of Alzheimer’s disease. We compared the performance of seven commonly used classifiers trained under three conditions: without feature selection, after feature selection by four variants of our method (each determined by a different community detection technique), and after feature selection by four widely used state-of-the-art feature selection methods available in the WEKA machine learning library. The results indicate that our method improves the classification accuracy of several classification models while greatly reducing the dimensionality of the dataset. Additionally, our method tends to outperform the traditional feature selection methods provided by the WEKA library.
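The pipeline outlined above (build a feature correlation network, group features into communities, pick class-relevant features per community) can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the fixed correlation threshold, the use of connected components as a crude stand-in for a community detection technique, and the rule of keeping one feature per community are all assumptions made for illustration, and every function name and data value is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlation_network(features, threshold):
    """Weighted graph linking features whose |correlation| >= threshold.

    ASSUMPTION: a fixed threshold decides which correlations count as
    'strongest'; the paper may define this differently.
    """
    names = list(features)
    adj = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = abs(pearson(features[a], features[b]))
            if weight >= threshold:
                adj[a][b] = weight
                adj[b][a] = weight
    return adj


def connected_components(adj):
    """Groups of mutually reachable features.

    ASSUMPTION: connected components replace a real community
    detection algorithm to keep the sketch dependency-free.
    """
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


def select_features(features, labels, threshold=0.8):
    """Keep, per group, the feature most correlated with the class.

    ASSUMPTION: one feature per group; the paper's rule also weighs
    community size and intra-community connections.
    """
    adj = correlation_network(features, threshold)
    return [
        max(group, key=lambda f: abs(pearson(features[f], labels)))
        for group in connected_components(adj)
    ]


# Hypothetical toy data: f2 closely tracks f1, and f4 closely tracks f3.
features = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 6, 8, 10, 12.5],
    "f3": [5, 1, 4, 2, 6, 3],
    "f4": [5.1, 0.9, 4.2, 2.0, 6.1, 2.9],
}
labels = [0, 0, 0, 1, 1, 1]

communities = connected_components(correlation_network(features, 0.8))
selected = select_features(features, labels)
print(communities)
print(selected)
```

On this toy data the four features collapse into two groups of redundant features, so two features remain after selection. Swapping `connected_components` for a proper community detection routine would bring the sketch closer to the approach described in the abstract.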
Keywords: Feature selection, Feature correlation networks, Community detection, Alzheimer’s disease
This work is supported by the bilateral project “Intelligent computer techniques for improving medical detection, analysis and explanation of human cognition and behavior disorders” between the Ministry of Education, Science and Technological Development of the Republic of Serbia and the Slovenian Research Agency. M. Savić, V. Kurbalija and M. Ivanović also thank the Ministry of Education, Science and Technological Development of the Republic of Serbia for additional support through Project No. OI174023, “Intelligent techniques and their integration into wide-spectrum decision support”.