Linear Correlation-Based Feature Selection for Network Intrusion Detection Model
Feature selection is a preprocessing phase in machine learning that increases classification accuracy and reduces model complexity. However, growing data dimensionality poses a challenge to many existing feature selection methods. This paper formulates and validates a method for selecting an optimal feature subset based on analysis of Pearson correlation coefficients. We adopt the correlation between two variables as a measure of feature goodness: a feature is good if it is highly correlated with the class and weakly correlated with the other features. To evaluate the proposed feature selection method, experiments are conducted on the NSL-KDD dataset. The experiments show that the number of features is reduced from 41 to 17, which improves the classification accuracy to 99.1%. The efficiency of the proposed linear correlation feature selection method is also demonstrated through extensive comparisons with other well-known feature selection methods.
Keywords: Network security, Data Reduction, Feature selection, Linear Correlation, Intrusion detection
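The goodness measure described in the abstract (high correlation with the class, low correlation with the other features) can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify in detail. It assumes a simple greedy filter, and the function names `pearson` and `select_features`, the thresholds `min_relevance` and `max_redundancy`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return 0.0 if denom == 0 else float((xc * yc).sum() / denom)


def select_features(X, y, min_relevance=0.1, max_redundancy=0.8):
    """Greedy correlation filter (illustrative assumption, not the paper's method).

    Features are ranked by |r(feature, class)|. A feature is kept if its
    relevance is at least min_relevance and its |r| with every feature
    already kept is below max_redundancy. Both thresholds are hypothetical.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    relevance = [abs(pearson(X[:, j], y)) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: relevance[j], reverse=True)

    selected = []
    for j in order:
        if relevance[j] < min_relevance:
            break  # remaining features are even less relevant
        if all(abs(pearson(X[:, j], X[:, k])) < max_redundancy for k in selected):
            selected.append(j)
    return selected


if __name__ == "__main__":
    # Toy data: a binary class label and four candidate features.
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    X = np.column_stack([
        y,                               # feature 0: identical to the class
        [0, 0, 0, 0.2, 1, 1, 1, 1],      # feature 1: near-duplicate of feature 0
        [5, 5, 5, 5, 5, 5, 5, 5],        # feature 2: constant, carries no information
        [0, 1, 0, 1, 0, 1, 1, 1],        # feature 3: weakly related to the class
    ])
    print(select_features(X, y))
```

In this toy example the near-duplicate and the constant feature are filtered out, while the feature identical to the class and the weakly related one are kept. On real data such as NSL-KDD, symbolic attributes would first need a numeric encoding, and the thresholds would need tuning.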