Encoding Approach for Intrusion Detection Using PCA and KNN Classifier
Intrusion detection is an evolving area of research in cyber-security. Machine learning offers many effective methodologies that help intrusion detection systems (IDSs) identify intrusions accurately. Such IDSs analyze the features of traffic packets to identify different types of attacks. While most of the features used in IDSs are numeric, some, such as Protocol-type, Flag and Service, are categorical and hence call for an effective encoding scheme that transforms them into numeric form before techniques such as PCA are applied to extract latent features from numeric data. In this paper, the authors investigate, in the context of IDSs, the suitability of encoding categorical features based on the posterior probability of an attack conditioned on the feature. A KNN classifier is used to construct the IDS on top of the latent numeric features. The proposed method is trained and tested on the NSL-KDD data set to predict one of 40 possible distinct class labels for a test instance. Classification accuracy and false positive rate (FPR) are used as performance metrics. The results show that the proposed approach detects intrusions with an accuracy of 98.05% and a false alarm rate of 0.35%.
Keywords: Intrusion, Encoding, Information Gain Ratio, PCA, KNN, CrossTable
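The encoding idea in the abstract, replacing each categorical value with the posterior probability of an attack given that value, can be illustrated with a minimal sketch. The abstract does not give the exact estimator, so this assumes a simple frequency estimate of P(attack | value) over a binary attack/normal split. The function names, the `default` fallback for unseen categories, and the toy records are all hypothetical and are not taken from the paper or from NSL-KDD.

```python
from collections import defaultdict


def fit_posterior_encoding(values, labels, attack_labels):
    """Estimate P(attack | category) for each category by relative frequency.

    Assumption: a record counts as an attack if its label is in attack_labels.
    """
    totals = defaultdict(int)   # number of records per category
    attacks = defaultdict(int)  # number of attack records per category
    for value, label in zip(values, labels):
        totals[value] += 1
        if label in attack_labels:
            attacks[value] += 1
    return {value: attacks[value] / totals[value] for value in totals}


def encode(values, mapping, default=0.0):
    """Replace each category with its estimated posterior.

    Categories not seen during fitting get `default` (a hypothetical choice).
    """
    return [mapping.get(value, default) for value in values]


# Made-up toy records for illustration only.
protocol = ["tcp", "tcp", "udp", "icmp", "tcp", "udp", "icmp", "icmp"]
label = ["normal", "neptune", "normal", "smurf",
         "normal", "normal", "smurf", "normal"]

mapping = fit_posterior_encoding(protocol, label,
                                 attack_labels={"neptune", "smurf"})
encoded = encode(protocol, mapping)
print(mapping)
print(encoded)
```

In a full pipeline the encoded column would sit alongside the numeric features, which could then be passed to PCA and a KNN classifier. The mapping should be fitted on training data only, so that test labels do not leak into the encoding.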