Encoding Approach for Intrusion Detection Using PCA and KNN Classifier

  • Nerella Sameera
  • M. Shashi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1090)


Intrusion detection is an evolving area of research in cyber-security. Machine learning offers many effective methods that help intrusion detection systems (IDSs) identify intrusions accurately. Such IDSs analyze the features of traffic packets to identify different types of attacks. While most of the features used in IDSs are numeric, some, such as Protocol-type, Flag and Service, are categorical and hence call for an effective encoding scheme that transforms them into numeric form before PCA-like techniques are applied to extract latent features from numeric data. In this paper, the authors investigate the suitability of encoding categorical features based on the posterior probability of an attack conditioned on the feature's value, in the context of IDS. A KNN classifier is used to construct the IDS on top of the latent features in numeric form. The proposed method is trained and tested on the NSL-KDD data set to predict one of 40 possible distinct class labels for a test instance. Classification accuracy and false positive rate (FPR) are used as performance metrics. The results show that the proposed approach detects intrusions with an accuracy of 98.05% and a false alarm rate of 0.35%.
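The encoding idea in the abstract can be illustrated with a minimal sketch. It replaces each categorical value with an estimate of P(attack | value) computed from labelled training records. The abstract does not give the exact estimator, so several details here are assumptions made for illustration: the function names `fit_posterior_encoding` and `apply_encoding`, the binary attack-versus-normal split (the paper itself predicts among 40 class labels), the fallback to the overall attack rate for unseen categories, and the toy records, which are not taken from NSL-KDD.

```python
from collections import defaultdict


def fit_posterior_encoding(values, labels, normal_label="normal"):
    """Estimate P(attack | category) for every category seen in training.

    Assumption: any label other than `normal_label` counts as an attack.
    """
    total = defaultdict(int)    # records per category
    attacks = defaultdict(int)  # attack records per category
    for value, label in zip(values, labels):
        total[value] += 1
        if label != normal_label:
            attacks[value] += 1
    return {value: attacks[value] / total[value] for value in total}


def apply_encoding(values, mapping, default):
    """Replace each category with its learned number; unseen ones get `default`."""
    return [mapping.get(value, default) for value in values]


# Toy training data, invented for illustration only.
protocol = ["tcp", "tcp", "udp", "icmp", "tcp", "udp", "icmp", "icmp"]
label = ["normal", "neptune", "normal", "smurf",
         "normal", "normal", "smurf", "normal"]

mapping = fit_posterior_encoding(protocol, label)

# Assumed fallback for categories absent from training: overall attack rate.
prior = sum(l != "normal" for l in label) / len(label)

encoded = apply_encoding(["tcp", "udp", "icmp", "gre"], mapping, default=prior)
print(mapping)
print(encoded)
```

In this sketch the encoded column is an ordinary numeric feature, so it can be combined with the other numeric features before dimensionality reduction and nearest-neighbour classification. The mapping should be fitted on training records only, so that test labels do not leak into the encoding.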


Keywords: Intrusion, Encoding, Information gain ratio, PCA, KNN, CrossTable



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of CS & SE, Andhra University College of Engineering (A), Andhra University, Visakhapatnam, India
