Abstract
The continuous increase in internet-based services makes network traffic data larger and more complex day by day. This makes it increasingly difficult to detect network attacks, and therefore requires more efficient and faster data processing methods to ensure network security. For this purpose, many intrusion detection systems have been developed and development works are continuing.
This study; by comparing the performance of machine learning algorithms on the same network data, aims to establish a reference source for the developed intrusion detection systems. In this study; all data of KDD Cup’99 were run on Logistic Regression, Support Vector Machine, Naive Bayes and Random Forest from machine learning algorithms using Apache Spark a big data technology; and the results were analyzed comparatively.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Çevik, M.: Intrusion detection with pattern classification. Ph.D. thesis, Istanbul Technical University, Institute of Science and Technology (2005)
Becerikli, Y.: Advanced pattern recognition. Doctorate Lecture, Computer Engineering Departmant, Kocaeli University, Kocaeli, Turkey (2016)
Gupta, G.P., Kulariya, M.: A framework for fast and efficient cyber security network intrusion detection using apache spark. Procedia Comput. Sci. 93(Supplement C), 824–831 (2016)
Siddique, K., Akhtar, Z., Lee, H.G., Kim, W., Kim, Y.: Toward bulk synchronous parallel-based machine learning techniques for anomaly detection in high-speed big data networks. Symmetry 9(9), 197 (2017)
Harifi, S., Byagowi, E., Khalilian, M.: Comparative study of apache spark MLlib clustering algorithms. In: Tan, Y., Takagi, H., Shi, Y. (eds.) DMBD 2017. LNCS, vol. 10387, pp. 61–73. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61845-6_7
Jeong, H.-D.J., et al.: A search for computationally efficient supervised learning algorithms of anomalous traffic. In: Barolli, L., Enokido, T. (eds.) IMIS 2017. AISC, vol. 612, pp. 590–600. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-61542-4_58
Oh, S.W., Kim, H.S., Lee, H.S., Kim, S.J., Park, H., You, W.: Study on the multi-modal data preprocessing for knowledge-converged super brain. In: 2016 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1088–1093. IEEE (2016)
Lightning-fast cluster computing. https://spark.apache.org/. Accessed 14 Mar 2018
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, CISDA 2009, pp. 1–6. IEEE (2009)
Intrusion Detector Learning. http://archive.ics.uci.edu/ml/machine-learning-databases/kddcup99-mld/task.html. Accessed 08 Jan 2018
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2013). https://doi.org/10.1007/978-1-4757-3264-1
Özkan, Y.: Data Mining Methods. Papatya Publishing, Istanbul (2008)
Osuna, E., Freund, R., Girosi, F.: Support Vector Machines: Training and Applications. Massachusetts Institute of Technology, Cambridge (1997)
Pöyhönen, S.: Support vector machine based classification in condition monitoring of induction motors. Helsinki University of Technology (2004)
Ilhan Omurca, S.: Machine learning. Master Lecture, Computer Engineering Departmant, Kocaeli University, Kocaeli, Turkey (2016)
Akar, Ö., Güngör, O.: Classification of multispectral images using random forest algorithm. J. Geod. Geoinf. 1, 139–146 (2012)
Özdarıcı Ok, A., Akar, Ö., Güngör, O.: Classification of crops in agricultural lands using random forest classification method. In: TUFUAB 2011 VI. Technical Symposium, Antalya, Turkey (2011)
Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R.: Random forests for land cover classification. Pattern Recogn. Lett. 27(4), 294–300 (2006)
Pal, M.: Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005)
Breiman, L.: Manual on setting up, using, and understanding random forests v3.1. Statistics Department, University of California Berkeley, CA, USA (2002)
Archer, K.J., Kimes, R.V.: Empirical characterization of random forest variable importance measures. Comput. Stat. Data Anal. 52(4), 2249–2260 (2008)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Hasan, M.A.M., Nasser, M., Pal, B., Ahmad, S.: Support vector machine and random forest modeling for intrusion detection system (IDS). J. Intell. Learn. Syst. Appl. 06, 45–52 (2014)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kurt, E.M., Becerikli, Y. (2018). Network Intrusion Detection on Apache Spark with Machine Learning Algorithms. In: Pimenidis, E., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2018. Communications in Computer and Information Science, vol 893. Springer, Cham. https://doi.org/10.1007/978-3-319-98204-5_11
Download citation
DOI: https://doi.org/10.1007/978-3-319-98204-5_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98203-8
Online ISBN: 978-3-319-98204-5
eBook Packages: Computer ScienceComputer Science (R0)