Network Intrusion Detection on Apache Spark with Machine Learning Algorithms

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 893))

Abstract

The continuous growth of internet-based services makes network traffic data larger and more complex every day. This makes network attacks increasingly difficult to detect and calls for faster, more efficient data processing methods to ensure network security. Many intrusion detection systems have been developed for this purpose, and development work is ongoing.

This study compares the performance of machine learning algorithms on the same network data, with the aim of providing a reference for intrusion detection systems under development. The complete KDD Cup’99 data set was run through four machine learning algorithms (Logistic Regression, Support Vector Machine, Naive Bayes, and Random Forest) on Apache Spark, a big data technology, and the results were analyzed comparatively.
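
A comparison of this kind only holds if every classifier is scored against the same labels with the same metric definitions. The abstract does not say which metrics the study used, so the following is a minimal sketch that assumes a binary normal/attack labelling and the common accuracy, precision, recall, and F1 measures. It is plain Python rather than Spark code, and the model names and toy labels are hypothetical placeholders, not results from the paper.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="attack"):
    """Score one classifier's predictions against the ground-truth labels."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def compare(y_true, predictions_by_model):
    """Apply the same scoring to every model so the numbers are comparable."""
    return {name: binary_metrics(y_true, preds)
            for name, preds in predictions_by_model.items()}


# Hypothetical toy data: six connections and two placeholder models.
y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
predictions = {
    "model_a": ["attack", "attack", "normal", "attack", "attack", "normal"],
    "model_b": ["attack", "normal", "normal", "normal", "normal", "normal"],
}

report = compare(y_true, predictions)
for name, metrics in report.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In this toy case the first placeholder model finds every attack but raises one false alarm, while the second raises no false alarms but misses two of the three attacks. That trade-off is why accuracy alone is rarely enough to rank intrusion detectors.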

Author information

Correspondence to Elif Merve Kurt or Yaşar Becerikli.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kurt, E.M., Becerikli, Y. (2018). Network Intrusion Detection on Apache Spark with Machine Learning Algorithms. In: Pimenidis, E., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2018. Communications in Computer and Information Science, vol 893. Springer, Cham. https://doi.org/10.1007/978-3-319-98204-5_11

  • DOI: https://doi.org/10.1007/978-3-319-98204-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98203-8

  • Online ISBN: 978-3-319-98204-5

  • eBook Packages: Computer Science (R0)
