Neural Computing and Applications, Volume 31, Issue 4, pp 955–965

An in-depth experimental study of anomaly detection using gradient boosted machine

  • Bayu Adhi Tama
  • Kyung-Hyune Rhee
Original Article

Abstract

This paper seeks to improve the detection performance of anomaly-based intrusion detection systems (IDS) using a gradient boosted machine (GBM). The best GBM parameters are obtained by grid search. GBM is then compared with four well-known classifiers, i.e. random forest, deep neural network, support vector machine, and classification and regression tree, in terms of five performance measures, i.e. accuracy, specificity, sensitivity, false positive rate, and area under the receiver operating characteristic curve (AUC). The experimental results show that GBM significantly outperforms the most recent IDS techniques, i.e. fuzzy classifier, two-tier classifier, GAR-forest, and tree-based classifier ensemble. These results are the highest reported so far on the complete feature sets of three different datasets, i.e. NSL-KDD, UNSW-NB15, and GPRS, using either tenfold cross-validation or the hold-out method. Moreover, we support our results with two statistical significance tests, which have not yet been used in existing IDS research.
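
The performance measures named above (accuracy, sensitivity, specificity, false positive rate, and AUC) can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes binary labels with 1 for attack (the positive class) and 0 for normal traffic, that both classes are present, and it computes AUC by pairwise ranking. The function names `confusion_counts`, `detection_metrics`, and `auc` are hypothetical, chosen only for this example.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN, treating label 1 (attack) as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and false positive rate.

    Assumes at least one positive and one negative example, so that
    no denominator is zero.
    """
    c = confusion_counts(y_true, y_pred)
    tp, tn, fp, fn = c["tp"], c["tn"], c["fp"], c["fn"]
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive example receives a
    higher score than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
    # The 0.5 decision threshold is an assumption of this sketch.
    predictions = [1 if s >= 0.5 else 0 for s in model_scores]
    print(detection_metrics(labels, predictions))
    print(auc(labels, model_scores))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a small illustration; for datasets of the size discussed here, a rank-based computation from a standard library would be the more practical choice.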

Keywords

Gradient boosted machine; Anomaly detection; Significant test; Performance benchmark

Notes

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion). The first author acknowledges the Korean Government for providing a scholarship through KGSP for Graduate 2013–2018.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest.

Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. IT Convergence and Application Engineering, Pukyong National University, Busan, South Korea
  2. Faculty of Computer Science, University of Sriwijaya, Inderalaya, Indonesia