Abstract
One countermeasure that security experts take against network attacks is to deploy Intrusion Detection Systems (IDS) in computer networks. Researchers often use the de facto network intrusion detection data set, KDD Cup 1999, to evaluate proposed IDS in the context of data mining. However, the imbalanced class distribution of the data set leads to a rare class problem, which causes low detection (classification) rates for the rare classes, particularly R2L and U2R. This research evaluated two sampling methods commonly used to mitigate the rare class problem: (1) under-sampling and (2) over-sampling. However, these two methods were less effective in mitigating the problem. The reasons for this performance are presented in this paper.
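To make the two sampling methods concrete, the following is a minimal sketch of both on a toy label distribution. It assumes the simplest random variants (random removal of majority records, random duplication of minority records); the abstract does not state which variants or tools the authors used, so the function names, the class counts, and the choice to balance every class to the same size are illustrative assumptions, not the paper's procedure.

```python
import random
from collections import Counter, defaultdict


def group_by_class(records, labels):
    """Map each class label to the list of records carrying it."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    return groups


def random_under_sample(records, labels, seed=0):
    """Randomly discard records until every class matches the rarest class."""
    rng = random.Random(seed)
    groups = group_by_class(records, labels)
    target = min(len(members) for members in groups.values())
    out_records, out_labels = [], []
    for label, members in groups.items():
        out_records.extend(rng.sample(members, target))  # without replacement
        out_labels.extend([label] * target)
    return out_records, out_labels


def random_over_sample(records, labels, seed=0):
    """Randomly duplicate records until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(records, labels)
    target = max(len(members) for members in groups.values())
    out_records, out_labels = [], []
    for label, members in groups.items():
        # Keep every original record, then pad with random duplicates.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_records.extend(members + extra)
        out_labels.extend([label] * target)
    return out_records, out_labels


if __name__ == "__main__":
    # Toy, made-up class counts; NOT the real KDD Cup 1999 distribution.
    labels = ["normal"] * 50 + ["DoS"] * 40 + ["R2L"] * 6 + ["U2R"] * 4
    records = list(range(len(labels)))
    _, under_labels = random_under_sample(records, labels)
    _, over_labels = random_over_sample(records, labels)
    print("under-sampled:", Counter(under_labels))
    print("over-sampled: ", Counter(over_labels))
```

In this sketch under-sampling shrinks every class to the size of the rarest one, so most majority-class records are thrown away, while over-sampling only repeats existing rare records and adds no new information about them. Both are known limitations of the random variants.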
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Khor, KC., Ting, CY., Phon-Amnuaisuk, S. (2014). The Effectiveness of Sampling Methods for the Imbalanced Network Intrusion Detection Data Set. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_58
DOI: https://doi.org/10.1007/978-3-319-07692-8_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)