Abstract
The likelihood of suffering damage from an attack is obvious with the exponential growth in the size of computer networks and the internet. Meanwhile, intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are one of the most important defensive tools against the ever more sophisticated and ever-growing frequency of network attacks. Anomaly-based research in intrusion detection systems suffers from inaccurate deployment, analysis and evaluation due to the lack of an adequate dataset. A number of datasets such as DARPA98, KDD99, ISC2012, and ADFA13 have been used by the researchers to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Based on our study of 16 datasets since 1998, many are out of date and unreliable. There are various shortcomings: lack of traffic diversity and volume, incomplete attack coverage, anonymized packet information and payload which does not reflect the current reality, or they lack some feature set and metadata. This paper focused on CICIDS2017 as the last updated IDS dataset that contains benign and seven common attack network flows, which meets real world criteria and is publicly available. It also evaluates the effectiveness of a set of network traffic features and machine learning algorithms to indicate the best set of features for detecting an attack category. Furthermore, we define the concept of superfeatures which are high quality derived features using a dimension reduction algorithm. We show that the random forest algorithm as one of our best performing algorithm can achieve better results with superfeatures versus top selected features.
The first two authors contributed equally to this work.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 31(3), 357–374 (2012)
Brown, C., Cowperthwaite, A., Hijazi, A., Somayaji, A.: Analysis of the 1999 DARPA/Lincoln laboratory IDS evaluation data with NetaDHICT. In: 2009 IEEE SCISDA, pp. 1–7 (2009)
The Canadian Institute for Cybersecurity (CIC), CICFlowMeter: The network traffic flow generator and alanlyzer (2017). https://github.com/ISCX/CICFlowMeter
Creech, G., Hu, J.L.: Generation of a new IDS test dataset: time to retire the KDD collection. In: 2013 IEEE Wireless Communications and Networking Conference (WCNC), pp. 4487–4492 (2013)
T.C. Center for Applied Internet Data Analysis (CAIDA): The CAIDA OC48 Peering Point Traces Dataset, San Jose, California (2002)
I.U. University of California: KDD cup 1999 dataset (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
T.C. Center for Applied Internet Data Analysis (CAIDA): CAIDA DDoS attack dataset (2007)
T.C. Center for Applied Internet Data Analysis (CAIDA): CAIDA anonymized internet traces 2016 dataset (2016)
Gharib, A., Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A.: An evaluation framework for intrusion detection dataset. In: 2016 International Conference on Information Science and Security (ICISS), Thailand, pp. 1–6 (2016)
Ghorbani, A.A., Lu, W., Tavallaee, M.: Network Intrusion Detection and Prevention: Concepts and Techniques. Springer, Boston (2010). https://doi.org/10.1007/978-0-387-88771-5
T.S. Group: Defcon 8, 10 and 11 (2000). https://www.defcon.org/
Habibi Lashkari, A., Draper Gil, G., Mamun, M.S.I., Ghorbani, A.A.: Characterization of tor traffic using time based features. In: Proceedings of the 3rd International Conference on Information Systems Security and Privacy (ICISSP), Portugal, pp. 253–262 (2017)
Heidemann, J., Papdopoulos, C.: Uses and challenges for network datasets. In: Cybersecurity Applications Technology Conference For Homeland Security, CATCH 2009, pp. 73–82 (2009)
Koch, R., Golling, M.G., Rodosek, G.D.: Towards comparability of intrusion detection systems: new data sets. In: Proceedings of the TERENA Networking Conference, p. 7 (2017)
Sato M., Yamaki H., Takakura H.: Unknown attacks detection using feature extraction from anomaly-based IDS alerts. In: 2012 IEEE/IPSJ 12th International Symposium on Applications and the Internet (SAINT), pp. 273–277 (2012)
McHugh, J.: Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln laboratory. ACM Trans. Inf. Syst. Secur. 3(4), 262–294 (2000)
Nechaev, B., Allman, M., Paxson, V., Gurtov, A.: Lawrence Berkeley National Laboratory (LBNL)/ICSI enterprise tracing project (2004)
Nehinbe, J.O.: A simple method for improving intrusion detections in corporate networks. In: Weerasinghe, D. (ed.) ISDF 2009. LNICST, vol. 41, pp. 111–122. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11530-1_13
Nehinbe, J.O.: A critical evaluation of datasets for investigating IDSS and IPSS researches. In: IEEE 10th International Conference on CIS, pp. 92–97 (2011)
University of Massachusetts Amherst: Optimistic TCP hacking (2011). http://traces.cs.umass.edu
Pedregosa, F., et al.: Scikit-learn: machine learning in Python (2011)
Proebstel, E.P.: Characterizing and improving distributed network-based intrusion detection systems (NIDS): timestamp synchronization and sampled traffic. Master’s thesis, University of California DAVIS, CA, USA (2008)
Chitrakar, R., Huang, C.: Anomaly based intrusion detection using hybrid learning approach of combining k-medoids clustering and Naive Bayes classification (2012)
Umer, M.F., Sher, M., Bi, Y.: Flow-based intrusion detection: techniques and challenges. Comput. Secur. 70, 238–254 (2017). In: 8th WiCOM, pp. 1–5
Sangster, B., et al.: Toward instrumenting network warfare competitions to generate labeled datasets. In: 2009 USENIX. USENIX: The Advanced Computing System Association (2009)
Scott, P., Wilkins, E.: Evaluating data mining procedures: techniques for generating artificial data sets. Inf. Softw. Technol. 41(9), 579–587 (1999)
Sharafaldin, I., Gharib, A., Habibi Lashkari, A., Ghorbani, A.A.: Towards a reliable intrusion detection benchmark dataset. Softw. Netw. 2017, 177–200 (2017)
Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., Nakao, K.: Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation. In: Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, pp. 29–36. ACM (2011)
Sperotto, A., Sadre, R., Vliet, F., Pras, A.: A labeled data set for flow-based intrusion detection. In: Proceedings of the 9th IEEE International Workshop on IP Operations and Management, IPOM 2009, pp. 39–50 (2009)
Prusty, S., Levine, B.N., Liberatore, M.: Forensic Investigation of the OneSwarm Anonymous Filesharing System. In: ACM Conference on CCS (2011)
Tavallaee, M., Bagheri, E., Lu, W.,, Ghorbani, A.A.: A detailed analysis of the KDD cup 99 data set. In: 2009 IEEE SCISDA, pp. 1–6 (2009)
Xie, M., Hu, J.: Evaluating host-based anomaly detection systems: a preliminary analysis of ADFA-LD. In: Proceedings of the 6th IEEE International Congress on Image and Signal Processing (CISP 2013), pp. 1711–1716 (2013)
Skillicorn, D.: Understanding Complex Datasets: Data Mining with Matrix Decompositions. CRC Press, Boca Rato (2007). Evaluating host-based anomaly detection systems: a preliminary analysis of ADFA-LD. In: 2013 6th International Congress on Image and Signal Processing (CISP), vol. 03, pp. 1711–1716
Xie, M., Hu, J., Slay, J.: Evaluating host-based anomaly detection systems: application of the one-class SVM algorithm to ADFA-LD. In: 2014 11th FSKD, pp. 978–982 (2014)
Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, January 2018 (2017)
Szabó, G., Orincsay, D., Malomsoky, S., Szabó, I.: On the validation of traffic classification algorithms. In: Claypool, M., Uhlig, S. (eds.) PAM 2008. LNCS, vol. 4979, pp. 72–81. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79232-1_8
Garcia, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Securi. 45, 100–123 (2014)
Hofstede, R., Hendriks, L., Sperotto, A., Pras, A.: SSH compromise detection using NetFlow/IPFIX. ACM SIGCOMM Comput. Commun. Rev. 44(5), 20–26 (2014)
Maciá-Fernández, G., Camacho, J., Magán-Carrión, R., GarcÃa-Teodoro, P., Therón, R.: UGR ‘16: a new dataset for the evaluation of cyclostationarity-based network IDSs. Comput. Secur. 73, 411–424 (2018)
De Lathauwer, L., De Moor, B., Vandewalle, J., B.S.S. by Higher-Order: Blind source separation by higher-order singular value decomposition. In: Proceeding of the 7th European Signal Processing Conference (EUSIPCO 1994), Edinburgh, UK, pp. 175–178 (1994)
Acknowledgements
The authors acknowledge the generous funding from the Atlantic Canada Opportunity Agency (ACOA) through the Atlantic Innovation Fund (AIF) and through grants from the National Science and Engineering Research Council of Canada (NSERC) to Dr. Ghorbani.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A. (2019). A Detailed Analysis of the CICIDS2017 Data Set. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2018. Communications in Computer and Information Science, vol 977. Springer, Cham. https://doi.org/10.1007/978-3-030-25109-3_9
Download citation
DOI: https://doi.org/10.1007/978-3-030-25109-3_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25108-6
Online ISBN: 978-3-030-25109-3
eBook Packages: Computer ScienceComputer Science (R0)