A Detailed Analysis of the CICIDS2017 Data Set

  • Conference paper
Information Systems Security and Privacy (ICISSP 2018)

Abstract

As computer networks and the internet grow exponentially in size, so does the likelihood of suffering damage from an attack. Intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are among the most important defensive tools against increasingly sophisticated and frequent network attacks. Anomaly-based intrusion detection research, however, suffers from inaccurate deployment, analysis, and evaluation because adequate datasets are lacking. Researchers have used datasets such as DARPA98, KDD99, ISC2012, and ADFA13 to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Our study of 16 datasets released since 1998 finds that many are out of date and unreliable. Their shortcomings include a lack of traffic diversity and volume, incomplete attack coverage, anonymized packet information and payloads that do not reflect current reality, and missing features and metadata. This paper focuses on CICIDS2017, the most recently updated IDS dataset, which contains benign traffic and seven common attack network flows, meets real-world criteria, and is publicly available. It also evaluates the effectiveness of a set of network traffic features and machine learning algorithms to identify the best set of features for detecting each attack category. Furthermore, we define the concept of superfeatures, which are high-quality derived features produced by a dimension reduction algorithm. We show that random forest, one of our best-performing algorithms, achieves better results with superfeatures than with the top selected features.
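The idea of a derived feature can be illustrated with a minimal sketch. The abstract does not name the dimension reduction algorithm used, so the example below assumes a first principal component computed by power iteration; the three feature columns (packet count, byte count, duration) and all numeric values are hypothetical synthetic data, not CICIDS2017 records.

```python
import math
import random


def standardize(columns):
    """Scale each feature column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled.append([(x - mean) / std for x in col])
    return scaled


def covariance_matrix(columns):
    """Covariance matrix of columns that are already centered."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns]
            for ci in columns]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Hypothetical synthetic flows: three correlated features driven by one
# hidden factor. These stand in for real flow statistics (assumption).
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(500)]
flows = [
    [5.0 * z + random.gauss(0, 1) for z in latent],      # "packet count"
    [300.0 * z + random.gauss(0, 100) for z in latent],  # "byte count"
    [0.2 * z + random.gauss(0, 0.1) for z in latent],    # "duration"
]

scaled = standardize(flows)
weights = first_component(covariance_matrix(scaled))

# One derived value per flow: a weighted sum of the standardized features.
superfeature = [sum(w * col[i] for w, col in zip(weights, scaled))
                for i in range(len(latent))]

print("weights:", [round(w, 3) for w in weights])
print("variance of derived feature:", round(variance(superfeature), 3))
```

Each standardized input column has variance 1, so a derived feature with variance well above 1 is carrying information shared by several original columns. A classifier such as a random forest could then be trained on a handful of such derived columns instead of the full feature set.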

The first two authors contributed equally to this work.

Acknowledgements

The authors acknowledge the generous funding from the Atlantic Canada Opportunities Agency (ACOA) through the Atlantic Innovation Fund (AIF) and through grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Dr. Ghorbani.

Author information

Corresponding author

Correspondence to Arash Habibi Lashkari.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A. (2019). A Detailed Analysis of the CICIDS2017 Data Set. In: Mori, P., Furnell, S., Camp, O. (eds) Information Systems Security and Privacy. ICISSP 2018. Communications in Computer and Information Science, vol 977. Springer, Cham. https://doi.org/10.1007/978-3-030-25109-3_9

  • DOI: https://doi.org/10.1007/978-3-030-25109-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25108-6

  • Online ISBN: 978-3-030-25109-3

  • eBook Packages: Computer Science, Computer Science (R0)