Abstract
Machine learning is commonly used in both research and operational settings to detect cyber attacks. However, publicly available datasets are often highly unbalanced between attack and non-attack data. Training attack detection systems on unbalanced datasets leads to inaccurate and biased algorithms. Here, we explore using Generative Adversarial Networks (GANs) to improve the training and, ultimately, the performance of cyber attack detection systems. We assess the feasibility of generating cyber attack data from existing cyber attack datasets, with the goal of balancing those datasets with the generated data. Our findings suggest that GANs are a viable approach to improving intrusion detection systems. Our model generates data that closely mimics the data distribution of various attack types and could be used to balance previously unbalanced datasets.
References
Cieslak, D., Chawla, N., & Striegel, A. (2006). Combating imbalance in network intrusion datasets (pp. 732–737). www3.nd.edu/~dial/publications/cieslak2006combating.pdf
UCI Machine Learning Repository. (1999). KDD Cup 1999 Data. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014, June). Generative adversarial networks. arXiv. https://arxiv.org/abs/1406.2661
University of California. (1999). KDD Cup 1999 Data. The UCI KDD Archive
Labib, K., & Vemuri, V. R. (2004). Detecting denial-of-service and network probe attacks using principal component analysis. In: 3rd Conference on Security and Network Architectures. http://web.cs.ucdavis.edu/~vemuri/papers/Detecting%20DoS%20and%20Probe%20Attacks%20using%20PCA.pdf
Anonymous. (2002). Maximum security: A Hacker’s guide to protecting your computer systems and network (4th ed.). Que Publishing
Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence in Security and Defence Applications (CISDA)
Home - Keras Documentation. https://keras.io/
TensorFlow. www.tensorflow.org
scikit-learn: machine learning in Python scikit-learn 0.20.3 documentation. https://scikit-learn.org/stable/
MySQL. www.mysql.com
Hu, L., Zhang, Z., Tang, H., & Xie, N. (2015, August). An improved intrusion detection framework based on artificial neural networks. In: 2015 11th International Conference on Natural Computation (ICNC) (pp. 1115–1120). IEEE. https://ieeexplore.ieee.org/abstract/document/7378148
Yin, C., Zhu, Y., Liu, S., Fei, J., & Zhang, H. (2018). An enhancing framework for botnet detection using generative adversarial networks. In: 2018 International Conference on Artificial Intelligence and Big Data (ICAI BD) (pp. 228–234)
Xie, H., Lv, K., & Hu, C. (2018, August). An effective method to generate simulated attack data based on generative adversarial nets. https://ieeexplore.ieee.org/abstract/document/8456136
Lin, Z., Shi, Y., & Xue, Z. (2018, September). IDSGAN: Generative adversarial networks for attack generation against intrusion detection. arXiv Comput. Sci.
Lee, H., Han, S., & Lee, J. (2017, May). Generative adversarial trainer: Defense to adversarial perturbations with GAN. arxiv.org/abs/1705.03387
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Tech. rep. www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
Arjovsky, M., & Bottou, L. (2017, January). Towards principled methods for training generative adversarial networks. arxiv.org/abs/1701.04862
Santhanam, G. K., & Grnarova, P. (2018, May). Defending against adversarial attacks by leveraging an entire GAN. arxiv.org/abs/1805.10652
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press, http://www.deeplearningbook.org
Sherman, A., Dark, M., Chan, A., Chong, R., Morris, T., Oliva, L., et al. (2017). INSuRE: collaborating centers of academic excellence engage students in cybersecurity research. IEEE Security and Privacy, 15(4), 72–78.
Acknowledgements
This undergraduate student research project is supported via the Information Security Research and Education (INSuRE) project [21]. Lin Deng is supported by the Faculty Development and Research Committee (FDRC) award at Towson University. We thank our technical director, Dr. Benjamin Blakely at Argonne National Laboratory, for his research mentorship and contributions to this project. We also thank Ksenia Tepliakova and Long Chen for their contributions to this research during the fall 2019 semester. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN Xp GPU used for this research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Merino, T. et al. (2020). Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks. In: Lee, R. (eds) Software Engineering Research, Management and Applications. SERA 2019. Studies in Computational Intelligence, vol 845. Springer, Cham. https://doi.org/10.1007/978-3-030-24344-9_8
DOI: https://doi.org/10.1007/978-3-030-24344-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24343-2
Online ISBN: 978-3-030-24344-9
eBook Packages: Intelligent Technologies and Robotics (R0)