Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks

Merino, Tim; Stillwell, Matt; Steele, Mark; Coplan, Max; Patton, Jon; Stoyanov, Alexander; Deng, Lin

doi:10.1007/978-3-030-24344-9_8

Part of the book series: Studies in Computational Intelligence (SCI, volume 845)

Included in the following conference series:

International Conference on Software Engineering Research, Management and Applications

781 Accesses
8 Citations

Abstract

Machine learning is commonly used, for both research and operational purposes, to detect cyber attacks. However, publicly available datasets are often highly imbalanced between attack and non-attack data, and training attack detection systems on unbalanced datasets produces inaccurate and biased models. Here, we explore using Generative Adversarial Networks (GANs) to improve the training and, ultimately, the performance of cyber attack detection systems. We assess the feasibility of generating cyber attack data from existing cyber attack datasets, with the goal of balancing those datasets with the generated data. Our findings suggest that GANs are a viable approach to improving intrusion detection systems: our model generates data that closely mimics the distribution of various attack types and could be used to balance previously unbalanced datasets.
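The balancing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: in the chapter, a trained GAN generator would supply the synthetic attack rows, whereas here a hypothetical `gaussian_stand_in` (which samples each feature independently from a normal distribution fitted to the real rows) takes its place so the example runs with the Python standard library alone. The names `balance`, `gaussian_stand_in`, `Generator`, and the toy data are all illustrative assumptions.

```python
import random
import statistics
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = List[float]
# A generator receives the real rows of one class and a count n,
# and returns n synthetic rows for that class.
Generator = Callable[[Sequence[Row], int], List[Row]]


def gaussian_stand_in(rows: Sequence[Row], n: int, seed: int = 0) -> List[Row]:
    """Hypothetical stand-in for a trained GAN generator (NOT a GAN).

    Fits a normal distribution to each feature column of the real rows and
    samples every feature independently. Unlike a GAN, it ignores correlations
    between features.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))  # transpose rows into feature columns
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [[rng.gauss(m, s) for m, s in zip(means, stdevs)] for _ in range(n)]


def balance(
    X: Sequence[Row], y: Sequence[str], generate: Generator
) -> Tuple[List[Row], List[str]]:
    """Append synthetic rows to every minority class until each class
    has as many rows as the largest class. Real rows are kept unchanged."""
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]
    y_out = list(y)
    for label, count in counts.items():
        deficit = target - count
        if deficit == 0:
            continue  # this class is already at the majority size
        real_rows = [row for row, lab in zip(X, y) if lab == label]
        X_out.extend(generate(real_rows, deficit))
        y_out.extend([label] * deficit)
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: six "normal" rows and two rows of a hypothetical "attack" class.
    X = [[0.1, 1.0], [0.2, 1.1], [0.1, 0.9], [0.3, 1.2], [0.2, 1.0], [0.1, 1.1],
         [5.0, 9.0], [5.5, 8.5]]
    y = ["normal"] * 6 + ["attack"] * 2
    X_bal, y_bal = balance(X, y, gaussian_stand_in)
    print(dict(Counter(y_bal)))  # {'normal': 6, 'attack': 6}
```

The stand-in deliberately ignores correlations between features, which is precisely what a GAN's generator is meant to learn. Replacing it with a trained generator only requires a callable with the same `(real_rows, n)` signature; the balancing logic itself does not change.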

References

[1] Cieslak, D., Chawla, N., & Striegel, A. (2006). Combating imbalance in network intrusion datasets (pp. 732–737). www3.nd.edu/~dial/publications/cieslak2006combating.pdf
[2] UCI Machine Learning Repository. (1999). KDD Cup 1999 Data. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[3] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014, June). Generative adversarial networks. arXiv. https://arxiv.org/abs/1406.2661
[4] University of California. (1999). KDD Cup 1999 Data. The UCI KDD Archive.
[5] Labib, K., & Vemuri, V. R. (2004). Detecting denial-of-service and network probe attacks using principal component analysis. In 3rd Conference on Security and Network Architectures. http://web.cs.ucdavis.edu/~vemuri/papers/Detecting%20DoS%20and%20Probe%20Attacks%20using%20PCA.pdf
[6] Anonymous. (2002). Maximum security: A hacker's guide to protecting your computer systems and network (4th ed.). Que Publishing.
[7] Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In IEEE Symposium on Computational Intelligence in Security and Defence Applications (CISDA).
[8] Keras Documentation. https://keras.io/
[9] TensorFlow. www.tensorflow.org
[10] scikit-learn: Machine learning in Python, scikit-learn 0.20.3 documentation. https://scikit-learn.org/stable/
[11] MySQL. www.mysql.com
[12] Hu, L., Zhang, Z., Tang, H., & Xie, N. (2015, August). An improved intrusion detection framework based on artificial neural networks. In 2015 11th International Conference on Natural Computation (ICNC) (pp. 1115–1120). IEEE. https://ieeexplore.ieee.org/abstract/document/7378148
[13] Yin, C., Zhu, Y., Liu, S., Fei, J., & Zhang, H. (2018). An enhancing framework for botnet detection using generative adversarial networks. In 2018 International Conference on Artificial Intelligence and Big Data (ICAI BD) (pp. 228–234).
[14] Xie, H., Lv, K., & Hu, C. (2018, August). An effective method to generate simulated attack data based on generative adversarial nets. https://ieeexplore.ieee.org/abstract/document/8456136
[15] Lin, Z., Shi, Y., & Xue, Z. (2018, September). IDSGAN: Generative adversarial networks for attack generation against intrusion detection. arXiv (Computer Science).
[16] Lee, H., Han, S., & Lee, J. (2017, May). Generative adversarial trainer: Defense to adversarial perturbations with GAN. arxiv.org/abs/1705.03387
[17] Krizhevsky, A. (2009). Learning multiple layers of features from tiny images (Tech. rep.). www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
[18] Arjovsky, M., & Bottou, L. (2017, January). Towards principled methods for training generative adversarial networks. arxiv.org/abs/1701.04862
[19] Santhanam, G. K., & Grnarova, P. (2018, May). Defending against adversarial attacks by leveraging an entire GAN. arxiv.org/abs/1805.10652
[20] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. http://www.deeplearningbook.org
[21] Sherman, A., Dark, M., Chan, A., Chong, R., Morris, T., Oliva, L., et al. (2017). INSuRE: Collaborating centers of academic excellence engage students in cybersecurity research. IEEE Security and Privacy, 15(4), 72–78.

Acknowledgements

This undergraduate student research project is supported by the Information Security Research and Education (INSuRE) project [21]. Lin Deng is supported by the Faculty Development and Research Committee (FDRC) award at Towson University. We thank our technical director, Dr. Benjamin Blakely of Argonne National Laboratory, for his research mentorship and contributions to this project. We also thank Ksenia Tepliakova and Long Chen for their contributions to this research during the fall 2019 semester. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN Xp GPU used for this research.

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Tim Merino, Matt Stillwell, Mark Steele, Max Coplan, Jon Patton, Alexander Stoyanov & Lin Deng
Department of Physics, Astronomy and Geosciences, Towson University, Towson, MD, USA
Max Coplan
Department of Mathematics, Towson University, Towson, MD, USA
Tim Merino, Mark Steele, Jon Patton & Alexander Stoyanov

Corresponding author

Correspondence to Lin Deng.

Editor information

Editors and Affiliations

Software Engineering and Information Technology Institute, Central Michigan University, Mount Pleasant, MI, USA
Roger Lee

About this chapter

Cite this chapter

Merino, T. et al. (2020). Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks. In: Lee, R. (ed.) Software Engineering Research, Management and Applications. SERA 2019. Studies in Computational Intelligence, vol 845. Springer, Cham. https://doi.org/10.1007/978-3-030-24344-9_8

DOI: https://doi.org/10.1007/978-3-030-24344-9_8
Published: 25 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24343-2
Online ISBN: 978-3-030-24344-9
eBook Packages: Intelligent Technologies and Robotics (R0)