Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks

Software Engineering Research, Management and Applications (SERA 2019)

Part of the book series: Studies in Computational Intelligence (SCI, volume 845)

Abstract

Machine learning is commonly used for both research and operational purposes in detecting cyber attacks. However, publicly available datasets are often highly imbalanced between attack and non-attack data. Training attack detection systems on unbalanced datasets leads to inaccurate and biased algorithms. Here, we explore using Generative Adversarial Networks (GANs) to improve the training and, ultimately, performance of cyber attack detection systems. We determine the feasibility of generating cyber attack data from existing cyber attack datasets with the goal of balancing those datasets with generated data. Our findings suggest that GANs are a viable approach to improving cyber attack intrusion detection systems. Our model generates data that closely mimics the data distribution of various attack types, and could be used to balance previously unbalanced datasets.
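As a rough illustration of what "balancing" a dataset with generated data means, the sketch below counts how many synthetic records each class would need to match the largest class. It is not the chapter's method: the function name `synthetic_samples_needed`, the class names, and the counts are all hypothetical, and a trained generator (not shown) would be what actually produces the records.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Return, per class, how many generated records would raise it to the majority-class count."""
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label column; the counts are illustrative, not taken from the chapter.
labels = ["normal"] * 900 + ["dos"] * 80 + ["probe"] * 20
deficits = synthetic_samples_needed(labels)
print(deficits)  # {'normal': 0, 'dos': 820, 'probe': 880}
```

Under these assumed counts, a generator trained on the "probe" records would have to supply 880 synthetic samples before a detector saw equal class sizes.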

References

  1. Cieslak, D., Chawla, N., & Striegel, A. (2006). Combating imbalance in network intrusion datasets (pp. 732–737). www3.nd.edu/~dial/publications/cieslak2006combating.pdf

  2. UCI Machine Learning Repository. (1999). KDD Cup 1999 Data. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

  3. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014, June). Generative adversarial networks. arXiv. https://arxiv.org/abs/1406.2661

  4. University of California. (1999). KDD Cup 1999 Data. The UCI KDD Archive

  5. Labib, K., & Vemuri, V. R. (2004). Detecting denial-of-service and network probe attacks using principal component analysis. In 3rd Conference on Security and Network Architectures. http://web.cs.ucdavis.edu/~vemuri/papers/Detecting%20DoS%20and%20Probe%20Attacks%20using%20PCA.pdf

  6. Anonymous. (2002). Maximum security: A Hacker’s guide to protecting your computer systems and network (4th ed.). Que Publishing

  7. Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence in Security and Defence Applications (CISDA)

  8. Home - Keras Documentation. https://keras.io/

  9. TensorFlow. www.tensorflow.org

  10. scikit-learn: machine learning in Python. scikit-learn 0.20.3 documentation. https://scikit-learn.org/stable/

  11. MySQL. www.mysql.com

  12. Hu, L., Zhang, Z., Tang, H., & Xie, N. (2015, August). An improved intrusion detection framework based on artificial neural networks. In: 2015 11th International Conference on Natural Computation (ICNC) (pp. 1115–1120). IEEE. https://ieeexplore.ieee.org/abstract/document/7378148

  13. Yin, C., Zhu, Y., Liu, S., Fei, J., & Zhang, H. (2018). An enhancing framework for botnet detection using generative adversarial networks. In: 2018 International Conference on Artificial Intelligence and Big Data (ICAI BD) (pp. 228–234)

  14. Xie, H., Lv, K., & Hu, C. (2018, August). An effective method to generate simulated attack data based on generative adversarial nets. https://ieeexplore.ieee.org/abstract/document/8456136

  15. Lin, Z., Shi, Y., & Xue, Z. (2018, September). IDSGAN: Generative adversarial networks for attack generation against intrusion detection. arXiv Comput. Sci.

  16. Lee, H., Han, S., & Lee, J. (2017, May). Generative adversarial trainer: Defense to adversarial perturbations with GAN. arxiv.org/abs/1705.03387

  17. Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Tech. rep. www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

  18. Arjovsky, M., & Bottou, L. (2017, January). Towards principled methods for training generative adversarial networks. arxiv.org/abs/1701.04862

  19. Santhanam, G.K., & Grnarova, P. (2018, May). Defending against adversarial attacks by leveraging an entire GAN. arxiv.org/abs/1805.10652

  20. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press, http://www.deeplearningbook.org

  21. Sherman, A., Dark, M., Chan, A., Chong, R., Morris, T., Oliva, L., et al. (2017). INSuRE: collaborating centers of academic excellence engage students in cybersecurity research. IEEE Security and Privacy, 15(4), 72–78.

Acknowledgements

This undergraduate student research project is supported via the Information Security Research and Education (INSuRE) project [21]. Lin Deng is supported by the Faculty Development and Research Committee (FDRC) award at Towson University. We thank our technical director, Dr. Benjamin Blakely at Argonne National Laboratory, for his research mentorship and contribution to this project. We also thank Ksenia Tepliakova and Long Chen for their contributions to this research during the fall 2019 semester. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN Xp GPU used for this research.

Author information

Corresponding author

Correspondence to Lin Deng.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Merino, T. et al. (2020). Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks. In: Lee, R. (eds) Software Engineering Research, Management and Applications. SERA 2019. Studies in Computational Intelligence, vol 845. Springer, Cham. https://doi.org/10.1007/978-3-030-24344-9_8
