Advertisement

Malicious Software Classification Using VGG16 Deep Neural Network’s Bottleneck Features

  • Edmar Rezende
  • Guilherme Ruppert
  • Tiago Carvalho
  • Antonio Theophilo
  • Fabio Ramos
  • Paulo de Geus
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 738)

Abstract

Malicious software (malware) has been extensively employed for illegal purposes and thousands of new samples are discovered every day. The ability to classify samples with similar characteristics into families makes possible to create mitigation strategies that work for a whole class of programs. In this paper, we present a malware family classification approach using VGG16 deep neural network’s bottleneck features. Malware samples are represented as byteplot grayscale images and the convolutional layers of a VGG16 deep neural network pre-trained on the ImageNet dataset is used for bottleneck features extraction. These features are used to train a SVM classifier for the malware family classification task. The experimental results on a dataset comprising 10,136 samples from 20 different families showed that our approach can effectively be used to classify malware families with an accuracy of 92.97%, outperforming similar approaches proposed in the literature which require feature engineering and considerable domain expertise.

Keywords

Malicious software Classification Machine learning Deep learning Transfer learning 

Notes

Acknowledgements

This work has been partially supported by Brazilian National Council for Scientific and Technological Development (grants 302923/2014-4 and 313152/2015-2). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPUs used for this research.

References

  1. 1.
    Y. Bengio et al., Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)CrossRefGoogle Scholar
  2. 2.
    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  3. 3.
    J. Yosinski, J. Clune, Y. Bengio, H. Lipson, How transferable are features in deep neural networks? in Advances in Neural Information Processing Systems (2014), pp. 3320–3328Google Scholar
  4. 4.
    K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556Google Scholar
  5. 5.
    J.Z. Kolter, M.A. Maloof, Learning to detect and classify malicious executables in the wild. J. Mach. Learn. Res. 7, 2721–2744 (2006)MathSciNetzbMATHGoogle Scholar
  6. 6.
    A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, Y. Elovici, Detecting unknown malicious code by applying classification techniques on opcode patterns. Secur. Inform. 1(1), 1–22 (2012)CrossRefGoogle Scholar
  7. 7.
    L. Nataraj, S. Karthikeyan, G. Jacob, B. Manjunath, Malware images: visualization and automatic classification, in Proceedings of the 8th International Symposium on Visualization for Cyber Security (ACM, New York, 2011), p. 4Google Scholar
  8. 8.
    B. Kolosnjaji, A. Zarras, G.D. Webster, C. Eckert, Deep learning for classification of malware system call sequences, in Australasian Conference on Artificial Intelligence (2016), pp. 137–149Google Scholar
  9. 9.
    A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105Google Scholar
  10. 10.
    G. Conti, E. Dean, M. Sinda, B. Sangster, Visual reverse engineering of binary and data files, in Visualization for Computer Security (Springer, Berlin, 2008), pp. 1–17CrossRefGoogle Scholar
  11. 11.
    M. Sebastián, R. Rivera, P. Kotzias, J. Caballero, Avclass: a tool for massive malware labeling, in International Symposium on Research in Attacks, Intrusions, and Defenses (Springer, Cham, 2016), pp. 230–253CrossRefGoogle Scholar
  12. 12.
    L. van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)zbMATHGoogle Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Edmar Rezende
    • 1
    • 2
  • Guilherme Ruppert
    • 3
  • Tiago Carvalho
    • 4
  • Antonio Theophilo
    • 3
  • Fabio Ramos
    • 5
  • Paulo de Geus
    • 6
  1. 1.University of CampinasCampinasBrazil
  2. 2.Center for Information Technology Renato ArcherCampinasBrazil
  3. 3.Center for Information Technology Renato ArcherCampinasBrazil
  4. 4.Federal Institute of São PauloCampinasBrazil
  5. 5.University of SydneySydneyAustralia
  6. 6.University of CampinasCampinasBrazil

Personalised recommendations