A Novel Image-Based Malware Classification Model Using Deep Learning

  • Conference paper
Neural Information Processing (ICONIP 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11954)

Abstract

Nowadays, the vast volume of data that must be evaluated as potentially malicious is becoming one of the major challenges for antivirus products. In this paper, we propose a novel image-based malware classification model using deep learning to cope with large-scale malware analysis. The model includes a malware embedding method called YongImage, which maps instruction-level information and disassembly metadata generated by the IDA disassembler into an image vector, and a deep neural network named malVecNet, which has a simpler structure and a faster convergence rate.

YongImage converts malware analysis tasks into image classification problems, so the approach does not rely on domain knowledge or complex feature extraction. Meanwhile, we borrow the idea of sentence-level classification from Natural Language Processing to build and optimize malVecNet. Compared with previous work, malVecNet has better theoretical interpretability and can be trained more effectively.

We evaluate our model with 10-fold cross-validation on the Microsoft Malware Classification Challenge dataset. The results show that the model achieves 99.49% accuracy with a log loss of 0.022. Although our scheme is less precise than the challenge winner's, it delivers an orders-of-magnitude performance boost. It also outperforms most other related work.
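For readers unfamiliar with the two reported metrics, the following is a minimal Python sketch of how accuracy and multiclass log loss are commonly computed, together with a simple 10-fold index split. It is an illustration only and not the authors' evaluation code: the function names, the toy labels, the toy probabilities, and the unshuffled contiguous fold split are all assumptions made for this example.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true class."""
    hits = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(y_true)

def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous, near-equal test folds.

    Hypothetical helper: no shuffling or stratification is applied here.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Toy predictions over three hypothetical classes (not data from the paper).
y_true = [0, 1, 2, 1]
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
    [0.50, 0.40, 0.10],  # misclassified: true class 1, predicted class 0
]

print("accuracy:", accuracy(y_true, probs))
print("log loss:", multiclass_log_loss(y_true, probs))
print("fold sizes:", [len(f) for f in k_fold_indices(23, k=10)])
```

In a cross-validation run, each fold serves once as the test set while the remaining folds train the model, and the metrics are averaged over the folds. Log loss penalizes confident wrong predictions heavily, which is why it can separate models whose accuracies are nearly identical.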

Notes

  1. https://github.com/jyker?tab=repositories

  2. https://www.hex-rays.com/products/ida/

  3. https://github.com/jyker/zklearn

  4. https://scikit-learn.org/stable/modules/model_evaluation.html#precision-recall-f-measure-metrics

Acknowledgment

This work is supported by the 2019 Science and Technology Project of SGCC "Security Protection Technology of Embedded Components and Control Units in Power System Terminal" (No. 2019GW-12), the National Key Research and Development Program of China (No. 2017YFB0802300), the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (No. U1509219), and the National Key Research and Development Program of China (No. 2018YFB0803500).

Author information

Corresponding author

Correspondence to Yue Wu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, Y., Li, S., Wu, Y., Zou, F. (2019). A Novel Image-Based Malware Classification Model Using Deep Learning. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_14

  • DOI: https://doi.org/10.1007/978-3-030-36711-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36710-7

  • Online ISBN: 978-3-030-36711-4

  • eBook Packages: Computer Science, Computer Science (R0)
