A Novel Image-Based Malware Classification Model Using Deep Learning

Jiang, Yongkang; Li, Shenghong; Wu, Yue; Zou, Futai

doi:10.1007/978-3-030-36711-4_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11954)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The vast volume of data that must be screened for potential maliciousness has become one of the major challenges facing antivirus products. In this paper, we propose a novel image-based malware classification model using deep learning to support large-scale malware analysis. The model comprises a malware embedding method called YongImage, which maps instruction-level information and disassembly metadata generated by the IDA disassembler into an image vector, and a deep neural network named malVecNet, which has a simpler structure and a faster convergence rate.

YongImage converts malware analysis tasks into image classification problems, so the approach does not rely on domain knowledge or complex feature extraction. Meanwhile, we draw on the idea of sentence-level classification from Natural Language Processing to build and optimize malVecNet. Compared to previous work, malVecNet has better theoretical interpretability and can be trained more effectively.

We evaluate our model with 10-fold cross-validation on the Microsoft Malware Classification Challenge dataset. The results show that the model achieves 99.49% accuracy with a log loss of 0.022. Although our scheme is less precise than the challenge winner's, it delivers an orders-of-magnitude performance improvement. It also outperforms most other related work.
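For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and multiclass log loss are conventionally computed, together with a plain 10-fold index split. It is not the authors' evaluation code: the function names (`multiclass_log_loss`, `accuracy`, `k_fold_indices`), the clipping constant `eps`, and the contiguous, unshuffled folds are all assumptions made for illustration.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class labels.
    probs:  list of per-sample probability rows, one entry per class.
    eps:    clipping constant (an assumption) that avoids log(0).
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true one."""
    hits = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == label
    return hits / len(y_true)


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one sample. Real evaluations usually
    shuffle (and often stratify) the samples first; that is omitted here.
    """
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Under cross-validation, each sample lands in exactly one test fold, so the per-fold metrics can be averaged or the held-out predictions pooled before scoring. A lower log loss means the model assigns higher probability to the correct family, which is why a classifier can have high accuracy yet still differ noticeably from another in log loss.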

Acknowledgment

This work is supported by the 2019 Science and Technology Project of SGCC “Security Protection Technology of Embedded Components and Control Units in Power System Terminal” (No. 2019GW-12), the National Key Research and Development Program of China (No. 2017YFB0802300), the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (No. U1509219), and the National Key Research and Development Program of China (No. 2018YFB0803500).

Author information

Authors and Affiliations

School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Yongkang Jiang, Shenghong Li, Yue Wu & Futai Zou

Corresponding author

Correspondence to Yue Wu.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

About this paper

Cite this paper

Jiang, Y., Li, S., Wu, Y., Zou, F. (2019). A Novel Image-Based Malware Classification Model Using Deep Learning. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_14

DOI: https://doi.org/10.1007/978-3-030-36711-4_14
Published: 09 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36710-7
Online ISBN: 978-3-030-36711-4
eBook Packages: Computer Science, Computer Science (R0)
