Abstract
The sheer volume of data that must be evaluated as potentially malicious has become one of the major challenges for antivirus products. In this paper, we propose a novel image-based malware classification model that uses deep learning to support large-scale malware analysis. The model comprises a malware embedding method called YongImage, which maps instruction-level information and disassembly metadata generated by the IDA disassembler into an image vector, and a deep neural network named malVecNet, which has a simpler structure and a faster convergence rate.
YongImage converts malware analysis tasks into image classification problems, which do not rely on domain knowledge or complex feature extraction. We also draw on the idea of sentence-level classification from Natural Language Processing to build and optimize malVecNet. Compared with previous work, malVecNet has better theoretical interpretability and can be trained more effectively.
We evaluate our model using 10-fold cross-validation on the Microsoft Malware Classification Challenge dataset. The results show that our model achieves 99.49% accuracy with a log loss of 0.022. Although our scheme is less precise than the winner's, it delivers an orders-of-magnitude performance improvement. It also outperforms most other related work.
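The two reported metrics and the fold protocol can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names (`multiclass_log_loss`, `accuracy`, `k_fold_indices`), the clipping constant `eps`, and the shuffling seed are assumptions made for illustration. Log loss is taken here as the mean negative log-probability assigned to the true class, the usual definition of the multi-class metric.

```python
import math
import random


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: list of integer class labels.
    probs:  list of per-sample probability rows, one entry per class.
    eps:    clipping bound (assumed value) that keeps log() finite.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true one."""
    hits = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == label
    return hits / len(y_true)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each sample lands in exactly one test fold; the seed is an
    illustrative choice so the split is reproducible.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

In a full evaluation, a model would be trained on each `train` split and scored on the matching `test` split, with the per-fold metrics averaged. The sketch clips probabilities but does not renormalize rows, so it assumes each row already sums to one.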
Acknowledgment
This work is supported by the 2019 Science and Technology Project of SGCC “Security Protection Technology of Embedded Components and Control Units in Power System Terminal” (No. 2019GW-12), the National Key Research and Development Program of China (No. 2017YFB0802300), the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (No. U1509219), and the National Key Research and Development Program of China (No. 2018YFB0803500).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, Y., Li, S., Wu, Y., Zou, F. (2019). A Novel Image-Based Malware Classification Model Using Deep Learning. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11954. Springer, Cham. https://doi.org/10.1007/978-3-030-36711-4_14
DOI: https://doi.org/10.1007/978-3-030-36711-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36710-7
Online ISBN: 978-3-030-36711-4
eBook Packages: Computer Science, Computer Science (R0)