Malware Detection Based on Opcode Sequence and ResNet
Traditional static malware detection methods struggle to keep pace with the rapid growth of malware variants, so machine-learning-based detection approaches have begun to flourish. Typically, operation codes (opcodes) disassembled from binary programs are fed to classifiers such as SVM and KNN for recognition. However, this kind of feature extraction does not make full use of the sequential relations between opcodes, and the classifiers themselves have few dimensions and limited matching ability. This paper therefore proposes a malware detection model based on a residual network. First, the model extracts opcode sequences using a disassembler. To make the vector representation of opcodes more expressive, a Word2Vec strategy is used to represent the opcodes, and the opcode word vectors are further optimized during training iterations. Building the opcode matrix from overlapping windows and then applying convolution, however, introduces redundant information. To overcome this problem, a downsampling method is adopted to organize opcode sequences into an opcode matrix, which effectively controls time and space complexity. To improve the classification ability of the model, a classifier based on ResNet, with more layers and cross-layer connections, is proposed to match malicious code in more dimensions. Experiments show that the model reaches a malware classification accuracy of 98.2%, while its processing-time overhead compared with traditional classifiers remains negligible.
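The two structural ideas in the abstract, organizing an opcode sequence into a matrix without overlap and adding a cross-layer connection, can be illustrated with a minimal sketch. The abstract does not specify the exact downsampling scheme, so the sketch assumes it means non-overlapping rows (stride equal to row width). The function names, the `<pad>` token, and the sample opcodes are hypothetical, and the Word2Vec embedding step is left out.

```python
from typing import Callable, List, Sequence


def overlapping_windows(opcodes: Sequence[str], width: int) -> List[List[str]]:
    """Sliding windows with stride 1: an opcode can appear in up to `width` rows."""
    return [list(opcodes[i:i + width]) for i in range(len(opcodes) - width + 1)]


def downsampled_matrix(opcodes: Sequence[str], width: int,
                       pad: str = "<pad>") -> List[List[str]]:
    """Non-overlapping rows (assumed scheme): each opcode appears exactly once."""
    rows = [list(opcodes[i:i + width]) for i in range(0, len(opcodes), width)]
    if rows and len(rows[-1]) < width:
        # Pad the final row so that the matrix is rectangular.
        rows[-1] += [pad] * (width - len(rows[-1]))
    return rows


def residual_block(x: List[float],
                   transform: Callable[[List[float]], List[float]]) -> List[float]:
    """Cross-layer (skip) connection: output = transform(x) + x, element-wise."""
    fx = transform(x)
    return [a + b for a, b in zip(fx, x)]


# Hypothetical opcode sequence, used only for illustration.
opcodes = ["push", "mov", "call", "add", "mov", "jmp", "ret"]
overlap = overlapping_windows(opcodes, 3)   # 5 rows, heavy repetition
compact = downsampled_matrix(opcodes, 3)    # 3 rows, no repetition
skip_out = residual_block([1.0, 2.0], lambda v: [0.5 * t for t in v])
```

With seven opcodes and a row width of three, the overlapping layout yields five rows while the non-overlapping layout yields three, which is the kind of reduction in redundancy and matrix size the abstract attributes to downsampling. The residual block adds its input back to the transformed output, the basic pattern behind ResNet's cross-layer connections.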
Keywords: Opcode, N-gram, ResNet, Word2vec
This work is supported by the Natural Science Foundation of Jiangsu Province for Excellent Young Scholars (BK20180080).