Abstract
Because of the serious damage it causes to computers and networks, malware (short for malicious software) has held the attention of both anti-malware companies and researchers for decades. Although signature-based detection is the principal method used in commercial anti-malware products, it fails to recognize new and unseen malware. To address this problem, n-grams of opcodes, obtained by disassembling the executables, are used as features for classification. However, many previous studies set n to a small value such as 1 or 2. In this paper, we first use n-gram sizes ranging from 1 to 15. We then compare different feature selection methods. Finally, we perform experiments with different malicious file percentages (MFP) to determine which setting performs better.
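As an illustration of the opcode n-gram features described above, the following is a minimal sketch, not the authors' implementation. It assumes each executable has already been disassembled into a list of opcode mnemonics; the function names `opcode_ngrams` and `ngram_frequencies`, the sample sequence, and the choice of relative frequency as the feature value are all assumptions made for this example.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return every contiguous n-gram of the opcode sequence as a tuple."""
    # A sequence shorter than n yields an empty list.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_frequencies(opcodes, n):
    """Map each n-gram to its relative frequency within one file."""
    grams = opcode_ngrams(opcodes, n)
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode sequence standing in for one disassembled executable.
sample = ["push", "mov", "call", "push", "mov", "ret"]

# Feature dictionary for n = 2; the paper varies n from 1 to 15.
bigrams = ngram_frequencies(sample, 2)
```

The number of distinct n-grams can grow quickly as n increases, which is one reason a feature selection step is needed before classification.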
Acknowledgments
This work was supported by the National Science Foundation of China (No. U1536122).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Li, P., Chen, Z., Cui, B. (2018). Detecting Malware Based on Opcode N-Gram and Machine Learning. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_9
DOI: https://doi.org/10.1007/978-3-319-69835-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69834-2
Online ISBN: 978-3-319-69835-9
eBook Packages: Engineering, Engineering (R0)