Abstract
Malware authors continually create new variants, new infection techniques, and increasingly obfuscated malware. Classifying and detecting malicious software is therefore both an important task and a major challenge for cyber security research. Because the rate of false alarms is increasing, accurate classification and detection of malware is a pressing problem. The proposed approach provides a classification system that differentiates malware from cleanware. This paper also contributes a feature extraction method for Windows API (application programming interface) calls and a feature selection step that identifies the features that best discriminate malware from cleanware. Chi-square (χ2) and principal component analysis (PCA) are applied as attribute selection methods. An n-gram approach is applied to construct sequences of malware API features. The K-nearest neighbor and random forest (RF) classification algorithms are used to classify executable files as malware or cleanware. The proposed system achieves an accuracy of 99% on unigram and bigram API features selected using χ2 and PCA. The proposed approach is able to distinguish malicious executable files from cleanware effectively.
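The n-gram construction and chi-square selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The API call traces, the labels, the presence/absence feature encoding, and the helper names `ngrams` and `chi_square` are assumptions made for illustration only. The score is the classic Pearson chi-square statistic on a 2x2 table (feature present or absent versus malware or cleanware), which may differ from the variant used in the chapter.

```python
def ngrams(calls, n):
    """Return the n-grams (as tuples) over a sequence of API call names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def chi_square(samples, labels, feature):
    """Pearson chi-square for a 2x2 table: feature presence vs. class label."""
    a = b = c = d = 0
    for features, label in zip(samples, labels):
        present = feature in features
        if present and label == 1:
            a += 1          # present in malware
        elif present:
            b += 1          # present in cleanware
        elif label == 1:
            c += 1          # absent in malware
        else:
            d += 1          # absent in cleanware
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0          # feature is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical API call traces (1 = malware, 0 = cleanware); toy data only.
traces = [
    ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateRemoteThread"],
    ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle"],
]
labels = [1, 1, 0, 0]

# Each sample becomes the set of its unigrams and bigrams.
samples = [set(ngrams(t, 1)) | set(ngrams(t, 2)) for t in traces]

# Score every feature in the vocabulary and keep the highest-scoring ones.
vocabulary = set().union(*samples)
scores = {f: chi_square(samples, labels, f) for f in vocabulary}
top_score = max(scores.values())
selected = sorted(f for f, s in scores.items() if s == top_score)

for feature in selected:
    print(" -> ".join(feature), scores[feature])
```

In this toy data the features that occur in only one class receive the highest score, while a bigram that occurs equally often in both classes scores zero. The selected features would then be passed to a classifier such as K-nearest neighbor or random forest.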
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
San, C.C., Thwin, M.M.S. (2019). Proposed Effective Feature Extraction and Selection for Malicious Software Classification. In: Sinha, G. (eds) Advances in Biometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-30436-2_3
DOI: https://doi.org/10.1007/978-3-030-30436-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30435-5
Online ISBN: 978-3-030-30436-2
eBook Packages: Biomedical and Life Sciences (R0)