Proposed Effective Feature Extraction and Selection for Malicious Software Classification

  • Chapter

Abstract

Every day, malware writers create new variants, new infection techniques, and increasingly obfuscated malware. Classifying and detecting malicious software is therefore both an important task and a major challenge in cybersecurity research. Because the false-alarm rate keeps rising, accurate classification and detection of malware is a pressing problem. The proposed approach provides a classification system that distinguishes malware from cleanware. This paper also contributes a feature extraction method for Windows API (application programming interface) calls and a feature selection step that keeps the features most useful for discriminating malware from cleanware. Chi-square (χ²) and principal component analysis (PCA) are applied as attribute selection methods, and an n-gram approach is used to construct sequences of malware API features. The K-nearest neighbor and random forest (RF) algorithms classify executable files as malware or cleanware. The proposed system achieves an accuracy of 99% on unigram and bigram API features selected with χ² and PCA, and it identifies malicious executable files and cleanware effectively.
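To make the n-gram and chi-square steps concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each executable is represented by an ordered list of API call names, treats each unigram or bigram as a binary present/absent feature, and scores it with the standard 2x2 chi-square statistic against the malware/cleanware label. The function names (`api_ngrams`, `chi_square_scores`, `select_top_k`) and the four toy traces are invented for illustration; PCA and the classifiers are left out.

```python
from collections import Counter


def api_ngrams(calls, n_values=(1, 2)):
    """Return the set of n-grams (as tuples) that occur in one API call trace."""
    grams = set()
    for n in n_values:
        for i in range(len(calls) - n + 1):
            grams.add(tuple(calls[i:i + n]))
    return grams


def chi_square_scores(samples, labels):
    """Score every n-gram with the 2x2 chi-square statistic.

    samples: list of n-gram sets, one per executable.
    labels:  list of 1 (malware) or 0 (cleanware), same length as samples.
    """
    total = len(samples)
    n_mal = sum(labels)
    n_clean = total - n_mal
    in_mal, in_clean = Counter(), Counter()
    for grams, label in zip(samples, labels):
        (in_mal if label else in_clean).update(grams)

    scores = {}
    for gram in set(in_mal) | set(in_clean):
        a = in_mal[gram]      # malware samples containing the n-gram
        b = in_clean[gram]    # cleanware samples containing the n-gram
        c = n_mal - a         # malware samples without it
        d = n_clean - b       # cleanware samples without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = total * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring n-grams (ties broken alphabetically)."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Hypothetical toy traces; real traces would come from a sandbox report.
traces = [
    ["CreateFileW", "WriteFile", "RegSetValueExW"],                 # malware
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],   # malware
    ["CreateFileW", "ReadFile", "CloseHandle"],                     # cleanware
    ["CreateFileW", "ReadFile", "CloseHandle"],                     # cleanware
]
labels = [1, 1, 0, 0]

samples = [api_ngrams(trace) for trace in traces]
scores = chi_square_scores(samples, labels)
top = select_top_k(scores, 4)
print(top)
```

The selected n-grams would then serve as the columns of a binary feature matrix passed to a classifier such as K-nearest neighbor or random forest. Scoring presence rather than raw counts is a simplification chosen here to keep the contingency table easy to follow.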

Notes

  1. http://tracker.virusshare.com:6969/

Author information

Corresponding author

Correspondence to Cho Cho San.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

San, C.C., Thwin, M.M.S. (2019). Proposed Effective Feature Extraction and Selection for Malicious Software Classification. In: Sinha, G. (eds) Advances in Biometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-30436-2_3
