Improvement of Malware Classification Using Hybrid Feature Engineering

  • Original Research
  • SN Computer Science

Abstract

Polymorphic malware has emerged as a major threat to computer systems. The techniques used to create it evolve constantly, relying on sophisticated tactics to generate multiple variants of existing malware. Current solutions do not yet address this problem sufficiently. They are mostly signature based; however, malware that changes also changes its signature, and therefore easily evades detection. Classifying such malware into its respective families is also hard, which makes elimination harder. In this paper, we propose a new feature engineering (NFE) approach for better classification of polymorphic malware, based on a hybrid of structural and behavioural features. We use accuracy, recall, precision, and F score to evaluate our approach. We achieve an improvement of 12% in accuracy between raw features and NFE features. We also demonstrate the robustness of NFE for feature selection compared with other feature selection techniques.
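As an illustration of the four evaluation measures named above, the following minimal sketch computes accuracy together with macro-averaged precision, recall, and F score for a multi-class (malware family) labelling. The abstract does not state how the per-family scores are averaged, so macro-averaging is an assumption here, and the function name `evaluate` and the family labels are hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F score.

    Macro-averaging is an assumption of this sketch, not a detail
    taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this family
        else:
            fp[pred] += 1    # predicted family was wrong
            fn[truth] += 1   # true family was missed

    def safe_div(a, b):
        # Treat an undefined ratio (zero denominator) as 0.0.
        return a / b if b else 0.0

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f_scores.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f_score": safe_div(sum(f_scores), n),
    }


# Hypothetical family labels, for illustration only.
y_true = ["family_a", "family_a", "family_b", "family_b"]
y_pred = ["family_a", "family_b", "family_b", "family_b"]
scores = evaluate(y_true, y_pred)
```

Running `evaluate` once on predictions from a model trained on raw features and once on predictions from a model trained on engineered features would give the kind of side-by-side accuracy comparison the abstract reports.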

Notes

  1. ELF is a mathematical function that converts a given input value into a hash value of fixed length.
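To make the note concrete, here is a minimal sketch of one well-known function by this name: the symbol-name hash described in the System V ABI for ELF object files. That the note refers to this particular function is an assumption, as is the way it might be applied to feature values; the name `elf_hash` is chosen here for illustration.

```python
def elf_hash(data: bytes) -> int:
    """Classic ELF (System V ABI) hash of a byte string.

    Assumption: this is the function the note calls "ELF".
    The result always fits in 32 bits (in fact in 28 bits).
    """
    h = 0
    for byte in data:
        # Shift in the next byte, keeping the value within 32 bits.
        h = ((h << 4) + byte) & 0xFFFFFFFF
        # Fold the top four bits back into the lower bits, then clear them.
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


# Hashing a textual value turns it into a fixed-width integer.
value = elf_hash(b"exit")
```

A function of this kind maps arbitrary strings to bounded integers, which is one way textual attributes can be turned into numeric inputs for a classifier.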

Author information

Corresponding author

Correspondence to Emmanuel Masabo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Masabo, E., Kaawaase, K.S., Sansa-Otim, J. et al. Improvement of Malware Classification Using Hybrid Feature Engineering. SN COMPUT. SCI. 1, 17 (2020). https://doi.org/10.1007/s42979-019-0017-9

  • DOI: https://doi.org/10.1007/s42979-019-0017-9
