Abstract
This paper distinguishes malware families within a specific category (ransomware) via dynamic analysis. We collect samples from four ransomware families and use the Cuckoo sandbox environment to observe their runtime behaviour. The study aims to provide new insight into malware family classification by comparing candidate runtime features and applying different extraction and selection techniques to them. We try several extraction models on call traces, namely bag-of-words, n-gram sequences and wildcard patterns, and we also examine other behavioural features such as file, registry and mutex artefacts. Wildcard patterns on call traces are designed to overcome advanced evasion strategies such as the insertion of junk API calls, which causes n-gram searches to fail. For the models that generate too many features, we adapt new feature selection techniques in a class-wise fashion to avoid unfair representation of families in the feature set, which leads to poor detection performance. To our knowledge, no research paper has applied a class-wise approach to multi-class malware family identification. With a 96.05% correct classification ratio for four families, this study outperforms most studies applying similar techniques.
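The following is a minimal Python sketch of the three ideas named in the abstract: n-gram extraction from a call trace, a wildcard pattern that tolerates inserted junk calls, and class-wise feature selection. It is an illustration under stated assumptions, not the authors' implementation. The function names, the `max_gap` parameter, and the scoring rule (in-family frequency minus out-of-family frequency) are all hypothetical stand-ins; the abstract does not specify how the paper defines wildcard patterns or which selection metric it uses.

```python
from collections import Counter


def ngrams(trace, n):
    """Return the set of contiguous n-grams (as tuples) in an API call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def matches_wildcard(trace, pattern, max_gap=2):
    """True if the calls in `pattern` (non-empty) occur in order in `trace`
    with at most `max_gap` unrelated calls between consecutive elements.
    `max_gap` is an assumed parameter, not taken from the paper."""
    def match_from(pos, k):
        if k == len(pattern):
            return True
        # The next pattern element may appear up to max_gap calls later.
        for j in range(pos, min(pos + max_gap + 1, len(trace))):
            if trace[j] == pattern[k] and match_from(j + 1, k + 1):
                return True
        return False

    return any(
        trace[i] == pattern[0] and match_from(i + 1, 1)
        for i in range(len(trace))
    )


def classwise_top_k(samples, k):
    """Pick the k best-scoring features for each family separately and
    return their union, so every family is represented in the feature set.
    `samples` is a list of (family, feature_set) pairs. The score is a
    simplified assumption: in-family minus out-of-family sample frequency."""
    selected = set()
    for fam in sorted({f for f, _ in samples}):
        inside = [feats for f, feats in samples if f == fam]
        outside = [feats for f, feats in samples if f != fam]
        in_df = Counter(x for feats in inside for x in feats)
        out_df = Counter(x for feats in outside for x in feats)

        def score(x):
            out_rate = out_df[x] / len(outside) if outside else 0.0
            return in_df[x] / len(inside) - out_rate

        ranked = sorted(in_df, key=lambda x: (-score(x), x))
        selected.update(ranked[:k])
    return selected


# A junk call between two related calls breaks the 2-gram but not the wildcard.
trace = ["CreateFileW", "GetTickCount", "WriteFile", "CloseHandle"]
pattern = ("CreateFileW", "WriteFile")
print(pattern in ngrams(trace, 2))       # exact 2-gram lookup
print(matches_wildcard(trace, pattern))  # gap-tolerant lookup
```

Selecting features per family and then taking the union, rather than ranking all features globally, is what keeps a family with weaker signals from being crowded out of the final feature set.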
Acknowledgements
We thank the VirusTotal community for providing private API access for our research, which enabled us to search for and download the ransomware samples.
The Cuckoo reports (1.4 GB) for the samples and the framework's source code are available at:
Reports: https://goo.gl/e8jbXq
Source code: https://bitbucket.org/msgeden/familyclassifier
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Geden, M., Happa, J. (2018). Classification of Malware Families Based on Runtime Behaviour. In: Castiglione, A., Pop, F., Ficco, M., Palmieri, F. (eds) Cyberspace Safety and Security. CSS 2018. Lecture Notes in Computer Science, vol 11161. Springer, Cham. https://doi.org/10.1007/978-3-030-01689-0_3
DOI: https://doi.org/10.1007/978-3-030-01689-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01688-3
Online ISBN: 978-3-030-01689-0
eBook Packages: Computer Science, Computer Science (R0)