Malware Clustering Based on Called API During Runtime

Széles, Gergő János; Coleşa, Adrian

doi:10.1007/978-3-030-12085-6_10

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11398)

Included in the following conference series:

International Workshop on Information and Operational Technology Security Systems

Abstract

The number of malware samples has grown exponentially in recent years, which makes it tedious to analyze them manually in order to notice when a new strain appears. In this article we present a dynamic analysis system that clusters suspicious executable files into malware families based on the behavioral similarities their running processes exhibit, thus reducing the workload of malware analysts. We identified similarities between our approach and the problem of clustering text by topic, and achieved results similar to those of text clustering without semantic analysis. We modeled the behavior of a process by extracting the sequences of Windows API functions it called during execution. We separated the recorded API calls into three levels, based on their impact on the system, and treated them as text-like terms. More complex terms were constructed as N-grams, and the features were represented by TF-IDF scores. We clustered the processes with variants of the k-means algorithm and derived a method for analyzing cluster characteristics in order to determine the best number of clusters. Finally, we identified the API level and N-gram lengths required to obtain relevant clusters.
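
The pipeline outlined in the abstract (API call sequences, N-gram terms, TF-IDF features, k-means clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The traces are invented for illustration, the smoothed IDF formula and the farthest-first seeding are assumptions, and the function names `ngrams`, `tfidf_vectors`, `cosine` and `spherical_kmeans` are hypothetical. The three API levels and the selection of the number of clusters are not covered.

```python
import math
from collections import Counter, defaultdict

def ngrams(calls, n):
    """Return all contiguous n-grams (as tuples) of a list of API names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def tfidf_vectors(traces, n=2):
    """Turn each trace into an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = [Counter(ngrams(t, n)) for t in traces]
    doc_freq = Counter(term for c in counts for term in c)
    total = len(traces)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
        vec = {t: (f / length) * (math.log((1 + total) / (1 + doc_freq[t])) + 1)
               for t, f in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def spherical_kmeans(vectors, k, iters=20):
    """k-means on unit vectors, assigning each point by cosine similarity."""
    # Assumption: deterministic farthest-first seeding instead of k-means++.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(k):
            summed = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        summed[t] += w
            if not summed:  # empty cluster: keep the previous centroid
                continue
            norm = math.sqrt(sum(w * w for w in summed.values()))
            centroids[j] = {t: w / norm for t, w in summed.items()}
    return labels

# Invented traces: two file-writing processes, two registry-writing ones.
traces = [
    ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegSetValueExW", "RegCloseKey"],
]
labels = spherical_kmeans(tfidf_vectors(traces, n=2), k=2)
print(labels)
```

With these toy traces the two file-writing processes share one cluster label and the two registry-writing processes share the other. Because the vectors are unit length, assigning by cosine similarity corresponds to the spherical variant of k-means.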

Acknowledgment

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS/CCCDI-UEFISCDI, project number PN-III-P2-2.1-PED-2016-2073, within PNCDI III.

Author information

Authors and Affiliations

Computer Science Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
Gergő János Széles & Adrian Coleşa
Cyber Threat Proactive Defense Lab, Bitdefender, Cluj-Napoca, Romania
Gergő János Széles

Corresponding authors

Correspondence to Gergő János Széles or Adrian Coleşa.

Editor information

Editors and Affiliations

University of Patras, Patras, Greece
Apostolos P. Fournaris
University of Patras, Patras, Greece
Konstantinos Lampropoulos
Advanced Network Architectures Lab, Barcelona, Spain
Eva Marín Tordera

Cite this paper

Széles, G.J., Coleşa, A. (2019). Malware Clustering Based on Called API During Runtime. In: Fournaris, A., Lampropoulos, K., Marín Tordera, E. (eds) Information and Operational Technology Security Systems. IOSec 2018. Lecture Notes in Computer Science, vol 11398. Springer, Cham. https://doi.org/10.1007/978-3-030-12085-6_10

DOI: https://doi.org/10.1007/978-3-030-12085-6_10
Published: 30 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12084-9
Online ISBN: 978-3-030-12085-6
eBook Packages: Computer Science, Computer Science (R0)
