Malware Clustering Based on Called API During Runtime

  • Conference paper
  • In: Information and Operational Technology Security Systems (IOSec 2018)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11398)

Abstract

Malware has grown exponentially in recent years, which makes it tedious to analyze samples manually in order to notice when a new strain appears. In this article we present a dynamic analysis system that clusters suspicious executable files into malware families based on the behavioral similarities their running processes exhibit, thus reducing the workload of malware analysts. We identified similarities between our approach and the problem of clustering text by topic, and achieved results similar to text clustering without semantic analysis. We modeled the behavior of a process by extracting the sequences of Windows API functions it calls during execution. We separated the recorded API calls into three levels, based on their impact on the system, and treated them as text-like terms. More complex terms were constructed as N-grams, and the features were represented by TF-IDF scores. We clustered the processes with variants of the k-means algorithm and derived a method for analyzing cluster characteristics in order to determine the best number of clusters. Finally, we identified the API level and N-gram lengths required to obtain relevant clusters.
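The pipeline outlined in the abstract (API call sequences, N-gram terms, TF-IDF features, k-means clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy traces, the function names, the smoothed IDF formula, the cosine-based (spherical) k-means variant, and the deterministic farthest-first seeding are all assumptions chosen for illustration; the three API levels from the paper are not modeled.

```python
import math
from collections import Counter

def ngrams(calls, n):
    """Return all n-grams (as tuples) over a sequence of API names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def tfidf_vectors(traces, n):
    """Turn each trace into a unit-length sparse TF-IDF vector over its n-grams."""
    counts = [Counter(ngrams(t, n)) for t in traces]
    df = Counter(term for c in counts for term in c)  # document frequency
    total = len(traces)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
        vec = {t: (k / length) * (math.log((1 + total) / (1 + df[t])) + 1)
               for t, k in c.items()}
        vectors.append(normalise(vec))
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Assumed seeding: start from the first vector, then repeatedly add the
    # vector least similar to the centroids chosen so far (farthest-first).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        updated = []
        for j in range(k):
            acc = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        acc[t] += w
            # Keep the old centroid if a cluster ends up empty.
            updated.append(normalise(acc) if acc else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels

# Hypothetical traces: two file-oriented and two registry-oriented processes.
traces = [
    ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle",
     "CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegSetValueExW", "RegCloseKey"],
]
vectors = tfidf_vectors(traces, n=2)
labels = spherical_kmeans(vectors, k=2)
print(labels)
```

On this toy input the file-oriented traces share bigrams with each other but none with the registry-oriented traces, so the two groups end up in separate clusters. Choosing the number of clusters, which the paper addresses with its own method, is left out of the sketch.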

Acknowledgment

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS/CCCDI-UEFISCDI, project number PN-III-P2-2.1-PED-2016-2073, within PNCDI III.

Author information

Correspondence to Gergő János Széles or Adrian Coleşa.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Széles, G.J., Coleşa, A. (2019). Malware Clustering Based on Called API During Runtime. In: Fournaris, A., Lampropoulos, K., Marín Tordera, E. (eds) Information and Operational Technology Security Systems. IOSec 2018. Lecture Notes in Computer Science, vol 11398. Springer, Cham. https://doi.org/10.1007/978-3-030-12085-6_10

  • DOI: https://doi.org/10.1007/978-3-030-12085-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12084-9

  • Online ISBN: 978-3-030-12085-6

  • eBook Packages: Computer Science (R0)