Formal Methods in System Design

, Volume 41, Issue 1, pp 107–128 | Cite as

Recognizing malicious software behaviors with tree automata inference

  • Domagoj Babić
  • Daniel Reynaud
  • Dawn Song


We explore how formal methods and tools of the verification trade could be used for malware detection and analysis. In particular, we propose a new approach to learning and generalizing from observed malware behaviors based on tree automata inference. Our approach infers k-testable tree automata from system call dataflow dependency graphs. We show how inferred automata can be used for malware recognition and classification.


Tree automata inference Behavioral malware detection 



We are grateful to Matt Fredrikson and Somesh Jha for sharing their library of classified malware [14] with us. We thank Emiliano Martinez Contreras for giving us an account at that we used to double-check the classification, and for giving us support in using the API through which our experimentation scripts communicated with We would especially like to thank Lorenzo Martignoni, who wrote the libwst library [28] for extracting and parsing arguments of Windows’s system calls. We also thank reviewers for their insightful and constructive comments.


  1. 1.
    Aho AV, Sethi R, Ullman JD (1986) Compilers: principles, techniques, and tools. Addison-Wesley Longman, Boston Google Scholar
  2. 2.
    Babić D (2008) Exploiting structure for scalable software verification. PhD thesis, University of British, Columbia, Vancouver, Canada Google Scholar
  3. 3.
    Babić D, Reynaud D, Song D (2011) Malware analysis with tree automata inference. In: CAV’11: proceedings of the 23rd int conference on computer aided verification. Lecture notes in computer science, vol 6806. Springer, Berlin, pp 116–131 Google Scholar
  4. 4.
    Bonfante G, Kaczmarek M, Marion J-Y (2009) Architecture of a morphological malware detector. J Comput Virol 5:263–270 CrossRefGoogle Scholar
  5. 5.
    Brumley D, Hartwig C, Zhenkai Liang JN, Song D, Yin H (2008) Automatically identifying trigger-based behavior in malware. In: Botnet detection countering the largest security threat. Advances in information security, vol 36. Springer, Berlin, pp 65–88 Google Scholar
  6. 6.
    Chow J, Pfaff B, Garfinkel T, Christopher K, Rosenblum M (2004) Understanding data lifetime via whole system simulation. In: Proc of 13th USENIX security symposium Google Scholar
  7. 7.
    Christodorescu M, Jha S (2004) Testing malware detectors. In: ISSTA’04: proc of the 2004 ACM SIGSOFT int symp on software testing and analysis. ACM Press, New York, pp 34–44 CrossRefGoogle Scholar
  8. 8.
    Christodorescu M, Jha S, Seshia SA, Song D, Bryant RE (2005) Semantics-aware malware detection. In: SP’05: proc of the 2005 IEEE symp. on security and privacy. IEEE Computer Society Press, Los Alamitos, pp 32–46 Google Scholar
  9. 9.
    Christodorescu M, Jha S, Kruegel C (2007) Mining specifications of malicious behavior. In: Proc of the 6th joint meeting of the European software engineering conf and the ACM SIGSOFT symp on the foundations of software engineering. ACM Press, New York, pp 5–14 Google Scholar
  10. 10.
    Comon H, Dauchet M, Gilleron R, Löding C, Jacquemard F, Lugiez D, Tison S, Tommasi M (2007) Tree automata techniques and applications.
  11. 11.
    Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms, 2nd edn. The MIT Press, Cambridge MATHGoogle Scholar
  12. 12.
    Crandall J, Chong F (2005) Minos: control data attack prevention orthogonal to memory model. In: Proc of the 37th int symp on microarchitecture. IEEE Press, New York, pp 221–232 Google Scholar
  13. 13.
    Forrest S, Hofmeyr SA, Somayaji A, Longstaff TA (1996) A sense of self for Unix processes. In: Proc of the 1996 IEEE symp on security and privacy. IEEE Computer Society Press, Los Alamitos, pp 120–129 CrossRefGoogle Scholar
  14. 14.
    Fredrikson M, Jha S, Christodorescu M, Sailer R, Yan X (2010) Synthesizing near-optimal malware specifications from suspicious behaviors. In: Proc of the 2010 IEEE symposium on security and privacy. IEEE Computer Society Press, Los Alamitos, pp 45–60 CrossRefGoogle Scholar
  15. 15.
    García P (1993) Learning k-testable tree sets from positive data. Technical report, Dept. Syst. Inform. Comput., Univ. Politecnica Valencia, Valencia, Spain Google Scholar
  16. 16.
    García P, Vidal E (1990) Inference of k-testable languages in the strict sense and application to syntactic pattern recognition. IEEE Trans Pattern Anal Mach Intell 12:920–925 CrossRefGoogle Scholar
  17. 17.
    Godefroid P, Klarlund N, Sen K (2005) DART: directed automated random testing. In: PLDI’05: proc of the ACM SIGPLAN conf on prog lang design and implementation. ACM Press, New York, pp 213–223 Google Scholar
  18. 18.
    Gold EM (1978) Complexity of automaton identification from given data. Inf Control 37(3):302–320 MathSciNetMATHCrossRefGoogle Scholar
  19. 19.
    Holzer A, Kinder J, Veith H (2007) Using verification technology to specify and detect malware. In: Lecture notes in computer science, vol 4739. Springer, Berlin, pp 497–504 Google Scholar
  20. 20.
    Kang MG, McCamant S, Poosankam P, Song D (2011) DTA++: dynamic taint analysis with targeted control-flow propagation. In: Proceedings of the 18th annual network and distributed system security symposium, San Diego, CA Google Scholar
  21. 21.
    Khoussainov B, Nerode A (2001) Automata theory and its applications. Birkhauser, Basel MATHCrossRefGoogle Scholar
  22. 22.
    Kinder J, Katzenbeisser S, Schallhart C, Veith H (2005) Detecting malicious code by model checking. In: Julisch K, Krügel C (eds) GI SIG SIDAR conference on detection of intrusions and malware and vulnerability assessment. Lecture notes in computer science, vol 3548. Springer, Berlin, pp 174–187 CrossRefGoogle Scholar
  23. 23.
    King JC (1976) Symbolic execution and program testing. Commun ACM 19(7):385–394 MATHCrossRefGoogle Scholar
  24. 24.
    Knuutila T (1993) Inference of k-testable tree languages. In: Bunke H (ed) Advances in structural and syntactic pattern recognition: proc of the int workshop. World Scientific, Singapore, pp 109–120 Google Scholar
  25. 25.
    Kolbitsch C, Milani P, Kruegel C, Kirda E, Zhou X, Wang X (2009) Effective and efficient malware detection at the end host. In: The 18th USENIX security symposium Google Scholar
  26. 26.
    López D, Sempere JM, García P (2004) Inference of reversible tree languages. IEEE Trans Syst Man Cybern, Part B, Cybern 34(4):1658–1665 CrossRefGoogle Scholar
  27. 27.
    Luk C, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi V, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: PLDI’05: proc of the 2005 ACM SIGPLAN conf on prog lang design and impl. ACM Press, New York, pp 190–200 Google Scholar
  28. 28.
    Martignoni L, Paleari R (2010) The libwst library (a Part of WUSSTrace) Google Scholar
  29. 29.
    Matrosov A, Rodionov E, Harley D, Malcho J (2010) Stuxnet under the microscope. Technical report, Eset Google Scholar
  30. 30.
    Moser A, Kruegel C, Kirda E (2007) Exploring multiple execution paths for malware analysis. In: SP’07: proc of the 2007 IEEE symposium on security and privacy, Washington, DC, USA. IEEE Computer Society Press, Los Alamitos, pp 231–245 Google Scholar
  31. 31.
    Newsome J, Song D (2005) Dynamic taint analysis: automatic detection, analysis, and signature generation of exploit attacks on commodity software. In: Proc of the network and distributed systems security symposium Google Scholar
  32. 32.
    Suh G, Lee J, Zhang D, Devadas S (2004) Secure program execution via dynamic information flow tracking. Oper Syst Rev 38(5):85–96 CrossRefGoogle Scholar
  33. 33.
    Symantec (2010) Symantec global internet security threat report: trends for 2009, vol xv. Technical report, Symantec, April Google Scholar
  34. 34.
    Wagner D, Dean D (2001) Intrusion detection via static analysis. In: Proc of the 2001 IEEE symposium on security and privacy. IEEE Computer Society Press, Los Alamitos, p 156 Google Scholar
  35. 35.
    Wagner D, Soto P (2002) Mimicry attacks on host-based intrusion detection systems. In: CCS’02: proc of the 9th ACM conf on comp and comm security. ACM Press, New York, pp 255–264 CrossRefGoogle Scholar
  36. 36.
    You I, Yim K (2010) Malware obfuscation techniques: a brief survey. In: Int conf on broadband, wireless computing, communication and applications, pp 297–300 CrossRefGoogle Scholar
  37. 37.
    Zalcstein Y (1972) Locally testable languages. J Comput Syst Sci 6(2):151–167 MathSciNetMATHCrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. 1.Computer Science DivisionUniversity of CaliforniaBerkeleyUSA

Personalised recommendations