
DepSim: A Dependency-Based Malware Similarity Comparison System

Conference paper
Information Security and Cryptology (Inscrypt 2010)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6584)

Abstract

Comparing unknown files against previously known malicious samples is important for malware analysis, because it lets analysts quickly characterize a file's behavior and generate signatures. Malware writers often use obfuscation, such as packing, junk insertion, and other techniques, to thwart traditional similarity comparison methods. In this paper, we introduce DepSim, a novel technique for finding dependency similarities between malicious binary programs. DepSim uses taint analysis to construct dependency graphs of a program's control flow and data flow, and then conducts similarity analysis using a new graph isomorphism technique. To improve accuracy and resistance to interference, we reduce redundant loops and remove junk actions in the dependency graph pre-processing phase, which also greatly improves the performance of our comparison algorithm. We implemented a prototype of DepSim and evaluated it on malware in the wild. Our prototype system successfully identified semantic similarities between malware samples and revealed their underlying similarity in program logic and behavior. The results demonstrate that our technique is accurate.
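The pipeline described in the abstract (pre-process a dependency graph, then compare graphs) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes a dependency graph is already available as a set of labelled edges, it uses hypothetical node labels such as `recv` and `nop`, and it substitutes a simple Jaccard score over edge sets for the paper's graph isomorphism technique, which the abstract does not specify.

```python
from typing import Iterable, Set, Tuple

# A dependency edge (source, destination); labels are hypothetical.
Edge = Tuple[str, str]


def remove_junk(edges: Iterable[Edge], junk: Set[str]) -> Set[Edge]:
    """Splice out junk nodes, linking each predecessor to each successor."""
    result = set(edges)
    for node in junk:
        preds = {s for s, d in result if d == node and s != node}
        succs = {d for s, d in result if s == node and d != node}
        # Drop every edge touching the junk node, then reconnect around it.
        result = {(s, d) for s, d in result if node not in (s, d)}
        result |= {(p, q) for p in preds for q in succs}
    return result


def drop_self_loops(edges: Iterable[Edge]) -> Set[Edge]:
    """Crude stand-in for loop reduction: discard edges from a node to itself."""
    return {(s, d) for s, d in edges if s != d}


def jaccard_similarity(a: Set[Edge], b: Set[Edge]) -> float:
    """Shared edges divided by all edges; an assumption, not graph isomorphism."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare(edges_a: Iterable[Edge], edges_b: Iterable[Edge], junk: Set[str]) -> float:
    """Pre-process both graphs, then score their similarity in [0, 1]."""
    a = drop_self_loops(remove_junk(edges_a, junk))
    b = drop_self_loops(remove_junk(edges_b, junk))
    return jaccard_similarity(a, b)


# Two hypothetical samples: the second adds a junk node and a redundant loop.
sample_a = {("recv", "decode"), ("decode", "write_file"), ("write_file", "exec")}
sample_b = {
    ("recv", "nop"), ("nop", "decode"), ("decode", "decode"),
    ("decode", "write_file"), ("write_file", "exec"),
}
score = compare(sample_a, sample_b, junk={"nop"})
```

In this toy case the two samples reduce to the same edge set once the junk node and the self-loop are removed, which is the intuition behind pre-processing before comparison. A real system would need node matching that tolerates renamed or reordered operations, which exact label comparison does not provide.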

Supported by the National Natural Science Foundation of China under Grant Nos. 60703076 and 61073179, and the National High-Tech Research and Development Plan of China under Grant Nos. 2007AA01Z451 and 2009AA01Z435.




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yi, Y., Lingyun, Y., Rui, W., Purui, S., Dengguo, F. (2011). DepSim: A Dependency-Based Malware Similarity Comparison System. In: Lai, X., Yung, M., Lin, D. (eds) Information Security and Cryptology. Inscrypt 2010. Lecture Notes in Computer Science, vol 6584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21518-6_35


  • DOI: https://doi.org/10.1007/978-3-642-21518-6_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21517-9

  • Online ISBN: 978-3-642-21518-6

  • eBook Packages: Computer Science, Computer Science (R0)
