Compiler Provenance Attribution

Binary Code Fingerprinting for Cybersecurity

Abstract

Compiler identification is an essential component of binary toolchain analysis, with many applications in reverse engineering and malware analysis. Security investigators and cyber incident responders are often tasked with analyzing and attributing binary files obtained from malicious campaigns, and these files must be inspected quickly and reliably. Such binaries can be a source of intelligence on adversary tactics, techniques, and procedures. Compiler provenance information aids binary analysis by uncovering fingerprints of the development environment and related libraries, which accelerates the analysis. In this chapter, we present BinComp, a practical approach that analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance.

Notes

  1. https://github.com/BinSigma/BinComp/tree/master/Dataset.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Alrabaee, S. et al. (2020). Compiler Provenance Attribution. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_3

  • DOI: https://doi.org/10.1007/978-3-030-34238-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34237-1

  • Online ISBN: 978-3-030-34238-8

  • eBook Packages: Computer Science, Computer Science (R0)
