Abstract
Compiler identification is an essential component of binary toolchain analysis, with applications in reverse engineering and malware analysis. Security investigators and cyber incident responders are often tasked with analyzing and attributing binary files obtained from malicious campaigns, which must be inspected quickly and reliably. Such binaries can be a source of intelligence on adversary tactics, techniques, and procedures. Compiler provenance information can aid binary analysis by uncovering fingerprints of the development environment and related libraries, thereby accelerating the analysis. In this chapter, we present BinComp, a practical approach that analyzes the syntax, structure, and semantics of disassembled functions to extract compiler provenance.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Alrabaee, S. et al. (2020). Compiler Provenance Attribution. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_3
DOI: https://doi.org/10.1007/978-3-030-34238-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34237-1
Online ISBN: 978-3-030-34238-8
eBook Packages: Computer Science (R0)