Authorship Attribution

Binary Code Fingerprinting for Cybersecurity

Abstract

Binary authorship attribution refers to the process of discovering information related to the author(s) of anonymous binary code on the basis of stylometric characteristics extracted from the code. In practice, however, authorship attribution for binary code still requires considerable manual and error-prone reverse engineering, which can be a daunting task given the sheer volume and complexity of today’s malware. In this chapter, we propose BinAuthor, the first compiler-agnostic method for identifying the authors of program binaries. BinAuthor first filters out unrelated functions (compiler and library functions) to isolate user-related functions, then converts those functions into a canonical form to eliminate the effects of the compiler and compilation settings. Finally, it leverages a set of features based on the choices authors make during coding; these features capture an author’s coding habits.
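
The three stages named in the abstract (filtering, canonicalization, feature extraction) can be illustrated with a minimal sketch. This is not BinAuthor's implementation: the prefix list, the register and immediate patterns, and the function names `is_user_function`, `canonicalize`, and `extract_features` are all assumptions chosen for illustration, and the input is assumed to be x86 disassembly already split into text lines per function.

```python
import re
from collections import Counter

# Hypothetical name prefixes for compiler/library helpers (an assumption,
# not the filter BinAuthor actually uses).
LIBRARY_PREFIXES = ("__", "_crt", "std::")

# Simplified x86 patterns: common general-purpose registers and numeric literals.
REGISTER = re.compile(r"\b(?:[er]?[abcd]x|[er]?[sd]i|[er]?[sb]p|r(?:8|9|1[0-5])d?)\b")
IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def is_user_function(name):
    """Keep only functions whose names do not look compiler- or library-generated."""
    return not name.startswith(LIBRARY_PREFIXES)


def canonicalize(instruction):
    """Replace concrete registers and constants with placeholders."""
    mnemonic, _, operands = instruction.strip().partition(" ")
    operands = REGISTER.sub("REG", operands)
    operands = IMMEDIATE.sub("IMM", operands)
    return f"{mnemonic} {operands}".strip()


def extract_features(functions):
    """Count canonical instructions across user functions as a crude style profile."""
    features = Counter()
    for name, instructions in functions.items():
        if is_user_function(name):
            features.update(canonicalize(i) for i in instructions)
    return features


sample = {
    "main": ["mov eax, 0x10", "mov ebx, 7", "ret"],
    "__security_check_cookie": ["mov ecx, 1"],  # dropped by the filter
}
profile = extract_features(sample)
```

Because register allocation and constant values vary with the compiler and its settings, replacing them with placeholders lets the two `mov` instructions in `main` collapse to the same canonical form, while the library-style helper is excluded from the profile altogether.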

Notes

  1. https://github.com/g4hsean/BinAuthor

  2. https://www.aldeid.com/wiki/PEiD

  3. https://github.com/calaylin/CodeStylometry/tree/master

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Alrabaee, S. et al. (2020). Authorship Attribution. In: Binary Code Fingerprinting for Cybersecurity. Advances in Information Security, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-030-34238-8_9

  • DOI: https://doi.org/10.1007/978-3-030-34238-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34237-1

  • Online ISBN: 978-3-030-34238-8

  • eBook Packages: Computer Science, Computer Science (R0)