On the Feasibility of Malware Authorship Attribution

  • Conference paper
Foundations and Practice of Security (FPS 2016)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10128)

Abstract

The security community often needs to identify the authors of malware binaries, either for digital forensic analysis of malware corpora or for thwarting live malware threats. Such attribution may be possible because code written by human programmers carries stylistic features. Existing studies of authorship attribution for general-purpose software focus mainly on source code and typically rely on features of programming style and the development environment. However, those features depend critically on the availability of the program source code, which is usually not available when dealing with malware binaries. Moreover, program binaries often do not retain many semantic or stylistic features, which are lost in the compilation process. Authorship attribution for malware binaries, which must rely on features and styles that survive compilation, is therefore challenging. This paper surveys the state of the art in this area. Further, we analyze the features used by existing techniques. Using a case study, we identify features that can survive the compilation process. Finally, we analyze existing work on binary authorship attribution and study its applicability to real malware binaries.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring organizations.

Author information

Corresponding author

Correspondence to Saed Alrabaee.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Alrabaee, S., Shirani, P., Debbabi, M., Wang, L. (2017). On the Feasibility of Malware Authorship Attribution. In: Cuppens, F., Wang, L., Cuppens-Boulahia, N., Tawbi, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2016. Lecture Notes in Computer Science, vol 10128. Springer, Cham. https://doi.org/10.1007/978-3-319-51966-1_17

  • DOI: https://doi.org/10.1007/978-3-319-51966-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51965-4

  • Online ISBN: 978-3-319-51966-1

  • eBook Packages: Computer Science, Computer Science (R0)
