Abstract
The security community is often interested in discovering the authorship of malware binaries, either for digital forensic analysis of malware corpora or for thwarting live malware threats. Such discovery may be possible because of stylistic features inherent in code written by human programmers. Existing studies of authorship attribution for general-purpose software focus mainly on source code and typically rely on features of programming style and development environment. However, those features depend critically on the availability of the program source code, which is usually not available when dealing with malware binaries. Program binaries often do not retain many semantic or stylistic features because of the compilation process. Therefore, authorship attribution for malware binaries, which must rely on features and styles that survive compilation, is challenging. This paper reviews the state of the art in this area. Further, we analyze the features involved in those techniques. Using a case study, we identify features that can survive the compilation process. Finally, we analyze existing works on binary authorship attribution and study their applicability to real malware binaries.
Acknowledgments
The authors thank the anonymous reviewers for their valuable comments. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring organizations.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alrabaee, S., Shirani, P., Debbabi, M., Wang, L. (2017). On the Feasibility of Malware Authorship Attribution. In: Cuppens, F., Wang, L., Cuppens-Boulahia, N., Tawbi, N., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2016. Lecture Notes in Computer Science, vol 10128. Springer, Cham. https://doi.org/10.1007/978-3-319-51966-1_17
DOI: https://doi.org/10.1007/978-3-319-51966-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51965-4
Online ISBN: 978-3-319-51966-1
eBook Packages: Computer Science, Computer Science (R0)