Abstract
Authorship attribution of source code is the task of deciding who wrote a program, given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. A number of methods for the authorship attribution of source code have been proposed. This paper presents an overview and critique of the state of the art in the field. It also reports an independent comparative study that uses an unprecedented experimental design and data set, and it offers proposals for improvements and future work.
Copyright information
© 2013 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Tennyson, M.F. (2013). On Improving Authorship Attribution of Source Code. In: Rogers, M., Seigfried-Spellar, K.C. (eds) Digital Forensics and Cyber Crime. ICDF2C 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39891-9_4
DOI: https://doi.org/10.1007/978-3-642-39891-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39890-2
Online ISBN: 978-3-642-39891-9
eBook Packages: Computer Science (R0)