On Improving Authorship Attribution of Source Code

  • Conference paper
Digital Forensics and Cyber Crime (ICDF2C 2012)

Abstract

Authorship attribution of source code is the task of deciding who wrote a program, given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. A number of methods for the authorship attribution of source code have been proposed. This paper presents an overview and critique of the state of the art in the field. It also presents an independent comparative study that uses an unprecedented experimental design and data set, along with proposals for improvements and future work.

Copyright information

© 2013 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Tennyson, M.F. (2013). On Improving Authorship Attribution of Source Code. In: Rogers, M., Seigfried-Spellar, K.C. (eds) Digital Forensics and Cyber Crime. ICDF2C 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39891-9_4

  • DOI: https://doi.org/10.1007/978-3-642-39891-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39890-2

  • Online ISBN: 978-3-642-39891-9

  • eBook Packages: Computer Science, Computer Science (R0)
