Abstract
Forensic analysis of questioned electronic documents is difficult because the nature of the documents eliminates many kinds of informative differences. Recent work in authorship attribution demonstrates the practicality of analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about error types and rates, and no best practices are available. This paper discusses efforts to address these issues, partly through the development of a systematic testbed for multilingual, multigenre authorship attribution accuracy, and partly through the development and concurrent analysis of a uniform and portable software tool that applies multiple methods to analyze electronic documents for authorship based on authorial style.
Copyright information
© 2006 IFIP International Federation for Information Processing
About this paper
Cite this paper
Juola, P. (2006). Authorship Attribution for Electronic Documents. In: Olivier, M.S., Shenoi, S. (eds) Advances in Digital Forensics II. DigitalForensics 2006. IFIP Advances in Information and Communication, vol 222. Springer, Boston, MA. https://doi.org/10.1007/0-387-36891-4_10
DOI: https://doi.org/10.1007/0-387-36891-4_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36890-0
Online ISBN: 978-0-387-36891-7
eBook Packages: Computer Science, Computer Science (R0)