Abstract
Authorship attribution is an important and emerging security tool. However, just as criminals may wear gloves to hide their fingerprints, so too may criminal authors mask their writing styles to escape detection. Most authorship studies have focused on cooperative and/or unaware authors who do not take such precautions. This paper analyzes the methods implemented in the Java Graphical Authorship Attribution Program (JGAAP) against essays in the Brennan-Greenstadt obfuscation corpus that were written in deliberate attempts to mask style. The results demonstrate that many of the more robust and accurate methods implemented in JGAAP are effective in the presence of active deception.
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Juola, P., Vescovi, D. (2011). Analyzing Stylometric Approaches to Author Obfuscation. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics VII. DigitalForensics 2011. IFIP Advances in Information and Communication Technology, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24212-0_9
DOI: https://doi.org/10.1007/978-3-642-24212-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24211-3
Online ISBN: 978-3-642-24212-0
eBook Packages: Computer Science (R0)