Analyzing Stylometric Approaches to Author Obfuscation

Juola, Patrick; Vescovi, Darren

doi:10.1007/978-3-642-24212-0_9

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 361)

Included in the following conference series:

IFIP International Conference on Digital Forensics

Abstract

Authorship attribution is an important and emerging security tool. However, just as criminals may wear gloves to hide their fingerprints, so too may criminal authors mask their writing styles to escape detection. Most authorship studies have focused on cooperative and/or unaware authors who do not take such precautions. This paper analyzes the methods implemented in the Java Graphical Authorship Attribution Program (JGAAP) against essays in the Brennan-Greenstadt obfuscation corpus that were written in deliberate attempts to mask style. The results demonstrate that many of the more robust and accurate methods implemented in JGAAP are effective in the presence of active deception.

Author information

Authors and Affiliations

Duquesne University, Pittsburgh, Pennsylvania, USA
Patrick Juola & Darren Vescovi

Editor information

Editors and Affiliations

Air Force Institute of Technology, Wright-Patterson Air Force Base, 45433-7765, OH, USA
Gilbert Peterson
Department of Computer Science, University of Tulsa, 74104-3189, Tulsa, OK, USA
Sujeet Shenoi

Cite this paper

Juola, P., Vescovi, D. (2011). Analyzing Stylometric Approaches to Author Obfuscation. In: Peterson, G., Shenoi, S. (eds) Advances in Digital Forensics VII. DigitalForensics 2011. IFIP Advances in Information and Communication Technology, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24212-0_9

DOI: https://doi.org/10.1007/978-3-642-24212-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24211-3
Online ISBN: 978-3-642-24212-0
eBook Packages: Computer Science (R0)
