Explanation in Computational Stylometry

Daelemans, Walter

doi:10.1007/978-3-642-37256-8_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7817)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Computational stylometry, as in authorship attribution or profiling, has large potential for applications in diverse areas: literary science, forensics, language psychology, sociolinguistics, even medical diagnosis. Yet many of the basic research questions of this field are not studied systematically, or even at all. In this paper we examine these problems and suggest that reinterpreting current and historical methods within the framework and methodology of machine learning for natural language processing would be helpful. We also argue that research in computational stylometry should pay more attention to explanation, as opposed to purely quantitative evaluation measures, and we propose a strategy for data collection and analysis to achieve progress in the field. Finally, we introduce a fairly new application of computational stylometry in internet security.

Author information

Authors and Affiliations

CLiPS, University of Antwerp, Belgium
Walter Daelemans

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

About this paper

Cite this paper

Daelemans, W. (2013). Explanation in Computational Stylometry. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37256-8_37

DOI: https://doi.org/10.1007/978-3-642-37256-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37255-1
Online ISBN: 978-3-642-37256-8
eBook Packages: Computer Science, Computer Science (R0)