Abstract

Computational stylometry, as in authorship attribution or profiling, has great potential for applications in diverse areas: literary studies, forensics, the psychology of language, sociolinguistics, and even medical diagnosis. Yet many of the field's basic research questions have not been studied systematically, or at all. In this paper we discuss these problems and suggest that it would help to reinterpret current and historical methods within the framework and methodology of machine learning for natural language processing. We also argue that research in computational stylometry should pay more attention to explanation, as opposed to purely quantitative evaluation measures, and we propose a strategy for data collection and analysis aimed at achieving progress in the field. Finally, we introduce a fairly new application of computational stylometry in internet security.

Keywords

Social Network Site; Machine Learning Method; Short Text; Supervise Machine Learning; Knowledge Extraction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Walter Daelemans (1)
  1. CLiPS, University of Antwerp, Belgium