Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4123)

Abstract

We survey authorship attribution of documents based on stylistic characteristics of an author’s writing extracted from a corpus of known works, with applications such as authenticating disputed documents or literary works. Although the pioneering paper, based on word-length histograms, appeared at the very end of the nineteenth century, the resolving power of this and other stylometric approaches has yet to be studied, both theoretically and through case studies in which additional information can help identify the correct attribution.

We survey several theoretical approaches, including ones approximating the apparently nearly optimal approach based on Kolmogorov conditional complexity, and some case studies: attributing the Shakespeare canon and newly discovered works, attributing newly discovered works allegedly by M. Twain, binary (Madison vs. Hamilton) discrimination of the Federalist Papers using Naive Bayes and other classifiers, and testing for the presence of steganography. The last topic is complemented by a sketch of a study of anagram ambiguity based on Shannon’s theory of cryptography.
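As a rough illustration of what approximating Kolmogorov conditional complexity can look like in practice, the sketch below uses a general-purpose compressor as a stand-in: the conditional complexity of a query text given an author's corpus is estimated as the growth in compressed size when the query is appended to that corpus. This is a minimal sketch under stated assumptions, not the method of the chapter; the choice of `zlib` and the names `compressed_size`, `conditional_complexity`, and `attribute` are hypothetical and introduced only for illustration.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


def conditional_complexity(query: str, corpus: str) -> int:
    """Estimate K(query | corpus) as C(corpus + query) - C(corpus).

    Assumption: zlib stands in for an ideal compressor; any real compressor
    gives only an upper-bound-style approximation.
    """
    corpus_bytes = corpus.encode("utf-8")
    query_bytes = query.encode("utf-8")
    return compressed_size(corpus_bytes + query_bytes) - compressed_size(corpus_bytes)


def attribute(query: str, corpora: Mapping[str, str]) -> str:
    """Return the candidate author whose corpus best 'explains' the query,
    i.e. the one with the smallest estimated conditional complexity."""
    return min(corpora, key=lambda author: conditional_complexity(query, corpora[author]))


if __name__ == "__main__":
    # Toy corpora, invented for this example only.
    corpora = {
        "author_a": "the quick brown fox jumps over the lazy dog " * 20,
        "author_b": "colorless green ideas sleep furiously tonight " * 20,
    }
    query = "the quick brown fox jumps over the lazy dog"
    print(attribute(query, corpora))
```

Realistic use would need training corpora of comparable length and a query that is long relative to the compressor's overhead. Note also that zlib only exploits repetition within a 32 KB window, so for longer corpora a compressor with a larger context would be the more natural choice.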

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Malyutov, M.B. (2006). Authorship Attribution of Texts: A Review. In: Ahlswede, R., et al. General Theory of Information Transfer and Combinatorics. Lecture Notes in Computer Science, vol 4123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11889342_20

  • DOI: https://doi.org/10.1007/11889342_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46244-6

  • Online ISBN: 978-3-540-46245-3

  • eBook Packages: Computer Science (R0)
