Detection of Different Authorship of Text Sequences through Self-organizing Maps and Mutual Information Function

  • Conference paper
Advances in Soft Computing (MICAI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6438)

Abstract

Writers express their ideas in different styles, characterized by the so-called signature or stylome: an abstraction of the general constraints and specific word combinations they choose to follow within their language. Although capturing this style has proven very difficult, some advances have been achieved. Here we present a novel system that is trained with texts from a single author, unveils some features of that author's style, and applies them to detect texts not written by the same author or, at least, not written with the previously learned features. The system is a hybrid model based on self-organizing maps and information-theoretic measures. In the model, the mutual information function of an unknown text is compared with the mutual information function of texts from a known author. If the distance between the two exceeds a certain threshold, the unknown text is attributed to a different author; otherwise the authorship is taken to be the same. The decision threshold is obtained from the self-organizing map trained with texts from the known author. We present authorship identification results in several contexts, including classic literature, journalism (political, economic, sports), and popular science.
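
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which symbols, lags, or distance measure are used, so the sketch assumes characters as symbols, lags 1 to `max_lag`, a Euclidean distance, and the mean profile of the known texts as the reference. The SOM-based threshold is not reproduced; `threshold` is a hypothetical parameter supplied by the caller, and all function names are illustrative.

```python
import math
from collections import Counter


def mutual_information_function(text, max_lag=10):
    """Return [I(1), ..., I(max_lag)]: the mutual information (in bits)
    between symbols that are `lag` positions apart in `text`.
    Assumption: symbols are single characters."""
    profile = []
    for lag in range(1, max_lag + 1):
        pairs = list(zip(text, text[lag:]))
        n = len(pairs)
        if n == 0:
            profile.append(0.0)
            continue
        joint = Counter(pairs)
        left = Counter(a for a, _ in pairs)
        right = Counter(b for _, b in pairs)
        mi = 0.0
        for (a, b), count in joint.items():
            p_ab = count / n
            # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b])
            mi += p_ab * math.log2(count * n / (left[a] * right[b]))
        profile.append(mi)
    return profile


def profile_distance(p, q):
    """Euclidean distance between two profiles (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def is_different_author(unknown_text, known_texts, threshold, max_lag=10):
    """Flag `unknown_text` when its profile lies farther than `threshold`
    from the mean profile of `known_texts`.
    Assumption: `threshold` is given; the paper derives it from a trained SOM."""
    reference = [mutual_information_function(t, max_lag) for t in known_texts]
    mean_profile = [sum(col) / len(col) for col in zip(*reference)]
    unknown = mutual_information_function(unknown_text, max_lag)
    return profile_distance(unknown, mean_profile) > threshold
```

A call such as `is_different_author(candidate, texts_by_author, threshold=0.5)` returns `True` when the candidate's profile is far from the reference. The value 0.5 is arbitrary here, and in practice the texts should be long enough for the pair counts at each lag to be reliable.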

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neme, A., Lugo, B., Cervera, A. (2010). Detection of Different Authorship of Text Sequences through Self-organizing Maps and Mutual Information Function. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Soft Computing. MICAI 2010. Lecture Notes in Computer Science, vol 6438. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16773-7_16

  • DOI: https://doi.org/10.1007/978-3-642-16773-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16772-0

  • Online ISBN: 978-3-642-16773-7

  • eBook Packages: Computer Science, Computer Science (R0)
