Abstract
Writers tend to express their ideas in different styles, characterized by the so-called signature or stylome: an abstraction of the general constraints and the specific combinations of words within their language that they choose to follow. Although capturing this style has proven to be very difficult, some advances have been achieved. Here, we present a novel system that is trained with texts from a single author, is able to unveil some of that author's stylistic features, and applies them to detect texts not written by the same author or, at least, not written with the previously learned features. The system is a hybrid model based on self-organizing maps and on information-theoretic measures. In the model, the mutual information function of an unknown text is compared with the mutual information function of texts from a known author. If the distance between these two distributions exceeds a certain threshold, the unknown text is attributed to a different author; otherwise, the authorship is considered to be the same. The decision threshold is obtained from the self-organizing map trained with the texts from the known author. We present results on authorship identification in several contexts, including classic literature, journalism (political, economic, sports), and science popularization.
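The abstract does not give implementation details, so the following is only a minimal sketch of the pipeline it describes, built on several assumptions of ours: symbols are single characters; the mutual information function at lag k is I(k) = Σ p_k(a,b) log2[p_k(a,b) / (p(a) p(b))] for k = 1..K; profiles are compared with Euclidean distance; and the self-organizing map threshold is taken as the largest quantization error over the known author's text fragments (a common SOM novelty-detection rule, not necessarily the one used in the paper). The names `mutual_information_function`, `train_som`, `fit_threshold` and `same_author` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information_function(text, max_lag):
    """Return [I(1), ..., I(max_lag)] for the character sequence `text`.

    I(k) = sum over (a, b) of p_k(a, b) * log2(p_k(a, b) / (p(a) * p(b))),
    where p_k(a, b) is the relative frequency of symbol a being followed,
    k positions later, by symbol b, and p(.) are single-symbol frequencies.
    """
    n = len(text)
    p = {s: c / n for s, c in Counter(text).items()}
    profile = []
    for k in range(1, max_lag + 1):
        pairs = Counter(zip(text, text[k:]))  # (a, b) pairs at distance k
        total = n - k
        mi = 0.0
        for (a, b), count in pairs.items():
            p_ab = count / total
            mi += p_ab * math.log2(p_ab / (p[a] * p[b]))
        profile.append(mi)
    return profile


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train_som(data, n_units=4, epochs=200, lr0=0.5, seed=0):
    """Train a minimal 1-D self-organizing map; return its prototype vectors."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(n_units / 2, 1.0)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood radius
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = min(range(n_units), key=lambda i: euclidean(units[i], x))
            for i in range(n_units):
                # Gaussian neighborhood on the 1-D grid of unit indices
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                units[i] = [w + lr * h * (xi - w) for w, xi in zip(units[i], x)]
    return units


def quantization_error(units, x):
    """Distance from profile x to its best-matching unit."""
    return min(euclidean(u, x) for u in units)


def fit_threshold(units, data, margin=1.0):
    """Assumed rule: largest quantization error seen on the known author's data."""
    return margin * max(quantization_error(units, x) for x in data)


def same_author(units, threshold, profile):
    """Accept the profile as the known author's if it lies within the threshold."""
    return quantization_error(units, profile) <= threshold
```

In use, one would split the known author's corpus into fragments, compute a profile for each with `mutual_information_function`, train the map on those profiles, fit the threshold, and then call `same_author` on the profile of a disputed text. The `margin` parameter loosens or tightens the decision; its default here is arbitrary.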
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neme, A., Lugo, B., Cervera, A. (2010). Detection of Different Authorship of Text Sequences through Self-organizing Maps and Mutual Information Function. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Soft Computing. MICAI 2010. Lecture Notes in Computer Science, vol 6438. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16773-7_16
DOI: https://doi.org/10.1007/978-3-642-16773-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16772-0
Online ISBN: 978-3-642-16773-7
eBook Packages: Computer Science, Computer Science (R0)