Dynamics of Style and the Case of the Diario Postumo by Eugenio Montale: A Quantitative Approach

Chapter
Part of the Lecture Notes in Morphogenesis book series (LECTMORPH)

Abstract

We address a concrete problem concerning the integrity of a very specific textual corpus: the 84 poems that now form, after a troubled journey, the so-called Diario Postumo (DP) by Eugenio Montale. Our approach is simple to describe. It relies on two distinct methods for measuring similarity between texts, that is, two algorithms that, given any pair of texts, return a positive number that is smaller the more similar the texts are. The first measure, called entropic distance (or lzwe), uses cross entropy, as estimated with techniques from data compression theory, to quantify differences between sequences of symbols. The second measure, called n-gram distance, is equally simple to describe and is based on the frequencies of sequences of n consecutive characters. Both distances have been described elsewhere, but for completeness we report their precise definitions in the Appendix. The main purpose of our analysis is to test whether purely automatic and quantitative methods can reveal anomalies in the poems forming the DP. Such anomalies exist, and they are compatible and coherent with the hypothesis that the poems result from several elaborations of authentic Montale material, originally created and recorded in different forms. Our research on the DP is part of a wider project that explores the possibility of combining qualitative philological methods with quantitative mathematical approaches to solve problems in authorship attribution (A.A.), forgery detection and integrity testing of textual corpora.
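
To make the two kinds of measure concrete, the following is a minimal Python sketch and not the definitions given in the Appendix. The function names, the normalization of the n-gram distance, and the use of zlib as a stand-in compressor for the cross-entropy estimate are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of each run of n consecutive characters."""
    if len(text) < n:
        raise ValueError("text must contain at least n characters")
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def ngram_distance(a: str, b: str, n: int = 3) -> float:
    """Mean squared relative difference of n-gram frequencies.

    Illustrative normalization (an assumption): 0.0 for identical
    profiles, 1.0 when the two texts share no n-gram at all.
    """
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    grams = set(pa) | set(pb)
    total = 0.0
    for gram in grams:
        fa, fb = pa.get(gram, 0.0), pb.get(gram, 0.0)
        total += ((fa - fb) / (fa + fb)) ** 2
    return total / len(grams)


def _zsize(data: bytes) -> int:
    """Compressed size in bytes, using zlib as a stand-in compressor."""
    return len(zlib.compress(data, 9))


def cross_cost(a: str, b: str) -> float:
    """Extra compressed bytes per byte of b when b is appended to a."""
    xa, xb = a.encode("utf-8"), b.encode("utf-8")
    if not xb:
        raise ValueError("b must be non-empty")
    return (_zsize(xa + xb) - _zsize(xa)) / len(xb)


def entropic_distance_sketch(a: str, b: str) -> float:
    """Symmetrized excess cost of coding each text with the other's model.

    A rough proxy only: zlib quirks can make it slightly negative on
    very short inputs, unlike a carefully defined distance.
    """
    return (cross_cost(a, b) - cross_cost(b, b)) + (
        cross_cost(b, a) - cross_cost(a, a)
    )
```

Both functions return 0.0 when a text is compared with itself and, in typical cases, larger values for less similar texts. The compression-based proxy inherits zlib's 32 KB window, so it is only indicative for long texts; a dedicated cross-parsing scheme of the kind the abstract alludes to would avoid that limitation.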

Keywords

Textual Corpus; Cross Entropy; Forgery Detection; Plagiarism Detection; Candidate Text

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Dipartimento di Matematica, Sapienza Università di Roma, Rome, Italy
  2. Dipartimento di Matematica, Università di Bologna, Bologna, Italy
