Creativity and Universality in Language, pp. 157-176

# Dynamics of Style and the Case of the **Diario Postumo** by Eugenio Montale: A Quantitative Approach

## Abstract

Here we face a concrete problem of integrity of a very specific textual corpus, namely the 84 poems now forming, after a troubled journey, the so-called *Diario Postumo* (DP) by Eugenio Montale. Our approach is simple to describe: it is based on two distinct methods that measure similarity between texts, that is, two different algorithms that, given any pair of texts, return a positive number which is smaller the more similar the texts are. The first measure, called *entropic distance* (or *lzwe*), uses cross entropy to measure *differences* between sequences of symbols, an idea borrowed from data compression theory. The second measure, called *n-gram distance*, is also simple to describe and is based on the frequencies of sequences of *n* consecutive characters. Both distances have been described elsewhere, but for completeness we report their precise definitions in the Appendix. The main purpose of our analysis is to test whether it is possible, through purely automatic and quantitative methods, to reveal anomalies in the poems forming the DP. Such anomalies exist, and they are compatible and coherent with the hypothesis that they are the result of several elaborations of authentic Montale material, originally created and recorded in different forms. Our research on the DP is part of a wider research project aimed at exploring the possibility of combining qualitative philological methods with quantitative mathematical approaches to solve problems in authorship attribution (A.A.), forgery detection and integrity testing of textual corpora.
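To make the two kinds of measure more concrete, the following Python sketch shows one possible instance of each. It is an illustration under stated assumptions, not the precise definitions given in the Appendix: `ngram_distance` averages a squared relative difference of character n-gram frequencies (the formula, the normalization and the default `n = 3` are assumptions), and `compression_distance` uses the general-purpose `zlib` compressor as a rough stand-in for a cross-entropy estimate rather than the *lzwe* algorithm itself. All function names are hypothetical.

```python
import zlib
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> dict:
    """Relative frequency of every run of n consecutive characters in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def ngram_distance(x: str, y: str, n: int = 3) -> float:
    """Mean squared relative difference of n-gram frequencies (assumed formula).

    Returns 0.0 for identical profiles and 1.0 when no n-gram is shared.
    """
    fx, fy = ngram_profile(x, n), ngram_profile(y, n)
    grams = set(fx) | set(fy)
    if not grams:  # both texts are shorter than n characters
        return 0.0
    total = 0.0
    for gram in grams:
        a, b = fx.get(gram, 0.0), fy.get(gram, 0.0)
        total += ((a - b) / (a + b)) ** 2  # a + b > 0 for grams in the union
    return total / len(grams)


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data at maximum compression."""
    return len(zlib.compress(data, 9))


def compression_distance(reference: str, candidate: str) -> float:
    """Extra compressed bytes per byte of candidate when appended to reference.

    A crude proxy for cross entropy: the compressor has already seen the
    reference, so a candidate that shares its patterns costs fewer bytes.
    This is not the lzwe distance, and it is not symmetric.
    """
    ref = reference.encode("utf-8")
    cand = candidate.encode("utf-8")
    if not cand:
        return 0.0
    extra = compressed_size(ref + cand) - compressed_size(ref)
    return max(extra, 0) / len(cand)


if __name__ == "__main__":
    known = "il mare e il vento portano la sera sulle case bianche del porto"
    close = "il vento porta la sera sul mare e sulle case bianche"
    far = "0123456789 QWERTY ZXCVBNM 9876543210 PLOKIJ UHYGTF"
    for label, text in (("close", close), ("far", far)):
        print(label, ngram_distance(known, text), compression_distance(known, text))
```

Both functions return a non-negative number that shrinks as the two texts share more patterns. Note that `zlib` only looks back over a 32 KB window, so this proxy is meaningful only for short reference texts; a dedicated Lempel-Ziv parsing would be needed for longer ones.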

## Keywords

Textual Corpus, Cross Entropy, Forgery Detection, Plagiarism Detection, Candidate Text

## References

- 1. Altmann, E.G., Cristadoro, G., Degli Esposti, M.: On the origin of long-range correlations in texts. Proc. Natl. Acad. Sci. **109**, 11582–11587 (2012)
- 2. Basile, C., Benedetto, D., Caglioti, E., Degli Esposti, M.: An example of mathematical authorship attribution. J. Math. Phys. **49**, 1–20 (2008)
- 3. Benedetto, D., Caglioti, E., Loreto, V.: Language trees and zipping. Phys. Rev. Lett. **88**(4), 48702 (2002)
- 4. Benedetto, D., Degli Esposti, M., Maspero, C.: The puzzle of Basil's Epistula 38: a mathematical approach to a philological problem. J. Quant. Linguist. **20**(4), 267–287 (2013)
- 5. Bennet, W.R.: Scientific and Engineering Problem-Solving with the Computer. Prentice-Hall, Englewood Cliffs (1976)
- 6. Canettieri, P., Italia, P.: Un caso di attribuzionismo novecentesco: il Diario Postumo di Montale. Cogn. Philol. **6** (2013)
- 7. Clement, R., Sharp, D.: Ngram and Bayesian classification of documents. Lit. Linguist. Comput. **18**, 423–447 (2003)
- 8. Condello, F.: I filologi e gli angeli. E’ di Eugenio Montale il Diario postumo? Bononia University Press, ISBN-13: 978-8873959786 (2014)
- 9. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience, Hoboken (2006)
- 10. Juola, P.: Authorship attribution. FNT Inf. Retr. **1**, 233–334 (2007)
- 11. Kešelj, V., Peng, F., Cercone, N., Thomas, C.: N-gram-based author profiles for authorship attribution. In: Kešelj, V., Endo, T. (eds.) Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING’03, pp. 255–264. Dalhousie University, Halifax (2003)
- 12. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory **IT–22**(1), 75–81 (1976)
- 13. Lempel, A., Ziv, J.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory **23**(3), 337–343 (1977)
- 14. Lempel, A., Ziv, J.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory **IT–24**(5), 530–536 (1978)
- 15. Mendenhall, T.C.: The characteristic curves of composition. Science **9**(214), 237–249 (1887)
- 16. Proceedings of the Workshop, A carte scoperte. Eugenio Montale. E’ il “Diario Postumo” un falso?, Bologna, 11 novembre 2014. Bononia University Press (to appear, 2016)
- 17. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol. **60**, 538–556 (2009)
- 18. Stamatatos, E., Daelemans, W., Verhoeven, B., Potthast, M., Stein, B., Juola, P., Sanchez-Perez, M.A., Barrón-Cedeño, A.: Overview of the author identification task at PAN-2013. Notebook Papers of CLEF 2013 LABs and Workshops (CLEF-2013) (2013)
- 19. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. **27**, 379–423, 623–656 (1948)
- 20. Wyner, A.D., Ziv, J., Wyner, A.J.: On the role of pattern matching in information theory. IEEE Trans. Inf. Theory **44**(6), 2045–2056 (1998)
- 21. Ziv, J., Merhav, N.: A measure of relative entropy between individual sequences with application to universal classification. IEEE Trans. Inf. Theory **39**(4), 1270–1279 (1993)