Abstract
Here we show that the recently reported presence of long-range correlations in the distribution of words along texts is due to the complex distribution of the keywords, whereas common words are not correlated. Indeed, we prove that the degree of long-range correlation of a word at long scales is a good measure of its relevance to the text. Additionally, we develop a model that reproduces the spatial distribution of a word in a text, based on the long-range correlations observed for that word. The model reproduces not only the complex behaviour characterized by correlations at long scales and the degree of relevance of the word, but also the probability distribution of the inter-occurrence distances over the whole range of scales.
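The abstract does not spell out how the correlations of a single word are measured. One common approach, assumed here and not necessarily the authors' exact procedure, is to map the text to a binary series (1 where the word occurs, 0 elsewhere) and estimate the detrended fluctuation analysis (DFA) exponent α as the slope of log F(n) versus log n. An uncorrelated series gives α ≈ 0.5, while α > 0.5 signals positive correlations at the scales considered. The sketch below uses only the Python standard library and also extracts the inter-occurrence distances mentioned in the abstract. The function names, the scale range and the toy "bursty" text are hypothetical illustrations, not data or code from the paper.

```python
import math
import random
from typing import List, Sequence


def occurrence_series(tokens: Sequence[str], word: str) -> List[int]:
    """Map a tokenized text to a binary series: 1 where `word` occurs, 0 elsewhere."""
    return [1 if t == word else 0 for t in tokens]


def inter_occurrence_distances(series: Sequence[int]) -> List[int]:
    """Distances (in tokens) between consecutive occurrences of the word."""
    positions = [i for i, x in enumerate(series) if x]
    return [b - a for a, b in zip(positions, positions[1:])]


def dfa_fluctuation(series: Sequence[float], n: int) -> float:
    """Assumed DFA-1: RMS deviation of the integrated profile from a least-squares
    line fitted in each non-overlapping box of size n (leftover points are dropped)."""
    if n < 3 or len(series) < n:
        raise ValueError("box size must be >= 3 and no larger than the series")
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for x in series:  # cumulative sum of the mean-subtracted series
        acc += x - mean
        profile.append(acc)
    n_boxes = len(profile) // n
    x_mean = (n - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    total = 0.0
    for b in range(n_boxes):
        box = profile[b * n:(b + 1) * n]
        y_mean = sum(box) / n
        slope = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(box)) / sxx
        # squared residuals around the local linear trend
        total += sum((y - y_mean - slope * (i - x_mean)) ** 2 for i, y in enumerate(box))
    return math.sqrt(total / (n_boxes * n))


def dfa_exponent(series: Sequence[float], scales: Sequence[int]) -> float:
    """Least-squares slope alpha of log F(n) versus log n over the given scales."""
    logs_n = [math.log(n) for n in scales]
    logs_f = []
    for n in scales:
        f = dfa_fluctuation(series, n)
        if f == 0.0:
            raise ValueError("zero fluctuation: the word may be absent")
        logs_f.append(math.log(f))
    mx = sum(logs_n) / len(logs_n)
    my = sum(logs_f) / len(logs_f)
    num = sum((a - mx) * (b - my) for a, b in zip(logs_n, logs_f))
    den = sum((a - mx) ** 2 for a in logs_n)
    return num / den


def toy_bursty_series(length: int, segment: int, p_on: float, seed: int) -> List[int]:
    """Hypothetical toy text: the word appears (with probability p_on) only in
    alternating 'topical' segments and never in the segments in between."""
    rng = random.Random(seed)
    return [1 if (i // segment) % 2 == 0 and rng.random() < p_on else 0
            for i in range(length)]


if __name__ == "__main__":
    scales = [16, 32, 64, 128, 256, 512]
    bursty = toy_bursty_series(30000, 1500, 0.2, seed=1)
    shuffled = bursty[:]
    random.Random(2).shuffle(shuffled)  # same frequency, clustering destroyed
    print("alpha (bursty)   =", round(dfa_exponent(bursty, scales), 2))
    print("alpha (shuffled) =", round(dfa_exponent(shuffled, scales), 2))
```

Shuffling keeps the word frequency but destroys its clustering, so the shuffled copy acts as an uncorrelated reference: in this toy setting its exponent should stay close to 0.5, while the clustered version gives a larger value at long scales. The toy series has a single characteristic segment length rather than genuinely scale-free correlations, so it only illustrates the direction of the effect.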
Notes
1. The text was downloaded from the Project Gutenberg website: http://www.gutenberg.org.
Acknowledgements
This work has been supported by Grant No. P07-FQM03163 from the Spanish Junta de Andalucía.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Carretero-Campos, C., Montemurro, M.A., Bernaola-Galván, P., Coronado, A.V., Carpena, P. (2013). Towards a Deeper Understanding of the Complex Behaviour Observed in the Distribution of Words in Written Texts. In: Gilbert, T., Kirkilionis, M., Nicolis, G. (eds) Proceedings of the European Conference on Complex Systems 2012. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-00395-5_34
DOI: https://doi.org/10.1007/978-3-319-00395-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00394-8
Online ISBN: 978-3-319-00395-5
eBook Packages: Physics and Astronomy (R0)