Towards a Deeper Understanding of the Complex Behaviour Observed in the Distribution of Words in Written Texts

  • Concepción Carretero-Campos
  • Marcelo A. Montemurro
  • Pedro Bernaola-Galván
  • Ana V. Coronado
  • Pedro Carpena
Part of the Springer Proceedings in Complexity book series (SPCOM)


Here we show that the recently reported presence of long-range correlations in the distribution of words along texts is due to the complex distribution of the keywords, while common words are not correlated. Indeed, we prove that the degree of long-range correlations of a word at long scales is a good measure of its relevance to the text. Additionally, we develop a model able to reproduce the spatial distribution of a word in a text, based on the long-range correlations observed for the word. The model reproduces not only the complex behaviour characterized by the presence of correlations at long scales and the degree of relevance of the word, but also the probability distribution of the inter-occurrence distances over the whole range of scales.
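As a minimal illustration of the quantities mentioned above, the following Python sketch extracts the positions of a word in a text, the distances between its consecutive occurrences, and the binary occurrence sequence on which a correlation analysis could be run. It is a sketch under stated assumptions and not the authors' implementation: the tokenizer, the function names and the sample sentence are all hypothetical choices made for this example.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase runs of letters (a simplifying assumption:
    apostrophes, hyphens and digits act as separators)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_positions(tokens: List[str], word: str) -> List[int]:
    """Return the token indices at which `word` occurs."""
    return [i for i, token in enumerate(tokens) if token == word]


def inter_occurrence_distances(positions: List[int]) -> List[int]:
    """Return the distances, in tokens, between consecutive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def indicator_sequence(tokens: List[str], word: str) -> List[int]:
    """Return a 0/1 sequence marking where `word` occurs along the text."""
    return [1 if token == word else 0 for token in tokens]


# Hypothetical sample text, used only to exercise the functions.
sample = (
    "the whale swam and the whale dived while the ship sailed on "
    "and the crew watched the whale"
)
tokens = tokenize(sample)
whale_distances = inter_occurrence_distances(word_positions(tokens, "whale"))
whale_indicator = indicator_sequence(tokens, "whale")
```

On a real text, the histogram of such distances gives an empirical estimate of the inter-occurrence distance distribution, and the 0/1 sequence is one possible input for a fluctuation analysis of the word's correlations.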


  • Long-range correlations
  • Keyword detection
  • Complex structure of words in texts
  • Word relevance and complexity



This work has been supported by Grant no. P07-FQM03163 from the Spanish Junta de Andalucía.



Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Concepción Carretero-Campos (1)
  • Marcelo A. Montemurro (2)
  • Pedro Bernaola-Galván (1)
  • Ana V. Coronado (1)
  • Pedro Carpena (1)
  1. Departamento de Física Aplicada II, Universidad de Málaga, Málaga, Spain
  2. Faculty of Life Sciences, The University of Manchester, Manchester, UK
