Zipf’s law—another view

Abstract

In many scientific disciplines, phenomena are observed that are usually described by one of the various versions of the so-called Zipf’s law. Because no single formula has yet been found that fits all data sufficiently well, more and more variants (modifications, ad hoc formulas, derivations, etc.) are created, each of which agrees quite well with some data sets but fails with others. The present paper proposes a new approach to the problem, based on the assumption that every data set displaying a Zipf-like structure is composed of several system components. A corresponding model is presented and tested on data from 100 texts in 20 languages.

Author information

Corresponding author

Correspondence to Gabriel Altmann.

About this article

Cite this article

Popescu, I.-I., Altmann, G. & Köhler, R. Zipf’s law—another view. Qual Quant 44, 713–731 (2010). https://doi.org/10.1007/s11135-009-9234-y

Keywords

  • Zipf’s law
  • Rank-frequency distribution
  • Synthetic language