Quality & Quantity, Volume 44, Issue 4, pp 713–731

Zipf’s law—another view

  • Ioan-Iovitz Popescu
  • Gabriel Altmann
  • Reinhard Köhler

Abstract

In many scientific disciplines, phenomena are observed that are usually described by one of the various versions of the so-called Zipf’s law. Because no single formula found so far fits all data sufficiently well, more and more variants (modifications, ad hoc formulas, derivations, etc.) are created, each of which agrees quite well with some data sets but fails with others. The present paper proposes a new approach to the problem, based on the assumption that every data set displaying a Zipf-like structure is composed of several system components. A corresponding model is presented and tested on data from 100 texts in 20 languages.

Keywords

Zipf’s law; Rank-frequency distribution; Synthetic language

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Ioan-Iovitz Popescu (1)
  • Gabriel Altmann (2)
  • Reinhard Köhler (3)

  1. Bucharest, Romania
  2. Sprachwissenschaftliches Institut, Universität Bochum, Lüdenscheid, Germany
  3. Trier, Germany