In many scientific disciplines, phenomena are observed that are usually described by one of the various versions of the so-called Zipf’s law. Because no single formula has so far been found that fits all data sufficiently well, more and more variants (modifications, ad hoc formulas, derivations, etc.) are being created, each of which agrees quite well with some data sets but fails with others. The present paper proposes a new approach to the problem, based on the assumption that every data set that displays a Zipf-like structure is composed of several system components. A corresponding model is presented and tested on data from 100 texts in 20 languages.
Keywords: Zipf’s law, rank-frequency distribution, synthetic language