Abstract
Zipf’s law is one of the best-known generalizations in text statistics. S. Mizutani introduced a new representation that approximates a text more precisely. We show that his representation is well approximated by the ratio N/L, where N is the number of words and L is the number of different words in the target text. This ratio was applied to several kinds of text: computer language, non-native English, and standard English. We found that Mizutani’s representation, approximated by this ratio, can be modified into a better measure of the constraint of a sentence (the size of the domain expressed by the text), (math). This measure roughly ranks English texts in order of constraint. Combined with a second parameter, R, the correlation coefficient between word length and logarithmically scaled rank order, it yields a stronger classifier.
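As a rough illustration of the two quantities named in the abstract, the sketch below computes N/L (word tokens over distinct words) and R (correlation of word length with log rank) for a piece of text. It is not the authors' procedure. The tokenizer, the alphabetical tie-breaking of equal-frequency ranks, counting each distinct word once when computing R, and the use of Pearson's coefficient are all assumptions. The modified constraint measure itself is not reproduced, because its formula is not given here. The function names `tokenize`, `n_over_l`, `pearson` and `length_log_rank_r` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters and apostrophes.

    The chapter's exact definition of a "word" is not stated; this is an assumption.
    """
    return re.findall(r"[a-z']+", text.lower())


def n_over_l(words):
    """N/L: number of word tokens (N) divided by number of distinct words (L)."""
    if not words:
        raise ValueError("need at least one word")
    return len(words) / len(set(words))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


def length_log_rank_r(words):
    """R: correlation between word length and log(rank).

    Assumptions: each distinct word is counted once at its rank, rank 1 is
    the most frequent word, and ties are broken alphabetically.
    """
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    lengths = [len(w) for w in ranked]
    log_ranks = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    return pearson(lengths, log_ranks)


if __name__ == "__main__":
    words = tokenize("the cat sat on the mat and the dog sat on the log")
    print(n_over_l(words))  # 13 tokens / 8 distinct words = 1.625
    print(length_log_rank_r(words))
```

Because Pearson's coefficient is unchanged by rescaling either variable, the base of the logarithm does not affect R in this sketch.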
References
Bahl, L.R.; Jelinek, F.; Mercer, R.L. (1983): A maximum likelihood approach to continuous speech recognition. In: IEEE Trans. Pattern Anal. Machine Intell. vol. PAMI-5 (Mar.), 179–190.
Booth, A.D. (1967): A Law of Occurrences for Words of Low Frequency. In: Information and Control 10(4), 386–393.
Carroll, J.; Davies, P.; Richman, B. (1971): Word Frequency Book. Houghton Mifflin Company.
Chen, Ye-Sho (1989): Zipf’s Laws in Text Modeling. In: Int. J. General Systems 15, 233–252.
Hoermann, H. (1979): Psycholinguistics. Translated by H.H. Stern and P. Leppmann. New York: Springer-Verlag.
Howes, D. (1957): On the Relations between the Probability of a Word as an Association and in General Linguistic Usage. In: J. of Abnormal and Social Psychology 54, 75–85.
Katsikas, A.A.; Nicolis, J.S. (1990): Chaotic Dynamics of Generating Markov Partitions and Linguistic Sequences Mimicking Zipf’s Law. In: Il Nuovo Cimento 12D(2), 177.
Kennedy, J.; Neville, A. (1986): Basic Statistical Methods for Engineers and Scientists. Harper and Row Publishers Inc.
Mandelbrot, B.B. (1982): The Fractal Geometry of Nature. New York: W.H. Freeman and Co.
Mizutani, S. (1983): Lecture on Japanese. Tokyo: Asakura.
Nicolis, J. (1986): Dynamics of Hierarchical Systems. Berlin: Springer-Verlag.
Tankard, J. (1986): The literary detective. In: Byte (February), 231.
Zipf, G.K. (1965): The psycho-biology of language. The MIT Press. Originally published by Houghton Mifflin Co., 1935.
Copyright information
© 1993 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Ejiri, K., Smith, A.E. (1993). Proposal of a New ‘Constraint Measure’ for Text. In: Köhler, R., Rieger, B.B. (eds) Contributions to Quantitative Linguistics. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-1769-2_13
DOI: https://doi.org/10.1007/978-94-011-1769-2_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4777-7
Online ISBN: 978-94-011-1769-2
eBook Packages: Springer Book Archive