Proposal of a New ‘Constraint Measure’ for Text

Chapter in: Contributions to Quantitative Linguistics

Abstract

Zipf’s law is one of the best-known generalizations in text statistics. S. Mizutani introduced a new representation that approximates a text more precisely than Zipf’s law. We show that his representation is well approximated by the ratio N/L, where N is the number of words (tokens) and L the number of different words (types) in the target text. We applied this ratio to several kinds of text: computer-language text, non-native English, and standard English. We found that Mizutani’s representation, as approximated by this ratio, can be modified into a better measure of the constraint of a sentence (that is, the size of the domain expressed by the text), (math). This measure roughly ranks English texts by degree of constraint. When it is combined with a second parameter, R, the correlation coefficient between word length and log-scaled rank order, a stronger classifier can be proposed.
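The two directly computable quantities named in the abstract, the ratio N/L and the correlation R, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors’ procedure: the tokenizer (case-folded runs of letters and apostrophes), computing R over distinct words rather than running tokens, and breaking rank ties by first occurrence are all assumptions, and the function names are hypothetical. Neither Mizutani’s representation nor the constraint formula itself is reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a "word" is a maximal run of letters or apostrophes, case-folded.
    return re.findall(r"[a-z']+", text.lower())


def tokens_per_type(words: list[str]) -> float:
    """Return N/L: total word count N divided by distinct-word count L."""
    n = len(words)
    l = len(set(words))
    return n / l


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def length_rank_correlation(words: list[str]) -> float:
    """Hypothetical reading of R: correlate each distinct word's length
    with the natural log of its frequency rank (rank 1 = most frequent)."""
    # most_common() orders by count; equal counts keep first-seen order.
    ranked = Counter(words).most_common()
    lengths = [float(len(w)) for w, _ in ranked]
    log_ranks = [math.log(r) for r in range(1, len(ranked) + 1)]
    return pearson(lengths, log_ranks)


if __name__ == "__main__":
    words = tokenize("The cat sat on the mat and the dog sat on the log.")
    print(tokens_per_type(words))  # 1.625 (N = 13, L = 8)
    print(length_rank_correlation(words))
```

A larger N/L means words are reused more often. Under this reading, a positive R means rarer (higher-rank) words tend to be longer.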

References

  • Booth, A.D. (1967): A Law of Occurrences for Words of Low Frequency. In: Information and Control 10(4), 386–393.

  • Carroll, J.; Davies, P.; Richman, B. (1971): Word Frequency Book. Houghton Mifflin Company.

  • Chen, Ye-Sho (1989): Zipf’s Laws in Text Modeling. In: Int. J. General Systems 15, 233–252.

  • Bahl, L.R.; Jelinek, F.; Mercer, R.L. (1983): A Maximum Likelihood Approach to Continuous Speech Recognition. In: IEEE Trans. Pattern Anal. Machine Intell. PAMI-5(2), 179–190.

  • Hoermann, H. (1979): Psycholinguistics. Translated by H.H. Stern and P. Leppmann. New York: Springer-Verlag.

  • Howes, D. (1957): On the Relation between the Probability of a Word as an Association and in General Linguistic Usage. In: J. of Abnormal and Social Psychology 54, 75–85.

  • Katsikas, A.A.; Nicolis, J.S. (1990): Chaotic Dynamics of Generating Markov Partitions and Linguistic Sequences Mimicking Zipf’s Law. In: Il Nuovo Cimento 12D(2), 177.

  • Kennedy, J.; Neville, A. (1986): Basic Statistical Methods for Engineers and Scientists. Harper and Row Publishers Inc.

  • Mandelbrot, B.B. (1982): The Fractal Geometry of Nature. New York: W.H. Freeman and Co.

  • Mizutani, S. (1983): Lecture on Japanese. Tokyo: Asakura.

  • Nicolis, J.S. (1986): Dynamics of Hierarchical Systems. Berlin: Springer-Verlag.

  • Tankard, J. (1986): The literary detective. In: Byte (February), 231.

  • Zipf, G.K. (1965): The psycho-biology of language. The MIT Press. Originally printed by Houghton Mifflin Co., 1935.

Copyright information

© 1993 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Ejiri, K., Smith, A.E. (1993). Proposal of a New ‘Constraint Measure’ for Text. In: Köhler, R., Rieger, B.B. (eds) Contributions to Quantitative Linguistics. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-1769-2_13

  • DOI: https://doi.org/10.1007/978-94-011-1769-2_13

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-4777-7

  • Online ISBN: 978-94-011-1769-2

  • eBook Packages: Springer Book Archive
