Small Worlds of Natural Language

  • Chapter
Structure Discovery in Natural Language

Abstract

In this chapter, power-law distributions and Small World Graphs originating from natural language data are examined in the fashion of Quantitative Linguistics. Section 3.1 presents several data sources that exhibit power-law distributions in rank-frequency; Section 3.2 discusses graphs with Small World properties in language data. We shall see that these characteristics are omnipresent in language data and should be taken into account when designing Structure Discovery processes. Knowing, for example, that a few hundred words make up the bulk of the words in a text, it is safe to use only these as contextual features without losing much text coverage. Knowing that word co-occurrence networks possess the scale-free Small World property has implications for clustering these networks. An interesting question is whether these characteristics are inherent only to real natural language data or whether they can be produced by generators of linear sequences in a much simpler way than our intuition about language complexity would suggest. In other words, we shall see how distinctive these characteristics are with respect to tests deciding whether a given sequence is natural language or not. Finally, an emergent random text generation model that captures many of the characteristics of natural language is defined and quantitatively verified in Section 3.3.
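The rank-frequency and coverage claims in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`rank_frequency`, `coverage`, `zipf_exponent`) and the toy corpus are assumptions made for illustration, and the least-squares fit on log-log values is only a rough way to estimate a power-law exponent, not necessarily the procedure used in Section 3.1.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies in descending order (rank 1 first)."""
    return [count for _, count in Counter(tokens).most_common()]


def coverage(freqs, k):
    """Fraction of all tokens accounted for by the k most frequent words."""
    return sum(freqs[:k]) / sum(freqs)


def zipf_exponent(freqs):
    """Negated least-squares slope of log(frequency) against log(rank).

    Assumes at least two ranks and strictly positive frequencies.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return -num / den


# Hypothetical toy corpus; a real experiment would use a large text collection.
tokens = "the cat sat on the mat and the dog sat on the log".split()
freqs = rank_frequency(tokens)

print("rank-frequency:", freqs)
print("coverage of top 3 words:", round(coverage(freqs, 3), 3))
print("estimated exponent:", round(zipf_exponent(freqs), 3))
```

On a realistically sized corpus, `coverage(freqs, k)` for k in the low hundreds gives a direct check of how much text the most frequent words account for, which is the quantity that matters when choosing contextual features.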

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Biemann, C. (2012). Small Worlds of Natural Language. In: Structure Discovery in Natural Language. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25923-4_3

  • DOI: https://doi.org/10.1007/978-3-642-25923-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25922-7

  • Online ISBN: 978-3-642-25923-4

  • eBook Packages: Computer Science, Computer Science (R0)
