Text Segmentation

Synonyms

Document segmentation

Definition

Text segmentation is a precursor to text retrieval, automatic summarization, information retrieval (IR), language modeling (LM), and natural language processing (NLP). In written text, it is the process of identifying the boundaries between words, phrases, or other linguistically meaningful units, such as sentences or topics. The terms separated out by this process can help humans read texts, but they mainly serve as the fundamental units on which computers carry out automated processing, as in NLP and IR.
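As a rough illustration of the definition above, the following Python sketch marks sentence and word boundaries with two regular expressions. It is only a minimal example under strong assumptions: the text is whitespace-delimited and English-like, a sentence ends at `.`, `!`, or `?` followed by whitespace, and the helper names (`split_sentences`, `split_words`, `segment`) are hypothetical, not taken from the entry.

```python
import re

# Assumption: a sentence boundary is whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Assumption: a word is a run of word characters, optionally with one
# internal apostrophe part (so "isn't" stays a single unit).
WORD = re.compile(r"\w+(?:'\w+)?")


def split_sentences(text):
    """Split text into sentences at the naive boundary defined above."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def split_words(sentence):
    """Return the word units found in one sentence, dropping punctuation."""
    return WORD.findall(sentence)


def segment(text):
    """Segment text into sentences, and each sentence into words."""
    return [split_words(s) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = "Text segmentation finds boundaries. It isn't trivial! Is it?"
    for words in segment(sample):
        print(words)
```

Rules this simple fail on abbreviations such as "Dr. Smith", and they do not apply at all to writing systems that do not separate words with spaces, which is why practical segmenters rely on lexicons or statistical models instead.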

Historical Background

Natural language processing (NLP) is an important research field, and a primary problem in it is how to segment text correctly. Various segmentation methods have emerged over the past decades for different languages and applications. Text segmentation is language dependent (each language has its own special problems, which are introduced later), corpus dependent, character-set...

Recommended Reading

  1. Beeferman D, Berger A, Lafferty J. Statistical models for text segmentation. Mach Learn. 1999;34(1–3):177–210.

  2. Grefenstette G, Tapanainen P. What is a word, what is a sentence? Problems of tokenization. In: Proceedings of the 3rd Conference on Computational Lexicography and Text Research; 1994. p. 7–10.

  3. Mikheev A. Tagging sentence boundaries. In: Proceedings of the 1st Conference on North American Chapter of the Association for Computational Linguistics; 2000. p. 264–71.

  4. Reynar JC, Marcus MP. Topic segmentation: algorithms and applications. Philadelphia: University of Pennsylvania, Ph.D. Thesis. 1998.

Author information

Correspondence to Haoda Huang.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Huang, H., Zhang, B. (2018). Text Segmentation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_421
