Synonyms
Document segmentation
Definition
Text segmentation is a precursor to text retrieval, automatic summarization, information retrieval (IR), language modeling (LM), and natural language processing (NLP). In written texts, text segmentation is the process of identifying the boundaries between words, phrases, or other linguistically meaningful units, such as sentences or topics. The units separated by such processing can help humans read texts, but they are mainly used by computers as the fundamental units of automated processes such as NLP and IR.
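As an illustration of the definition above, the following minimal Python sketch segments English text into sentences and then into words using two regular expressions. The function names (split_sentences, split_words) and the boundary rules are assumptions made for this example only; they are not taken from the entry, and practical segmenters rely on far richer rules or statistical models.

```python
import re

# Illustrative, deliberately naive rules (assumptions, not the entry's method).
# A sentence boundary is whitespace that follows ., ! or ? and precedes a capital letter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# A word is a run of word characters, optionally with one apostrophe part
# (e.g. "isn't"); any other non-space character is its own unit.
_WORD = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text):
    """Split text into sentence-like units at the naive boundary rule."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def split_words(sentence):
    """Return words and punctuation marks as separate units."""
    return _WORD.findall(sentence)


text = "Text segmentation finds boundaries. It isn't trivial! Dr. Smith agrees."
sentences = split_sentences(text)
for sentence in sentences:
    print(split_words(sentence))
```

With this input the sketch yields four units rather than three, because the rule wrongly treats the period in "Dr." as a sentence end. Ambiguities of this kind are why sentence and word boundary detection is treated as a problem in its own right.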
Historical Background
Natural language processing (NLP) is an important research field, and a primary problem in it is how to segment text correctly. Various segmentation methods have emerged over the past decades for different kinds of languages and applications. Text segmentation is language dependent (each language has its own special problems, which are introduced later), corpus dependent, character-set...
Recommended Reading
Beeferman D, Berger A, Lafferty J. Statistical models for text segmentation. Mach Learn. 1999;34(1–3):177–210.
Grefenstette G, Tapanainen P. What is a word, what is a sentence? Problems of tokenization. In: Proceedings of the 3rd Conference on Computational Lexicography and Text Research; 1994. p. 7–10.
Mikheev A. Tagging sentence boundaries. In: Proceedings of the 1st Conference on North American Chapter of the Association for Computational Linguistics; 2000. p. 264–71.
Reynar JC, Marcus MP. Topic segmentation: algorithms and applications. Philadelphia: University of Pennsylvania, Ph.D. Thesis. 1998.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Huang, H., Zhang, B. (2018). Text Segmentation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_421
DOI: https://doi.org/10.1007/978-1-4614-8265-9_421
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering