Synonyms
Information extraction; Record extraction; Text segmentation
Definition
The term column segmentation refers to the segmentation of an unstructured text string into segments such that each segment is a column of a structured record.
As an example, consider the text string S = “18100 New Hampshire Ave. Silver Spring, MD 20861”, an unstructured form of an Address record. Let the columns of this record be House number, Street name, City name, State, Zip and Country. In column segmentation, the goal is to segment S and assign a column label to each segment. For S, the expected output pairs House number with “18100”, Street name with “New Hampshire Ave.”, City name with “Silver Spring”, State with “MD” and Zip with “20861”; Country has no segment because S does not mention one.
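One way to picture such an output is a mapping from column labels to segments of S. The following is a minimal sketch only, assuming a hand-written regular expression for US-style addresses; the pattern, the street-suffix list, the function name segment_address and the dictionary keys are illustrative assumptions and are not taken from the entry. Practical systems usually learn the segmentation from data rather than rely on one fixed pattern.

```python
import re

# Illustrative pattern (an assumption, not the entry's method): one named
# group per column of the example Address record.
ADDRESS_PATTERN = re.compile(
    r"^(?P<house_number>\d+)\s+"
    r"(?P<street_name>.+?(?:Ave|St|Rd|Blvd|Dr)\.?)\s+"  # hypothetical suffix list
    r"(?P<city_name>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5})$"
)


def segment_address(text):
    """Split an address string into labeled columns, or return None."""
    match = ADDRESS_PATTERN.match(text.strip())
    if match is None:
        return None  # the string does not fit the assumed layout
    record = match.groupdict()
    record["country"] = None  # no segment of the input maps to Country
    return record


if __name__ == "__main__":
    s = "18100 New Hampshire Ave. Silver Spring, MD 20861"
    for column, segment in segment_address(s).items():
        print(f"{column}: {segment}")
```

The sketch breaks as soon as an address departs from the assumed layout, for instance a street without one of the listed suffixes or a missing comma after the city, which is one reason a fixed pattern is only a toy illustration of the task.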
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Sarawagi, S. (2018). Column Segmentation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_597
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9