Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Column Segmentation

  • Sunita Sarawagi
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_597

Synonyms

Information extraction; Record extraction; Text segmentation

Definition

The term column segmentation refers to the segmentation of an unstructured text string into segments such that each segment is a column of a structured record.

As an example, consider a text string S = “18100 New Hampshire Ave. Silver Spring, MD 20861” representing an unstructured form of an Address record. Let the columns of this record be House number, Street name, City name, State, Zip and Country. In column segmentation, the goal is to segment S and assign a column label to each segment, so that the output pairs each segment of S with the column it belongs to.
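The Address example can be made concrete with a short sketch. The Python code below is only an illustration of the input and output shape of column segmentation. It is not the method described in this entry. The function name segment_address and the hand-written pattern are hypothetical. The pattern assumes that the street name ends in an abbreviation followed by a period, that the city is followed by a comma, and that the state is a two-letter code followed by a five-digit zip. The labeling it produces for S is one plausible reading of the example, not the entry's own output.

```python
import re

# Hypothetical pattern, for illustration only. It encodes the assumptions
# stated above and will not generalize to arbitrary address strings.
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<house>\d+)\s+"     # House number: leading digits
    r"(?P<street>.+?\.)\s+"      # Street name: shortest text ending in a period
    r"(?P<city>[^,]+),\s*"       # City name: everything up to the comma
    r"(?P<state>[A-Z]{2})\s+"    # State: two uppercase letters
    r"(?P<zipcode>\d{5})\s*$"    # Zip: five digits at the end of the string
)


def segment_address(text):
    """Return a list of (column label, segment) pairs, or None if no match.

    The Country column is omitted because the pattern has no group for it.
    """
    match = ADDRESS_PATTERN.match(text)
    if match is None:
        return None
    return [
        ("House number", match.group("house")),
        ("Street name", match.group("street")),
        ("City name", match.group("city")),
        ("State", match.group("state")),
        ("Zip", match.group("zipcode")),
    ]


if __name__ == "__main__":
    s = "18100 New Hampshire Ave. Silver Spring, MD 20861"
    for label, segment in segment_address(s):
        print(f"{label}: {segment}")
```

A fixed pattern like this breaks as soon as the input deviates from the assumed layout, which is one reason to prefer segmenters learned from data over hand-written rules. Here the Country column simply receives no segment, since S does not mention a country.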

Recommended Reading

  1. Agichtein E, Ganti V. Mining reference tables for automatic text segmentation. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2004. p. 20–9.
  2. Adelberg B. NoDoSE: a tool for semi-automatically extracting structured and semi-structured data from text documents. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 283–94.
  3. Borkar VR, Deshmukh K, Sarawagi S. Automatic text segmentation for extracting structured records. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2001. p. 175–86.
  4. Chandel A, Nagesh PC, Sarawagi S. Efficient batch top-k search for dictionary-based entity recognition. In: Proceedings of the 22nd International Conference on Data Engineering; 2006.
  5. Cunningham H. Information extraction, automatic. In: Encyclopedia of Language and Linguistics. 2nd ed. 2005.
  6. Kushmerick N, Weld DS, Doorenbos R. Wrapper induction for information extraction. In: Proceedings of the 15th International Joint Conference on AI; 1997. p. 729–37.
  7. Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning; 2001. p. 282–9.
  8. Peng F, McCallum A. Accurate information extraction from research papers using conditional random fields. In: Proceedings of the Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics; 2004. p. 329–36.
  9. Ratnaparkhi A. Learning to parse natural language with maximum entropy models. Mach Learn. 1999;34:151.
  10. Sarawagi S, Cohen WW. Semi-Markov conditional random fields for information extraction. In: Advances in Neural Information Processing Systems 17; 2004.
  11. Seymore K, McCallum A, Rosenfeld R. Learning Hidden Markov Model structure for information extraction. In: Papers from the AAAI-99 Workshop on Machine Learning for Information Extraction; 1999. p. 37–42.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. IIT Bombay, Mumbai, India

Section editors and affiliations

  • Venkatesh Ganti
  1. Microsoft Research, Microsoft Corporation, Redmond, USA