Living Reference Work Entry

Encyclopedia of Database Systems

pp 1-9

Date: Latest Version

Web Information Extraction

  • Laura Chiticariu, IBM Almaden Research Center
  • Marina Danilevsky, IBM Almaden Research Center
  • Howard Ho, IBM Almaden Research Center
  • Rajasekar Krishnamurthy, IBM Almaden Research Center
  • Yunyao Li, IBM Almaden Research Center
  • Sriram Raghavan, IBM Almaden Research Center
  • Frederick Reiss, IBM Almaden Research Center
  • Shivakumar Vaithyanathan, IBM Almaden Research Center
  • Huaiyu Zhu, IBM Almaden Research Center

Definition

Information extraction (IE) is the process of automatically extracting structured pieces of information from unstructured or semi-structured text documents. Classical problems in information extraction include named-entity recognition (identifying mentions of persons, places, organizations, etc.) and relationship extraction (identifying mentions of relationships between such named entities). Web information extraction is the application of IE techniques to process the vast amounts of unstructured content on the Web. Due to the nature of the content on the Web, in addition to named-entity and relationship extraction, there is growing interest in more complex tasks such as extraction of reviews, opinions, and sentiments.
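
To make the two classical tasks concrete, the following is a minimal sketch of a rule-based extractor in Python. It is not the method described in this entry: the patterns PERSON, ORG, and TRIGGER, the Mention type, and the functions extract_mentions and extract_employment are all hypothetical names chosen for illustration, and the rules cover only the toy sentences shown. Real Web text would need far richer rules or learned models.

```python
import re
from typing import List, NamedTuple, Tuple


class Mention(NamedTuple):
    kind: str   # "PERSON" or "ORG"
    text: str   # the matched span of the document
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Toy patterns (assumptions for illustration only):
# a person is a title followed by capitalized words,
# an organization is capitalized words ending in a company suffix.
PERSON = re.compile(r"\b(?:Dr|Mr|Ms|Mrs)\. [A-Z][a-z]+(?: [A-Z][a-z]+)*")
ORG = re.compile(r"\b[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)* (?:Inc|Corp|Ltd)\b")

# Phrases that, when they are the only text between a person and an
# organization, are taken to signal an employment relationship.
TRIGGER = re.compile(r"\s+(?:joined|works at)\s+")


def extract_mentions(text: str) -> List[Mention]:
    """Named-entity recognition: return mentions in document order."""
    mentions = [Mention("PERSON", m.group(), m.start(), m.end())
                for m in PERSON.finditer(text)]
    mentions += [Mention("ORG", m.group(), m.start(), m.end())
                 for m in ORG.finditer(text)]
    return sorted(mentions, key=lambda m: m.start)


def extract_employment(text: str) -> List[Tuple[str, str]]:
    """Relationship extraction: pair a person with the organization that
    immediately follows when only a trigger phrase separates them."""
    mentions = extract_mentions(text)
    relations = []
    for left, right in zip(mentions, mentions[1:]):
        if left.kind == "PERSON" and right.kind == "ORG":
            if TRIGGER.fullmatch(text[left.end:right.start]):
                relations.append((left.text, right.text))
    return relations


if __name__ == "__main__":
    doc = ("Dr. Alice Smith joined Acme Corp in 2010. "
           "Mr. Bob Jones works at Globex Inc.")
    for person, org in extract_employment(doc):
        print(person, "->", org)
```

The output of extract_employment is a list of (person, organization) pairs, that is, structured records derived from unstructured text. Hand-written patterns like these are brittle, which is one reason more complex tasks such as opinion and sentiment extraction typically call for more than a few regular expressions.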

Historical Background

Historically, information extraction was studied by the Natural Language Processing community in the context of identifying organizations, locations, and person names in news articles and militar ...
