Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Managing Probabilistic Entity Extraction

  • Daisy Zhe Wang
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80762


Probabilistic databases; Probabilistic information extraction; Probabilistic knowledge bases


Entity extraction is the process of extracting structured entities, together with their attributes, from unstructured text data. For example, a structured paper entity can be extracted from a citation, with attributes such as author names, title, and journal name. Similarly, a professor entity can be extracted from his or her homepage, with attributes such as job title, email, and research interests. The result of entity extraction is a set of structured entity records.

Probabilistic entity extractions are structured entity attributes and records extracted from text, each associated with a probability of correctness. Because automatic entity extraction is imperfect, this probability is usually produced by a state-of-the-art statistical information extraction model.
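
As an illustration, a probabilistic entity record could be represented as a set of attribute values, each carrying its own probability of correctness. The following Python sketch is a minimal example under stated assumptions, not a representation prescribed by this entry: the class names `ExtractedAttribute` and `ExtractedEntity`, the sample values, and the probabilities are all hypothetical, and the record-level probability is computed under a simplifying assumption that attributes are independent, which real statistical extraction models generally do not guarantee.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ExtractedAttribute:
    name: str           # attribute name, e.g. "title"
    value: str          # text span taken from the source document
    probability: float  # extractor's estimated probability that the value is correct


@dataclass(frozen=True)
class ExtractedEntity:
    entity_type: str                           # e.g. "paper" or "professor"
    attributes: tuple[ExtractedAttribute, ...]

    def record_probability(self) -> float:
        # Simplifying assumption (not from the entry): attributes are
        # independent, so the record is correct with the product of the
        # per-attribute probabilities.
        return prod(a.probability for a in self.attributes)

    def confident_attributes(self, threshold: float) -> dict[str, str]:
        # Keep only the attribute values whose probability meets the threshold.
        return {
            a.name: a.value
            for a in self.attributes
            if a.probability >= threshold
        }


# Hypothetical paper entity extracted from a citation string;
# the values and probabilities are made up for illustration.
paper = ExtractedEntity(
    entity_type="paper",
    attributes=(
        ExtractedAttribute("author", "J. Doe", 0.9),
        ExtractedAttribute("title", "An Example Title", 0.8),
        ExtractedAttribute("journal", "Example Journal", 0.5),
    ),
)
```

Keeping the probability next to each attribute, rather than discarding low-confidence values at extraction time, lets a downstream query choose its own threshold, for example `paper.confident_attributes(0.75)`.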

The management of probabilistic entity extractions requires not only scalable execution...

