Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Managing Probabilistic Entity Extraction

  • Daisy Zhe Wang
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80762


Probabilistic databases; Probabilistic information extraction; Probabilistic knowledge bases


Entity extraction is the process of extracting structured entities, together with their attributes, from unstructured text data. For example, a structured paper entity can be extracted from a citation with its author names, title, and journal name. Similarly, a professor entity can be extracted from his or her homepage with the corresponding job title, email, and research interests. The result of entity extraction is a set of structured entity records.
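As a minimal sketch of what such a record might look like, the following represents the paper entity from the citation example as a typed structure. The class name `PaperEntity`, its field names, and the citation string are hypothetical, and the naive split on periods stands in for a real extractor only for illustration.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PaperEntity:
    """Hypothetical structured record for a paper extracted from a citation."""
    authors: List[str]
    title: str
    journal: str


# Made-up citation text, used only for illustration.
citation = "Smith J, Lee K. A study of text. Journal of Examples."

# Naive split on sentence-ending periods. This assumes a fixed
# "authors. title. journal." layout; a real extractor would rely on a
# statistical or rule-based model instead.
author_part, title_part, journal_part = [
    part.strip() for part in citation.rstrip(".").split(". ")
]

record = PaperEntity(
    authors=[name.strip() for name in author_part.split(",")],
    title=title_part,
    journal=journal_part,
)

# Extraction over many documents yields a collection of such records.
records = [asdict(record)]
```

Each dictionary in `records` corresponds to one structured entity record, with one key per extracted attribute.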

Probabilistic entity extractions are structured entity attributes and records extracted from text, each associated with a probability of correctness. Because automatic entity extraction is inherently imperfect, this probability is usually produced by a state-of-the-art statistical information extraction model.
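One simple way to picture a probabilistic extraction is a record whose attribute values each carry a probability of correctness. The sketch below is illustrative only: `ProbAttribute`, `record_probability`, `filter_by_confidence`, the example values, and the probabilities are all hypothetical, and treating attributes as independent is a simplifying assumption that statistical extraction models do not make in general.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List


@dataclass(frozen=True)
class ProbAttribute:
    """An extracted attribute value with its probability of correctness."""
    value: str
    prob: float


# A record maps attribute names to probabilistic attribute values.
ProbRecord = Dict[str, ProbAttribute]


def record_probability(record: ProbRecord) -> float:
    """Probability that every attribute is correct.

    Assumes attributes are independent (a simplification made for this sketch).
    """
    return prod(attr.prob for attr in record.values())


def filter_by_confidence(records: List[ProbRecord], threshold: float) -> List[ProbRecord]:
    """Keep only records whose overall probability reaches the threshold."""
    return [r for r in records if record_probability(r) >= threshold]


# Hypothetical extractions with made-up probabilities.
records: List[ProbRecord] = [
    {
        "title": ProbAttribute("A study of text", 0.9),
        "journal": ProbAttribute("Journal of Examples", 0.8),
    },
    {
        "title": ProbAttribute("Another study", 0.6),
        "journal": ProbAttribute("Examples Letters", 0.5),
    },
]

confident = filter_by_confidence(records, threshold=0.5)
```

Under the independence assumption the first record has probability 0.9 × 0.8 and the second 0.6 × 0.5, so only the first passes the 0.5 threshold. Thresholding discards the uncertainty of the rejected records, whereas keeping the probabilities alongside the data lets later queries take them into account.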

The management of probabilistic entity extractions requires not only scalable execution...


Recommended Reading

  1. Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004.
  2. Doan A, Ramakrishnan R, Chen F, DeRose P, Lee Y, McCann R, Sayyadian M, Shen W. Community information management. 2006.
  3. Gupta R, Sarawagi S. Curating probabilistic databases from information extraction models. In: Proceedings of the 32nd International Conference on Very Large Data Bases; 2006.
  4. Manning CD, Schütze H. Foundations of statistical natural language processing. Cambridge, MA: MIT Press; 1999.
  5. Reiss F, Raghavan S, Krishnamurthy R, Zhu H, Vaithyanathan S. An algebraic approach to rule-based information extraction. In: Proceedings of the 24th International Conference on Data Engineering; 2008.
  6. Shen W, Doan A, Naughton J, Ramakrishnan R. Declarative information extraction using datalog with embedded extraction predicates. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007.
  7. Suciu D, Olteanu D, Ré C, Koch C. Probabilistic databases, synthesis lectures on data management. San Rafael: Morgan and Claypool; 2011.
  8. Wang D, Michelakis E, Garofalakis M, Hellerstein J. BayesStore: managing large, uncertain data repositories with probabilistic graphical models. In: Proceedings of the 34th International Conference on Very Large Data Bases; 2008.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Daisy Zhe Wang (1)
  1. Computer and Information Science and Engineering (CISE), University of Florida, Gainesville, USA