Managing Probabilistic Entity Extraction

Encyclopedia of Database Systems

Synonyms

Probabilistic databases; Probabilistic information extraction; Probabilistic knowledge bases

Definition

Entity extraction is the process of extracting structured entities, together with their attributes, from unstructured text. For example, a structured paper entity with author names, title, and journal name can be extracted from a citation. Similarly, a professor entity with job title, email, and research interests can be extracted from his or her homepage. The result of entity extraction is a set of structured entity records.
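As a concrete illustration of the citation example, the following minimal sketch extracts a paper record from a citation string with a single hand-written pattern. The citation layout ("Authors. Title. Journal; Year."), the sample string, and the names `CITATION_PATTERN` and `extract_paper` are assumptions made for this example only; extractors used in practice are typically statistical or rule-based systems that are far more robust than one regular expression.

```python
import re
from typing import Optional

# Hypothetical citation layout assumed for this sketch:
#   "Author A, Author B. Title. Journal; Year."
CITATION_PATTERN = re.compile(
    r"^(?P<authors>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>[^;]+);\s*"
    r"(?P<year>\d{4})\.?$"
)


def extract_paper(citation: str) -> Optional[dict]:
    """Return a structured paper record, or None if the layout does not match."""
    match = CITATION_PATTERN.match(citation.strip())
    if match is None:
        return None
    return {
        "authors": [a.strip() for a in match.group("authors").split(",")],
        "title": match.group("title").strip(),
        "journal": match.group("journal").strip(),
        "year": int(match.group("year")),
    }


# Made-up citation string, used only to exercise the sketch.
record = extract_paper("Doe J, Roe R. A sample title. Journal of Examples; 2001.")
print(record)
```

A citation that deviates from the assumed layout simply yields `None`, which hints at why automatic extraction is imperfect in practice.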

Probabilistic entity extractions are structured entity attributes and records extracted from text, each associated with a probability of correctness. These probabilities are usually produced by statistical information extraction models, and they are needed because automatic entity extraction is inherently imperfect.
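One way to picture such extractions is as records whose attributes each carry a probability. The sketch below is illustrative only: the class names, the sample values, and the probabilities are made up, and the record-level probability is computed under an assumed independence between attributes, which is a simplification and not something the definition above prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbabilisticAttribute:
    name: str
    value: str
    probability: float  # probability that the extracted value is correct

    def __post_init__(self) -> None:
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


@dataclass
class ProbabilisticRecord:
    entity_type: str
    attributes: List[ProbabilisticAttribute]

    def record_probability(self) -> float:
        # ASSUMPTION: attributes are treated as independent, so the
        # record-level probability is the product of attribute probabilities.
        p = 1.0
        for attr in self.attributes:
            p *= attr.probability
        return p

    def confident_attributes(self, threshold: float) -> List[ProbabilisticAttribute]:
        # Keep only attributes whose probability meets the threshold.
        return [a for a in self.attributes if a.probability >= threshold]


# Made-up professor record with hypothetical probabilities.
professor = ProbabilisticRecord(
    entity_type="professor",
    attributes=[
        ProbabilisticAttribute("job_title", "Associate Professor", 0.9),
        ProbabilisticAttribute("email", "jdoe@example.edu", 0.8),
        ProbabilisticAttribute("research_interests", "databases", 0.5),
    ],
)

print(professor.record_probability())
print([a.name for a in professor.confident_attributes(0.75)])
```

Keeping the probabilities alongside the values, instead of discarding low-confidence extractions up front, lets a downstream consumer choose its own threshold.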

The management of probabilistic entity extractions requires not only scalable execution...

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Wang, D.Z. (2018). Managing Probabilistic Entity Extraction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80762
