Query Processing over Uncertain Data

Reference work entry, Encyclopedia of Database Systems

Synonyms

Query processing over probabilistic data

Definition

An uncertain or probabilistic database is defined as a probability distribution over a set of deterministic database instances called possible worlds.
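In symbols, and assuming a finite set of possible worlds, the definition can be written as follows. The notation $\mathcal{D}$, $\mathcal{W}$, $W_i$ and $P$ is introduced here for illustration and is not taken from the entry:

```latex
\mathcal{D} = (\mathcal{W}, P), \qquad
\mathcal{W} = \{W_1, \dots, W_m\}, \qquad
P(W_i) \ge 0, \qquad
\sum_{i=1}^{m} P(W_i) = 1
```

Each $W_i$ is an ordinary deterministic database instance, and $P(W_i)$ is the probability that $W_i$ is the actual database.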

In the classical deterministic setting, the query processing problem is to compute the set of tuples that forms the answer to a given query on a given database. In the probabilistic setting, the problem becomes computing all pairs (t, p) such that tuple t is in the query answer in some possible world of the input probabilistic database, and p is the probability that t is in the answer, i.e., the sum of the probabilities of all possible worlds whose query answer contains t.
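The definition of the pairs (t, p) can be made concrete with a brute-force sketch, assuming the probabilistic database is given as an explicit, small list of (world, probability) pairs and the query is an ordinary Python function on one world. The names `answer_probabilities` and `cities` and the toy relation R(name, city) are hypothetical; this only illustrates the semantics, not how real systems evaluate queries.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Hashable, List, Set, Tuple

Row = Tuple[Hashable, ...]
World = FrozenSet[Row]  # one deterministic database instance (a set of rows)


def answer_probabilities(
    worlds: List[Tuple[World, float]],
    query: Callable[[World], Set[Hashable]],
) -> Dict[Hashable, float]:
    """Compute {t: p} by brute force over an explicit list of possible worlds.

    p is the total probability of the worlds whose query answer contains t.
    The query runs once per world, so this is only feasible for tiny inputs.
    """
    total = sum(p for _, p in worlds)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"world probabilities sum to {total}, not 1")
    probs: Dict[Hashable, float] = defaultdict(float)
    for world, p_world in worlds:
        for t in query(world):   # deterministic evaluation on one world
            probs[t] += p_world  # t is an answer in this world
    return dict(probs)


# Hypothetical toy instance of a relation R(name, city) with three possible worlds.
worlds = [
    (frozenset({("alice", "paris"), ("bob", "rome")}), 0.5),
    (frozenset({("alice", "paris")}), 0.3),
    (frozenset({("bob", "paris")}), 0.2),
]


def cities(world: World) -> Set[Hashable]:
    """Deterministic query: project R onto its city attribute."""
    return {city for _, city in world}


result = answer_probabilities(worlds, cities)
```

Here "paris" is in the answer of every world, so its probability is 1, while "rome" is an answer only in the first world and gets probability 0.5. Enumerating worlds this way is exponential in general, which is why succinct representations are needed.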

Scientific Fundamentals

Representation of Uncertain Data

All aspects of query processing over uncertain data, in particular its complexity and the available evaluation techniques, depend heavily on how the data is represented. Since explicitly representing the set of all possible worlds of a probabilistic database is prohibitively expensive (a database with n uncertain tuples, each of which may be present or absent, can have up to 2^n possible worlds), one has to settle for succinct data representations. Three such...
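To illustrate how a succinct representation stands in for an exponential set of worlds, the sketch below assumes one common representation, the tuple-independent table, in which each row carries its own probability and rows are independent. It is chosen for illustration only and is not necessarily one of the three representations the entry goes on to describe; the names `possible_worlds` and `tid` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, Iterator, Tuple

Row = Tuple[Hashable, ...]


def possible_worlds(tid: Dict[Row, float]) -> Iterator[Tuple[FrozenSet[Row], float]]:
    """Expand a tuple-independent table into its possible worlds.

    Each row is present independently with its own probability p, so a world's
    probability is the product of p over present rows and 1 - p over absent rows.
    A table with n rows yields 2**n worlds.
    """
    rows = list(tid.items())
    for choices in product((True, False), repeat=len(rows)):
        present_rows = []
        prob = 1.0
        for (row, p), present in zip(rows, choices):
            if present:
                present_rows.append(row)
                prob *= p
            else:
                prob *= 1.0 - p
        yield frozenset(present_rows), prob


# Hypothetical table: three rows, each stored with its own probability.
tid = {("alice", "paris"): 0.9, ("bob", "rome"): 0.6, ("carol", "oslo"): 0.5}
worlds = list(possible_worlds(tid))
print(len(worlds))  # 8, i.e. 2**3 worlds described by just 3 probabilities
```

The table stores three numbers, yet it denotes a distribution over eight worlds whose probabilities sum to 1; with more rows the gap grows exponentially, which is what makes materializing the worlds impractical.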

Correspondence to Nilesh Dalvi.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Dalvi, N., Olteanu, D. (2018). Query Processing over Uncertain Data. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80689