Synonyms
Query processing over probabilistic data
Definition
An uncertain or probabilistic database is defined as a probability distribution over a set of deterministic database instances called possible worlds.
In the classical deterministic setting, the query processing problem is to compute the set of tuples representing the answer of a given query on a given database. In the probabilistic setting, this problem becomes the computation of all pairs (t, p), where p is the probability that the tuple t is in the query answer in a world drawn at random from the input probabilistic database.
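The definition can be illustrated with a brute-force sketch that enumerates every possible world. This is only an illustration under stated assumptions, not the method of any particular system: the relation R, its probabilities, and the helper names possible_worlds, query, and answer_probabilities are hypothetical, and the tuples are assumed to be independent of one another (one simple kind of probabilistic database). The enumeration is exponential in the number of tuples, so it is practical only for tiny inputs.

```python
from collections import defaultdict
from itertools import product

# Hypothetical relation: each tuple is assumed to be present independently
# of the others, with the marginal probability given here.
R = {("a", 1): 0.5, ("a", 2): 0.5, ("b", 1): 0.2}


def possible_worlds(relation):
    """Yield (world, probability) for every subset of the tuples."""
    tuples = list(relation)
    for mask in product([False, True], repeat=len(tuples)):
        world = {t for t, keep in zip(tuples, mask) if keep}
        prob = 1.0
        for t, keep in zip(tuples, mask):
            # Independence assumption: multiply per-tuple probabilities.
            prob *= relation[t] if keep else 1.0 - relation[t]
        yield world, prob


def query(world):
    """A deterministic query: project each tuple onto its first attribute."""
    return {(x,) for (x, _) in world}


def answer_probabilities(relation, q):
    """Return {t: p}, where p sums the probabilities of the worlds
    whose deterministic answer q(world) contains t."""
    result = defaultdict(float)
    for world, prob in possible_worlds(relation):
        for t in q(world):
            result[t] += prob
    return dict(result)


if __name__ == "__main__":
    for t, p in sorted(answer_probabilities(R, query).items()):
        print(t, round(p, 6))
```

For this input the pair for ("a",) has probability 1 - 0.5 * 0.5 = 0.75, since at least one of the two "a" tuples must be present, and the pair for ("b",) has probability 0.2, up to floating-point rounding.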
Scientific Fundamentals
Representation of Uncertain Data
All aspects of query processing over uncertain data, in particular its complexity and the existing techniques, depend heavily on the data representation. Since it is prohibitively expensive to represent explicitly the extremely large set of all possible worlds of a probabilistic database, one has to settle for succinct data representations. Three such...
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Dalvi, N., Olteanu, D. (2018). Query Processing over Uncertain Data. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80689
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80689
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering