Synonyms
Sampling methods; Simulation methods; Stochastic methods
Definition
Uncertain datasets do not contain specific data values but rather representations of probability distributions over “possible worlds,” that is, over possible realizations of the dataset. Queries over such datasets result in a probability distribution over possible answers, where each possible answer corresponds to the query result in one of the possible worlds. In this setting, the goal of query processing is to compute the query-result distribution or perhaps features of this distribution such as marginal probabilities, moments, modes, and quantiles. Basic Monte Carlo methods approximate the query-result distribution by, in essence, repeatedly generating possible-world instances and computing the query answer on each instance. The resulting samples from the query-result distribution are used to estimate quantities of interest using statistical methods. More sophisticated techniques try to improve on the...
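The basic Monte Carlo scheme just described can be sketched as follows. This is a minimal illustration, not the method of any particular system. It assumes the simplest uncertainty model, a tuple-independent table in which each tuple is present in a possible world independently with a given probability. The table `SALES`, the SUM query, and every function name are hypothetical. The sketch repeatedly generates possible-world instances, evaluates the query on each one, and then estimates a marginal probability, the mean (with an approximate 95% confidence interval based on the central limit theorem), the median, and the mode of the query-result distribution. For comparison, `exact_distribution` enumerates all 2^n possible worlds. That is feasible only for tiny tables, and this exponential blow-up (for many queries, exact evaluation is #P-hard) is what motivates sampling.

```python
import random
import statistics
from collections import Counter
from itertools import product

# Hypothetical tuple-independent table: (item, value, probability that the tuple exists).
# Each tuple is present in a possible world independently of the others.
SALES = [
    ("a", 10, 0.9),
    ("b", 20, 0.5),
    ("c", 30, 0.2),
]


def sample_world(table, rng):
    """Generate one possible-world instance by keeping each tuple with its probability."""
    return [(item, value) for item, value, p in table if rng.random() < p]


def query_total(world):
    """Hypothetical query: SELECT SUM(value) over one instance (0 for an empty world)."""
    return sum(value for _, value in world)


def monte_carlo(table, query, n, seed=0):
    """Draw n i.i.d. samples from the query-result distribution."""
    rng = random.Random(seed)
    return [query(sample_world(table, rng)) for _ in range(n)]


def summarize(samples, threshold):
    """Estimate features of the query-result distribution from the samples."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error of the mean; the interval below relies on the central limit theorem.
    se = statistics.stdev(samples) / n ** 0.5
    return {
        "p_exceeds": sum(s > threshold for s in samples) / n,  # marginal probability
        "mean": mean,                                           # first moment
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),
        "median": statistics.median(samples),                   # 0.5-quantile
        "mode": Counter(samples).most_common(1)[0][0],
    }


def exact_distribution(table, query):
    """Exact query-result distribution by enumerating all 2^n possible worlds."""
    dist = Counter()
    for present in product([False, True], repeat=len(table)):
        prob, world = 1.0, []
        for keep, (item, value, p) in zip(present, table):
            prob *= p if keep else 1 - p
            if keep:
                world.append((item, value))
        dist[query(world)] += prob
    return dist


if __name__ == "__main__":
    samples = monte_carlo(SALES, query_total, n=100_000)
    print("Monte Carlo:", summarize(samples, threshold=25))
    exact = exact_distribution(SALES, query_total)
    print("Exact P(total > 25):", sum(p for v, p in exact.items() if v > 25))
```

For this toy table the exact expected total is 10·0.9 + 20·0.5 + 30·0.2 = 25 and the exact probability that the total exceeds 25 is 0.56, so with 100,000 samples the Monte Carlo estimates should land close to those values. The samples are independent and identically distributed, which is why a simple CLT interval applies; the more sophisticated techniques mentioned above aim to reduce the number of samples or the per-sample cost needed for a given accuracy, and are not shown here.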
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Haas, P.J. (2018). Monte Carlo Methods for Uncertain Data. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80692
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80692
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering