Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Monte Carlo Methods for Uncertain Data

  • Peter J. Haas
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80692

Synonyms

Sampling methods; Simulation methods; Stochastic methods

Definition

Uncertain datasets do not contain specific data values but rather representations of probability distributions over “possible worlds,” that is, over possible realizations of the dataset. Queries over such datasets result in a probability distribution over possible answers, where each possible answer corresponds to the query result in one of the possible worlds. In this setting, the goal of query processing is to compute the query-result distribution or perhaps features of this distribution such as marginal probabilities, moments, modes, and quantiles. Basic Monte Carlo methods approximate the query-result distribution by, in essence, repeatedly generating possible-world instances and computing the query answer on each instance. The resulting samples from the query-result distribution are used to estimate quantities of interest using statistical methods. More sophisticated techniques try to improve on the...
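
The basic Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: the relation `UNCERTAIN_SALES`, the tuple-independence model, the SUM query, and all function names are hypothetical choices made for the example. Each iteration generates one possible world, evaluates the query on it, and records the answer; the recorded answers are then treated as samples from the query-result distribution and used to estimate a moment, a quantile, and a tail probability.

```python
import random
import statistics

# Hypothetical uncertain relation: (value, probability that the tuple exists).
# Tuples are assumed to be mutually independent (an illustrative assumption).
UNCERTAIN_SALES = [(120.0, 0.9), (75.0, 0.5), (200.0, 0.2), (50.0, 0.7)]


def sample_world(relation, rng):
    """Draw one possible world by keeping each tuple with its own probability."""
    return [value for value, prob in relation if rng.random() < prob]


def query(world):
    """Example query: SUM over the values present in the given world."""
    return sum(world)


def monte_carlo(relation, n_samples, seed=0):
    """Return n_samples draws from the query-result distribution."""
    rng = random.Random(seed)
    return [query(sample_world(relation, rng)) for _ in range(n_samples)]


def summarize(samples, threshold):
    """Estimate a few features of the query-result distribution."""
    n = len(samples)
    return {
        "mean": statistics.fmean(samples),
        # Standard error of the mean estimate; shrinks like 1/sqrt(n).
        "std_err": statistics.stdev(samples) / n ** 0.5,
        "median": statistics.median(samples),
        # Fraction of sampled worlds whose answer exceeds the threshold.
        "tail_prob": sum(s > threshold for s in samples) / n,
    }


if __name__ == "__main__":
    samples = monte_carlo(UNCERTAIN_SALES, n_samples=10_000)
    print(summarize(samples, threshold=200.0))
```

For this toy relation the exact expected SUM is 120(0.9) + 75(0.5) + 200(0.2) + 50(0.7) = 220.5, so the estimated mean can be checked against a known value. In realistic settings no such closed form is available, which is the reason for sampling in the first place.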

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. IBM Almaden Research Center, San Jose, USA