Monte Carlo Methods for Uncertain Data

Synonyms

Sampling methods; Simulation methods; Stochastic methods

Definition

Uncertain datasets do not contain specific data values but rather representations of probability distributions over “possible worlds,” that is, over possible realizations of the dataset. Queries over such datasets result in a probability distribution over possible answers, where each possible answer corresponds to the query result in one of the possible worlds. In this setting, the goal of query processing is to compute the query-result distribution or perhaps features of this distribution such as marginal probabilities, moments, modes, and quantiles. Basic Monte Carlo methods approximate the query-result distribution by, in essence, repeatedly generating possible-world instances and computing the query answer on each instance. The resulting samples from the query-result distribution are used to estimate quantities of interest using statistical methods. More sophisticated techniques try to improve on the...
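The basic Monte Carlo procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the entry: the toy relation `UNCERTAIN_SALES`, the tuple-independent model (each tuple exists with probability `p` and carries a normally distributed value), the SUM query, and all function names. It repeatedly generates a possible-world instance, evaluates the query on that instance, and then estimates features of the query-result distribution from the resulting samples.

```python
import random
import statistics

# Hypothetical uncertain relation (illustrative only): each tuple exists
# independently with probability "p"; if it exists, its value is drawn
# from a Normal(mean, sd) distribution.
UNCERTAIN_SALES = [
    {"p": 0.9, "mean": 100.0, "sd": 10.0},
    {"p": 0.5, "mean": 200.0, "sd": 20.0},
    {"p": 0.2, "mean": 50.0, "sd": 5.0},
]


def sample_world(relation, rng):
    """Generate one possible world: a concrete list of values."""
    world = []
    for t in relation:
        if rng.random() < t["p"]:  # does the tuple exist in this world?
            world.append(rng.gauss(t["mean"], t["sd"]))
    return world


def total_query(world):
    """Deterministic query evaluated on a single possible world (a SUM)."""
    return sum(world)


def monte_carlo(relation, query, n_samples, seed=0):
    """Draw n_samples samples from the query-result distribution."""
    rng = random.Random(seed)
    return [query(sample_world(relation, rng)) for _ in range(n_samples)]


def summarize(samples, threshold):
    """Estimate a few features of the query-result distribution."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    return {
        "mean": mean,
        # Approximate 95% confidence half-width for the mean (normal approx.)
        "ci_half_width": 1.96 * sd / n ** 0.5,
        "median": statistics.median(samples),
        # Estimated probability that the query result exceeds the threshold
        "p_exceeds": sum(s > threshold for s in samples) / n,
    }


if __name__ == "__main__":
    samples = monte_carlo(UNCERTAIN_SALES, total_query, n_samples=10_000)
    print(summarize(samples, threshold=250.0))
```

Under these assumptions the exact expected value of the sum is 0.9·100 + 0.5·200 + 0.2·50 = 200, so the estimated mean should fall close to 200, with the confidence half-width shrinking roughly as one over the square root of the number of samples.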


Author information

Correspondence to Peter J. Haas.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Haas, P.J. (2018). Monte Carlo Methods for Uncertain Data. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80692
