Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Uncertain Data Lineage

  • Sudeepa Roy
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80759


Provenance in probabilistic databases


Lineage, also called Boolean provenance, event expression, or why-provenance, is a form of provenance that describes the origin of the answers to a query executed on a database. It is expressed as a Boolean formula over variables assigned to the tuples in the database: joint use of tuples (by the database join operation) is captured by Boolean conjunction (AND, ∧), and alternative use (by projection or union) by Boolean disjunction (OR, ∨). Uncertain data is typically represented as a probabilistic database, a compact representation of a probability distribution over a set of deterministic database instances (called possible worlds). When a query is evaluated on such a probabilistic database, the output is not a deterministic set of answer tuples but a distribution over the possible answers across the possible worlds. The query evaluation problem on uncertain data aims to compute this...
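
The construction described above can be illustrated with a small sketch. It is not taken from the entry: the relations R and S, the variable names (r1, s1, ...), the probabilities, and the helper functions are all hypothetical, and the sketch assumes a tuple-independent probabilistic database, in which each tuple is present independently with its own marginal probability. A join contributes a conjunction of tuple variables, a projection collects alternative derivations as a disjunction, and the probability of an answer is obtained by brute-force enumeration of possible worlds.

```python
from itertools import product

# Hypothetical toy relations; the last field of each tuple is its Boolean variable.
R = [("a", "r1"), ("b", "r2")]                               # R(x)
S = [("a", "c", "s1"), ("a", "d", "s2"), ("b", "c", "s3")]   # S(x, y)

# Assumed marginal probabilities (tuple-independent model, an assumption here).
prob = {"r1": 0.5, "r2": 0.4, "s1": 0.5, "s2": 0.5, "s3": 0.25}


def lineage_by_answer(R, S):
    """Lineage of q(x) :- R(x), S(x, y) as a DNF per answer x.

    Each DNF is a list of clauses; each clause is a frozenset of tuple
    variables. The join produces a clause (AND); projecting away y
    collects the clauses for the same x (OR).
    """
    out = {}
    for x, r_var in R:
        for s_x, _y, s_var in S:
            if x == s_x:
                out.setdefault(x, []).append(frozenset({r_var, s_var}))
    return out


def evaluate(dnf, world):
    """Truth value of a DNF in a world (dict: variable -> bool)."""
    return any(all(world[v] for v in clause) for clause in dnf)


def probability(dnf, prob):
    """Sum the weights of all possible worlds in which the lineage is true.

    Only variables that occur in the lineage are enumerated; under
    independence the remaining tuples marginalize out. The loop is
    exponential in the number of variables, so this is for illustration only.
    """
    variables = sorted(set().union(*dnf))
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if evaluate(dnf, world):
            weight = 1.0
            for v in variables:
                weight *= prob[v] if world[v] else 1.0 - prob[v]
            total += weight
    return total


if __name__ == "__main__":
    for answer, dnf in lineage_by_answer(R, S).items():
        clauses = " ∨ ".join("(" + " ∧ ".join(sorted(c)) + ")" for c in dnf)
        print(answer, clauses, probability(dnf, prob))
```

In this toy instance, the answer "a" has lineage (r1 ∧ s1) ∨ (r1 ∧ s2), which is equivalent to r1 ∧ (s1 ∨ s2), so its probability is 0.5 × (1 − 0.5 × 0.5) = 0.375. Enumerating worlds takes time exponential in the number of variables, so the sketch only shows what is being computed, not how to compute it efficiently.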



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Duke University, Durham, USA