A Working Model for Uncertain Data with Lineage

  • Conference paper
Conceptual Modeling (ER 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9381)

Abstract

Lineage is important in uncertain data management because it can be used both to find out which parts of the data contribute to a result and to compute the probability of that result. However, existing work treats an uncertain tuple as a set of tuples that can be stored in a relational table. Lineage then refers to whole tuples in the table, so one can identify only the tuples, not the specific attributes, that contribute to a result. If uncertain tuples have multiple uncertain attributes, users cannot tell which attribute is the main cause of a result tuple's low probability. In this paper, we propose an approach to modeling uncertain data. Compared with the alternative based on the relational model, our model has a low maintenance cost and avoids a large amount of redundant storage and many join operations. Based on our model, we define operations for querying data, generating lineage, and computing the probability of results. We then discuss how to compute probabilities correctly with lineage and propose an algorithm that transforms lineage so that probabilities are computed correctly.
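To illustrate why lineage may need to be transformed before probabilities are computed, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that lineage is a Boolean formula over independent base tuples; the names `t1`, `t2`, `t3`, their probabilities, and the helper `lineage_probability` are hypothetical. The sketch compares an exact possible-worlds computation with a naive evaluation that treats two conjuncts as independent even though they share `t1`.

```python
from itertools import product


def lineage_probability(formula, probs):
    """Exact probability of a Boolean lineage formula.

    Enumerates every possible world over the base tuples, which are
    assumed to be mutually independent (an assumption of this sketch).
    formula: callable taking {name: bool} and returning bool.
    probs:   {name: probability that the base tuple exists}.
    """
    names = sorted(probs)
    total = 0.0
    for world in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, world))
        if formula(assignment):
            weight = 1.0
            for name, present in assignment.items():
                weight *= probs[name] if present else 1.0 - probs[name]
            total += weight
    return total


# Hypothetical base tuples and their probabilities.
probs = {"t1": 0.5, "t2": 0.6, "t3": 0.8}


# Hypothetical lineage of one result tuple: (t1 AND t2) OR (t1 AND t3).
def lineage(w):
    return (w["t1"] and w["t2"]) or (w["t1"] and w["t3"])


exact = lineage_probability(lineage, probs)

# Naive evaluation: treats the two conjuncts as independent events,
# which is wrong here because both depend on t1.
p_a = probs["t1"] * probs["t2"]
p_b = probs["t1"] * probs["t3"]
naive = 1.0 - (1.0 - p_a) * (1.0 - p_b)

# Equivalent factored form t1 AND (t2 OR t3): each base tuple occurs
# once, so the independence rules can be applied directly.
factored = probs["t1"] * (1.0 - (1.0 - probs["t2"]) * (1.0 - probs["t3"]))

print(exact, naive, factored)
```

In this example the naive evaluation overestimates the probability, while the factored form agrees with the exact enumeration. Enumeration is exponential in the number of base tuples, so it serves here only as a reference for small inputs.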

Acknowledgements

This work was partially supported by the NNSFC under grant No. 61232002 and No. 61202033, NSFHP under grant No. 2011CDB448, Ph.D. SFWU under grant No. 2012211020207.

Author information

Corresponding authors

Correspondence to Liwei Wang or Zhiyong Peng.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, L., Wang, L., Peng, Z. (2015). A Working Model for Uncertain Data with Lineage. In: Johannesson, P., Lee, M., Liddle, S., Opdahl, A., Pastor López, Ó. (eds) Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9381. Springer, Cham. https://doi.org/10.1007/978-3-319-25264-3_27

  • DOI: https://doi.org/10.1007/978-3-319-25264-3_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25263-6

  • Online ISBN: 978-3-319-25264-3

  • eBook Packages: Computer Science, Computer Science (R0)
