Efficient Evaluation of HAVING Queries on a Probabilistic Database

  • Conference paper
Database Programming Languages (DBPL 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4797)

Abstract

We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG, or COUNT(DISTINCT) on probabilistic databases. Computing the precise output probabilities for positive conjunctive queries (without HAVING) is \(\#\mathcal{P}\)-hard, but is in \(\mathcal{P}\) for a restricted class of queries called safe queries. Further, for queries without self-joins, either a query is safe or its data complexity is \(\#\mathcal{P}\)-hard, which shows that safe queries exactly capture the tractable queries without self-joins. In this paper, for each aggregate above, we find a class of queries that exactly captures efficient evaluation for HAVING queries without self-joins. Our algorithms use a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.
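To make the idea of a Boolean aggregate test concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the simplest possible setting: a single relation whose tuples are present independently with known probabilities, and a test of the form COUNT(*) >= k. The function names `count_distribution` and `prob_count_at_least` are hypothetical. The sketch tracks the marginal distribution of the count, truncated at `cap`, with a standard dynamic program, which is one plausible reading of what computing a marginal distribution over aggregate values looks like in the easiest case.

```python
from typing import List


def count_distribution(probs: List[float], cap: int) -> List[float]:
    """Return the distribution of min(COUNT, cap) over independent tuples.

    probs[i] is the (assumed independent) probability that tuple i is present.
    Entry c of the result is P(min(COUNT, cap) == c).
    """
    # Before any tuple is considered, the count is 0 with probability 1.
    dist = [1.0] + [0.0] * cap
    for p in probs:
        nxt = [0.0] * (cap + 1)
        for c, mass in enumerate(dist):
            if mass == 0.0:
                continue
            # Tuple absent: the count is unchanged.
            nxt[c] += mass * (1.0 - p)
            # Tuple present: the count grows by one, saturating at cap.
            nxt[min(c + 1, cap)] += mass * p
        dist = nxt
    return dist


def prob_count_at_least(probs: List[float], k: int) -> float:
    """Probability that a HAVING COUNT(*) >= k test holds."""
    if k <= 0:
        return 1.0
    # Truncating at k means the last entry collects all counts >= k.
    return count_distribution(probs, k)[k]


if __name__ == "__main__":
    # Three tuples, each present with probability 0.5.
    print(prob_count_at_least([0.5, 0.5, 0.5], 2))
```

Truncating the count at `k` keeps the state space at `k + 1` values regardless of the number of tuples, so the sketch runs in time proportional to the number of tuples times `k`. Queries with joins, correlated tuples, or the other aggregates named in the abstract are outside what this sketch covers.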

Editor information

Marcelo Arenas, Michael I. Schwartzbach

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ré, C., Suciu, D. (2007). Efficient Evaluation of HAVING Queries on a Probabilistic Database. In: Arenas, M., Schwartzbach, M.I. (eds) Database Programming Languages. DBPL 2007. Lecture Notes in Computer Science, vol 4797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75987-4_13

  • DOI: https://doi.org/10.1007/978-3-540-75987-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75986-7

  • Online ISBN: 978-3-540-75987-4

  • eBook Packages: Computer Science (R0)
