Probabilistic Threshold Join over Distributed Uncertain Data

  • Conference paper
Web-Age Information Management (WAIM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

Many emerging applications collect large amounts of uncertain data from multiple sources in a distributed manner. Previous work on querying uncertain data in distributed environments has focused only on ranking and skyline queries; join queries have not been addressed, despite their importance in databases. In this paper, we address the distributed probabilistic threshold join query, which retrieves from distributed sites the results that satisfy the join condition and whose combined probabilities meet the threshold requirement. We propose a new kind of Bloom filter, called the Probability Bloom Filter (PBF), to represent sets with probabilistic attributes, and we design a PBF-based Bloomjoin algorithm that executes distributed probabilistic threshold join queries with low communication cost. We also provide a theoretical analysis of the network cost of our algorithm and validate it through simulation. The experimental results show that, in most scenarios, our algorithm reduces network cost compared with the original Bloomjoin algorithm.
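The abstract does not describe the internal layout of a PBF. The Python sketch below shows one plausible design under stated assumptions and should not be read as the authors' construction. It assumes that each filter cell stores the maximum existence probability of any tuple hashed into it, that tuples are independent (so a joined pair's probability is the product of the two tuple probabilities), and that a pair qualifies when that product is at least the threshold `tau`. The names `ProbabilityBloomFilter` and `pbf_bloomjoin`, and the default parameters `m` and `k`, are hypothetical. Site A ships the filter instead of its join keys, and site B ships back only the tuples whose probability, multiplied by the filter's upper bound, can still reach `tau`.

```python
import hashlib
from typing import List, Tuple

Row = Tuple[str, float]  # (join key, existence probability)


class ProbabilityBloomFilter:
    """Hypothetical PBF: each cell keeps the max probability of any key hashed into it."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m            # number of cells
        self.k = k            # number of hash functions
        self.cells = [0.0] * m

    def _indexes(self, key: str) -> List[int]:
        # Double hashing: derive k cell indexes from a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd stride
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str, prob: float) -> None:
        # A cell only ever grows, so it never drops below any probability hashed to it.
        for idx in self._indexes(key):
            self.cells[idx] = max(self.cells[idx], prob)

    def upper_bound(self, key: str) -> float:
        # Never below the true max probability of an inserted key (no false
        # negatives); collisions can make it an overestimate.
        return min(self.cells[idx] for idx in self._indexes(key))


def pbf_bloomjoin(r: List[Row], s: List[Row], tau: float,
                  m: int = 64, k: int = 3) -> Tuple[List[Row], List[Row]]:
    # Site A: summarize R in a PBF; only these m floats cross the network.
    pbf = ProbabilityBloomFilter(m, k)
    for key, p in r:
        pbf.add(key, p)
    # Site B: ship only S tuples that could still reach the threshold.
    shipped = [(key, p) for key, p in s if p * pbf.upper_bound(key) >= tau]
    # Site A: exact threshold join over the shipped tuples (independence assumed).
    result = [(kr, pr * ps) for kr, pr in r for ks, ps in shipped
              if kr == ks and pr * ps >= tau]
    return shipped, result
```

For example, with `r = [("a", 0.9), ("b", 0.3), ("c", 0.8)]`, `s = [("a", 0.8), ("c", 0.2)]`, and `tau = 0.5`, the tuple `("c", 0.2)` is never shipped, because `0.2` times any cell value (at most `0.9`) stays below `0.5`. Because the upper bound never underestimates, hash collisions can only cause extra tuples to be shipped, and the exact check at site A discards them. The result therefore matches a centralized threshold join under these assumptions.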

This work was supported by the National Natural Science Foundation of China (NSFC) under grant No. 61001070. We would like to thank the anonymous reviewers for their insightful comments.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deng, L., Wang, F., Huang, B. (2011). Probabilistic Threshold Join over Distributed Uncertain Data. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_8

  • DOI: https://doi.org/10.1007/978-3-642-23535-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
