Efficient Processing of Distributed Iceberg Semi-joins

  • Conference paper
Database and Expert Systems Applications (DEXA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Abstract

The Iceberg SemiJoin (ISJ) of two datasets \(\mathcal R\) and \(\mathcal S\) returns the tuples in \(\mathcal R\) that join with at least \(k\) tuples of \(\mathcal S\). The ISJ operator is essential in many practical applications, including OLAP, data mining and information retrieval. In this paper we consider the distributed evaluation of Iceberg SemiJoins, where \(\mathcal R\) and \(\mathcal S\) reside on remote servers. We developed an efficient algorithm that employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set in server \(\mathcal S\) with the pruning of unmatched tuples in server \(\mathcal R\). We are therefore able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables that are constructed during the generation of the Iceberg set. Our experiments demonstrate that, compared to conventional two-phase approaches, our method transmits up to 80% less data through the network while also reducing the disk I/O cost.
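
To make the ISJ definition concrete, the following is a minimal single-machine sketch, not the interleaved distributed algorithm of the paper. It assumes tuples are `(key, payload)` pairs joined on equality of `key`. The names `iceberg_semijoin`, `BloomFilter` and `bloom_pruned_isj`, and the filter parameters, are hypothetical and chosen for illustration only. The second function imitates a simple two-phase scheme: the side holding \(\mathcal S\) computes the Iceberg set and summarizes it in a Bloom filter, the side holding \(\mathcal R\) prunes tuples that the filter rejects, and a final exact check removes false positives.

```python
from collections import Counter
import hashlib


def iceberg_semijoin(R, S, k):
    """Reference semantics: tuples of R whose key occurs at least k times in S."""
    counts = Counter(key for key, _ in S)
    return [r for r in R if counts[r[0]] >= k]


class BloomFilter:
    """Toy Bloom filter (assumed sizes; one byte per bit for readability)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(key))


def bloom_pruned_isj(R, S, k):
    """Two-phase sketch: build a filter over the Iceberg set, then prune R."""
    # "Server S": compute the Iceberg set and summarize it in a Bloom filter.
    counts = Counter(key for key, _ in S)
    iceberg_keys = {key for key, c in counts.items() if c >= k}
    bf = BloomFilter()
    for key in iceberg_keys:
        bf.add(key)
    # "Server R": keep only tuples that the filter does not rule out.
    candidates = [r for r in R if bf.might_contain(r[0])]
    # Final exact check removes any false positives admitted by the filter.
    return [r for r in candidates if r[0] in iceberg_keys]


R = [("a", 1), ("b", 2), ("c", 3)]
S = [("a", 10), ("a", 11), ("b", 12), ("a", 13), ("c", 1), ("c", 2)]
result = bloom_pruned_isj(R, S, k=2)  # [("a", 1), ("c", 3)]
```

In a distributed setting only the Bloom filter and the surviving candidates would cross the network, which is where the savings come from. Because a Bloom filter never produces false negatives, the pruned version returns the same tuples as the reference function.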

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Imthiyaz, M.K., Xiaoan, D., Kalnis, P. (2004). Efficient Processing of Distributed Iceberg Semi-joins. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_61

  • DOI: https://doi.org/10.1007/978-3-540-30075-5_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22936-0

  • Online ISBN: 978-3-540-30075-5

  • eBook Packages: Springer Book Archive
