Abstract
The Iceberg SemiJoin (ISJ) of two datasets \(\mathcal R\) and \(\mathcal S\) returns the tuples in \(\mathcal R\) that join with at least \(k\) tuples of \(\mathcal S\). The ISJ operator is essential in many practical applications, including OLAP, data mining and information retrieval. In this paper we consider the distributed evaluation of Iceberg SemiJoins, where \(\mathcal R\) and \(\mathcal S\) reside on remote servers. We develop an efficient algorithm that employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set in server \(\mathcal S\) with the pruning of unmatched tuples in server \(\mathcal R\). As a result, we are able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables that are constructed during the generation of the Iceberg set. Compared to conventional two-phase approaches, our experiments demonstrate that our method transmits up to 80% less data through the network, while also reducing the disk I/O cost.
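To make the operator's definition concrete, the following Python sketch computes an ISJ in two ways: directly, and via a simple non-interleaved scheme in which the iceberg keys of \(\mathcal S\) are computed first and a Bloom filter over them is used to prune \(\mathcal R\). This is not the interleaved algorithm of the paper, whose details the abstract does not give. The names `BloomFilter`, `iceberg_semijoin` and `bloom_pruned_semijoin`, the filter size, the hash construction, and the convention that the join key is the first field of each tuple are all assumptions made for illustration.

```python
from collections import Counter
import hashlib


class BloomFilter:
    """A tiny Bloom filter (illustrative; sizes are arbitrary assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        # May return True for keys never added (false positives),
        # but never returns False for a key that was added.
        return all(self.bits[pos] for pos in self._positions(key))


def iceberg_semijoin(r_tuples, s_keys, k):
    """Centralized reference: tuples of R whose key occurs at least k times in S."""
    counts = Counter(s_keys)
    return [t for t in r_tuples if counts[t[0]] >= k]


def bloom_pruned_semijoin(r_tuples, s_keys, k):
    """Non-interleaved sketch: build the iceberg set, then prune R with a filter."""
    # Server S: count join keys and keep those reaching the threshold k.
    counts = Counter(s_keys)
    iceberg = {key for key, c in counts.items() if c >= k}

    # Server S: summarize the iceberg keys in a Bloom filter to send to R.
    bf = BloomFilter()
    for key in iceberg:
        bf.add(key)

    # Server R: drop tuples that cannot match; false positives may survive.
    candidates = [t for t in r_tuples if t[0] in bf]

    # Exact check against the iceberg set removes any false positives.
    return [t for t in candidates if t[0] in iceberg]


if __name__ == "__main__":
    R = [("a", 1), ("b", 2), ("c", 3)]
    S = ["a", "a", "b", "a", "c", "c"]
    print(iceberg_semijoin(R, S, k=2))
    print(bloom_pruned_semijoin(R, S, k=2))
```

In this sketch only the filter and the surviving candidate tuples would need to cross the network, which is the general motivation for Bloom-filter-based semijoins; the saving depends on the filter's false-positive rate.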
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Imthiyaz, M.K., Xiaoan, D., Kalnis, P. (2004). Efficient Processing of Distributed Iceberg Semi-joins. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_61
DOI: https://doi.org/10.1007/978-3-540-30075-5_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive