Pipelined band join in shared-nothing systems

Lu, Hongjun; Tan, Kian-Lee

doi:10.1007/3-540-60688-2_48

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1023)

Included in the following conference series:

Asian Computing Science Conference

Abstract

A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a specified band about the values in the join attribute of S. Traditionally, R and S are split into partitions that are assigned to processors for the join to be executed concurrently and independently. Since the join is a non-equijoin, some records of R (or S) must appear in more than one partition, i.e. some records are replicated across two or more partitions. This may lead to poor performance especially when the number of records to be replicated is large. This paper presents a new algorithm, called the pipelined band join. The algorithm avoids data replication in secondary storage by dynamically creating partitions during join computation through pipelining. A preliminary study indicates that the proposed algorithm outperforms the conventional method.
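The replication problem described in the abstract can be illustrated with a minimal sketch of the conventional, partition-based approach. This is not the pipelined algorithm proposed in the paper, whose details the abstract does not give. The band widths `c1` and `c2`, the function names, and the range boundaries are all assumptions made for illustration; the predicate is taken to be `s - c1 <= r <= s + c2`.

```python
from bisect import bisect_left, bisect_right


def band_join(r_values, s_values, c1, c2):
    """Return all (r, s) pairs with s - c1 <= r <= s + c2 (assumed band form)."""
    r_sorted = sorted(r_values)
    pairs = []
    for s in s_values:
        lo = bisect_left(r_sorted, s - c1)
        hi = bisect_right(r_sorted, s + c2)
        pairs.extend((r, s) for r in r_sorted[lo:hi])
    return pairs


def partition_with_replication(r_values, s_values, boundaries, c1, c2):
    """Range-partition S, and copy each R value into every partition
    that holds an S value it could possibly match.

    Partition i holds the S values v with bisect_right(boundaries, v) == i.
    """
    n_parts = len(boundaries) + 1
    r_parts = [[] for _ in range(n_parts)]
    s_parts = [[] for _ in range(n_parts)]
    for s in s_values:
        s_parts[bisect_right(boundaries, s)].append(s)
    for r in r_values:
        # r matches s exactly when r - c2 <= s <= r + c1, so r is needed
        # in every partition that overlaps this interval.
        first = bisect_right(boundaries, r - c2)
        last = bisect_right(boundaries, r + c1)
        for p in range(first, last + 1):
            r_parts[p].append(r)
    return r_parts, s_parts


# Hypothetical sample data: three partitions split at 5 and 10, band of +/- 1.
R = [1, 4, 5, 9, 10, 14]
S = [2, 5, 10, 11]
r_parts, s_parts = partition_with_replication(R, S, [5, 10], 1, 1)

# Each partition can be joined independently, e.g. on its own processor.
partitioned = [pair
               for rp, sp in zip(r_parts, s_parts)
               for pair in band_join(rp, sp, 1, 1)]
direct = band_join(R, S, 1, 1)
r_copies = sum(len(rp) for rp in r_parts)
```

In this sketch the partitioned join yields the same pairs as the direct join, but `r_copies` exceeds `len(R)` because R values near a partition boundary are stored more than once. That extra storage is the overhead the abstract says the pipelined band join is designed to avoid.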

Author information

Authors and Affiliations

Department of Information Systems & Computer Science, National University of Singapore, Singapore
Hongjun Lu & Kian-Lee Tan

Editor information

Kanchana Kanchanasut, Jean-Jacques Lévy

About this paper

Cite this paper

Lu, H., Tan, KL. (1995). Pipelined band join in shared-nothing systems. In: Kanchanasut, K., Lévy, JJ. (eds) Algorithms, Concurrency and Knowledge. ACSC 1995. Lecture Notes in Computer Science, vol 1023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60688-2_48

DOI: https://doi.org/10.1007/3-540-60688-2_48
Published: 01 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60688-8
Online ISBN: 978-3-540-49262-7
eBook Packages: Springer Book Archive
