
Pipelined band join in shared-nothing systems

  • Data Bases
  • Conference paper
  • First Online:
Algorithms, Concurrency and Knowledge (ACSC 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1023)

Abstract

A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a specified band about the values in the join attribute of S. Traditionally, R and S are split into partitions that are assigned to processors so that the join can be executed concurrently and independently. Because the join is a non-equijoin, some records of R (or S) must appear in more than one partition; that is, they are replicated across two or more partitions. This can degrade performance, especially when the number of records to be replicated is large. This paper presents a new algorithm, called the pipelined band join, that avoids replicating data in secondary storage by creating partitions dynamically during join computation through pipelining. A preliminary study indicates that the proposed algorithm outperforms the conventional method.
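The replication cost described above can be illustrated with a small sketch of the conventional partitioned band join. This is not the pipelined band join proposed in the paper, whose details are not given here. The sketch assumes the band predicate has the form S.B - c1 <= R.A <= S.B + c2, where `c1` and `c2` are hypothetical band widths; it range-partitions S on hypothetical split points `bounds` and copies each R value to every partition whose S-range overlaps [r - c2, r + c1]. The `replicas` counter reports how many extra copies of R records that requires.

```python
from bisect import bisect_right


def band_match(r, s, c1, c2):
    """Assumed band predicate: s - c1 <= r <= s + c2."""
    return s - c1 <= r <= s + c2


def partition(R, S, bounds, c1, c2):
    """Range-partition on the join value (illustrative sketch only).

    `bounds` holds sorted split points (a hypothetical parameter); p-1 points
    give p partitions, and partition i holds s with bounds[i-1] <= s < bounds[i].
    An r can match any s in [r - c2, r + c1], so r is copied to every
    partition that interval touches; copies beyond the first are replicas.
    """
    p = len(bounds) + 1
    r_parts = [[] for _ in range(p)]
    s_parts = [[] for _ in range(p)]
    for s in S:
        s_parts[bisect_right(bounds, s)].append(s)
    replicas = 0
    for r in R:
        first = bisect_right(bounds, r - c2)  # lowest partition r may join with
        last = bisect_right(bounds, r + c1)   # highest partition r may join with
        for i in range(first, last + 1):
            r_parts[i].append(r)
        replicas += last - first
    return r_parts, s_parts, replicas


def partitioned_band_join(R, S, bounds, c1, c2):
    """Join each (R, S) partition pair independently.

    Each pair could run on its own processor; every s lives in exactly one
    partition, so each matching (r, s) pair is produced exactly once.
    """
    r_parts, s_parts, replicas = partition(R, S, bounds, c1, c2)
    result = []
    for rp, sp in zip(r_parts, s_parts):
        result.extend((r, s) for r in rp for s in sp if band_match(r, s, c1, c2))
    return result, replicas


if __name__ == "__main__":
    pairs, replicas = partitioned_band_join(
        [1, 4, 5, 9, 12], [2, 5, 8, 11], bounds=[5, 10], c1=1, c2=1)
    print(sorted(pairs))  # [(1, 2), (4, 5), (5, 5), (9, 8), (12, 11)]
    print(replicas)       # 3
```

In the usage example, the R values 4, 5 and 9 sit within the band of a partition boundary and are each stored twice. This is the kind of replication the pipelined band join is designed to avoid.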

Editor information

Kanchana Kanchanasut, Jean-Jacques Lévy

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, H., Tan, KL. (1995). Pipelined band join in shared-nothing systems. In: Kanchanasut, K., Lévy, JJ. (eds) Algorithms, Concurrency and Knowledge. ACSC 1995. Lecture Notes in Computer Science, vol 1023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60688-2_48

  • DOI: https://doi.org/10.1007/3-540-60688-2_48

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60688-8

  • Online ISBN: 978-3-540-49262-7

  • eBook Packages: Springer Book Archive
