
Performance Comparison of Pipelined Hash Joins on Workstation Clusters

  • Kenji Imasaki
  • Hong Nguyen
  • Sivarama P. Dandamudi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2552)

Abstract

The traditional hash join algorithm uses a single hash table built on one of the relations participating in the join operation. A variation called the double hash join was proposed to remedy some of the performance problems of the single hash join. In this paper, we compare the performance of single- and double-pipelined hash joins in a cluster environment. In this environment, nodes are heterogeneous; furthermore, nodes experience dynamic, non-query local background load that can affect pipelined query execution performance. Previous studies have shown that the double-pipelined hash join performs substantially better than the single-pipelined hash join when dealing with data from remote sources. However, their relative performance has not been studied in cluster environments. Our study indicates that, in the type of cluster environment we consider here, the single-pipelined hash join performs as well as or better than the double-pipelined hash join in most cases. We present experimental results on a Pentium cluster and identify these cases.
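To make the distinction between the two join kernels concrete, here is a minimal in-memory Python sketch. It is not the authors' cluster implementation. The function names `single_hash_join` and `double_hash_join`, the representation of rows as tuples, the integer join-key position, and the round-robin interleaving of the two inputs in the double join are all illustrative assumptions. The single version must consume its entire build input before it can emit anything. The double (symmetric) version keeps one hash table per input and can emit a match as soon as both matching tuples have arrived.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]
_END = object()  # sentinel marking an exhausted input stream


def single_hash_join(build: Iterable[Row], probe: Iterable[Row],
                     key: int = 0) -> Iterator[Tuple[Row, Row]]:
    """Single hash join: one table on `build`, then stream `probe` past it.
    Nothing is produced until the whole build input has been read."""
    table = defaultdict(list)
    for r in build:
        table[r[key]].append(r)
    for s in probe:
        for r in table.get(s[key], ()):
            yield (r, s)


def double_hash_join(left: Iterable[Row], right: Iterable[Row],
                     key: int = 0) -> Iterator[Tuple[Row, Row]]:
    """Double (symmetric) hash join over two input streams.
    Assumption: inputs are read in strict round-robin order; a real
    pipeline would read whichever input has data available."""
    tables = (defaultdict(list), defaultdict(list))
    streams = (iter(left), iter(right))
    done = [False, False]
    side = 0
    while not all(done):
        if not done[side]:
            t = next(streams[side], _END)
            if t is _END:
                done[side] = True
            else:
                # Insert into this side's table, then probe the other side's.
                tables[side][t[key]].append(t)
                for m in tables[1 - side].get(t[key], ()):
                    # Always emit pairs as (left_row, right_row).
                    yield (t, m) if side == 0 else (m, t)
        side = 1 - side
```

Both functions return the same set of matches: in the double join, each matching pair is emitted exactly once, by whichever of the two tuples arrives later. The difference lies in when results become available and in the memory cost of holding a hash table for both relations rather than one.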

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Kenji Imasaki (1)
  • Hong Nguyen (1)
  • Sivarama P. Dandamudi (1)
  1. Center for Networked Computing, School of Computer Science, Carleton University, Ottawa, Canada
