
On Disk Allocation of Intermediate Query Results in Parallel Database Systems

  • Holger Märtens
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1685)

Abstract

For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. To limit the ensuing performance penalty, a data allocation must be found that supports parallel I/O to the greatest possible extent.

In this paper, we propose declustering even self-contained units of temporary data processed in a single operation (such as individual buckets of parallel hash joins) across multiple disks. Using a suitable analytical model, we find that the improvement of parallel I/O outweighs the penalty of increased fragmentation.
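
The trade-off named above can be illustrated with a toy calculation. The sketch below is not the paper's analytical model. It assumes a bucket is spread round-robin over the disks, that each disk used pays one positioning delay plus a per-page transfer time, and that all disks work in parallel. The function names and the timing parameters `t_pos_ms` and `t_xfer_ms` are hypothetical values chosen only for illustration.

```python
def decluster(num_pages, num_disks):
    """Assign a bucket's pages to disks round-robin; return the page count per disk."""
    counts = [0] * num_disks
    for page in range(num_pages):
        counts[page % num_disks] += 1
    return counts


def io_cost(num_pages, num_disks, t_pos_ms=10.0, t_xfer_ms=1.0):
    """Toy cost of reading one bucket (assumed parameters, not measured values).

    Returns (response_time_ms, total_disk_busy_ms):
      - response time is set by the slowest disk, since disks work in parallel;
      - total busy time sums the work of all disks, so it grows with every
        extra fragment because each fragment costs one more positioning delay.
    """
    counts = decluster(num_pages, num_disks)
    per_disk = [t_pos_ms + c * t_xfer_ms for c in counts if c > 0]
    return max(per_disk), sum(per_disk)


if __name__ == "__main__":
    # Compare several declustering degrees for a hypothetical 100-page bucket.
    for degree in (1, 2, 4, 8):
        response, busy = io_cost(100, degree)
        print(f"degree={degree}: response={response} ms, total busy={busy} ms")
```

Under these assumptions, a higher declustering degree shortens the response time of a single bucket while increasing the total disk work. Whether the gain outweighs the penalty under concurrent load is the question the paper's model addresses.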

Keywords

Complex Query; Processing Node; Optimal Degree; Disk Access; Disk Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Holger Märtens (Institut für Informatik, Universität Leipzig, Leipzig, Germany)
