
Balanced, Locality-Based Parallel Irregular Reductions

  • Eladio Gutiérrez
  • Oscar Plata
  • Emilio L. Zapata
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2624)

Abstract

Much effort has recently been devoted to parallelizing irregular reductions efficiently. The parallelization techniques proposed in recent years can be classified into two groups: LPO (Loop Partitioning Oriented) methods and DPO (Data Partitioning Oriented) methods. We have analyzed both classes in terms of four performance aspects: data locality, memory overhead, parallelism, and workload balancing. Load balancing has not been sufficiently analyzed in the literature on parallel reduction methods, especially for methods in the DPO class. In this paper we propose two techniques to introduce load balancing into a DPO method. The first technique is generic, as it can deal with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance that appears when a large number of write operations fall on small regions of the reduction arrays. Efficient implementations of the proposed load-balancing solutions are presented for an example DPO method. Experiments on static and dynamic kernel codes were conducted, with comparisons against other parallel reduction methods.
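
To make the terms in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the simplest form of irregular reduction, `A[idx[i]] += val[i]` with one write per iteration, and simulates a data-partitioned, owner-computes scheme sequentially. All function names (`sequential_reduction`, `uniform_bounds`, `balanced_bounds`, `partitioned_reduction`) are hypothetical. It contrasts equal-sized blocks of the reduction array with blocks whose boundaries are chosen from the per-element write counts.

```python
from bisect import bisect_right

def sequential_reduction(n, idx, val):
    """Reference loop: a[idx[i]] += val[i], where idx is only known at run time."""
    a = [0.0] * n
    for i, v in zip(idx, val):
        a[i] += v
    return a

def uniform_bounds(n, parts):
    """Split [0, n) into `parts` contiguous blocks of (nearly) equal size."""
    return [(p * n) // parts for p in range(parts + 1)]

def balanced_bounds(n, idx, parts):
    """Pick block boundaries so each block receives a similar number of writes."""
    writes = [0] * n
    for i in idx:
        writes[i] += 1
    total = len(idx)
    bounds, seen = [0], 0
    for e in range(n):
        seen += writes[e]
        # Close blocks once the running write count reaches their cumulative share.
        # A single very hot element can close several blocks at once, leaving
        # empty blocks: moving boundaries alone cannot split that element's work.
        while len(bounds) < parts and seen * parts >= total * len(bounds):
            bounds.append(e + 1)
    bounds += [n] * (parts + 1 - len(bounds))
    return bounds

def partitioned_reduction(n, idx, val, bounds):
    """Owner-computes: each block applies only the writes that target it."""
    parts = len(bounds) - 1
    # Inspector step: group iterations by the block owning their target element.
    work = [[] for _ in range(parts)]
    for it, i in enumerate(idx):
        work[bisect_right(bounds, i) - 1].append(it)
    a = [0.0] * n
    # Executor step: blocks are disjoint, so each p could run on its own
    # thread without write conflicts (simulated sequentially here).
    for p in range(parts):
        for it in work[p]:
            a[idx[it]] += val[it]
    return a, [len(w) for w in work]

# Skewed access pattern: most writes hit the first elements of the array.
n, parts = 8, 2
idx = [0, 0, 0, 0, 1, 1, 5, 7]
val = [1.0] * len(idx)
ref = sequential_reduction(n, idx, val)
res_u, load_u = partitioned_reduction(n, idx, val, uniform_bounds(n, parts))
res_b, load_b = partitioned_reduction(n, idx, val, balanced_bounds(n, idx, parts))
print(load_u, load_b)
```

With this access pattern the equal-sized blocks receive 6 and 2 writes, while the write-count-based boundaries give 4 and 4, and both variants reproduce the sequential result. The comment inside `balanced_bounds` points at the kind of situation the abstract singles out as a special case: when many writes concentrate on a very small region, shifting block boundaries is not enough by itself.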

Keywords

Loop Iteration, Execution Phase, Memory Overhead, Workload Balance, Parallel Thread
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Eladio Gutiérrez (1)
  • Oscar Plata (1)
  • Emilio L. Zapata (1)
  1. Department of Computer Architecture, University of Málaga, Málaga, Spain
