Abstract
This paper proposes a simple and efficient implementation of a hierarchical coarse grain task parallel processing scheme on an SMP machine. The OSCAR multigrain parallelizing compiler automatically generates parallelized code containing OpenMP directives, and its performance is evaluated on a commercial SMP machine. Coarse grain task parallel processing is important for improving the effective performance of a wide range of multiprocessor systems, from single-chip multiprocessors to high-performance computers, beyond the limit of loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks, analyzes the parallelism among tasks by "Earliest Executable Condition Analysis," which considers both control and data dependences, either statically schedules the coarse grain tasks to threads or generates dynamic task scheduling code that assigns tasks to threads at run time, and generates OpenMP Fortran source code for an SMP machine. The OpenMP thread-parallel code generated by the OSCAR compiler forks threads only once at the beginning of the program and joins them only once at the end, even though the program is processed in parallel under the hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on an 8-processor SMP machine, an IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of the OSCAR multigrain compiler. The evaluation shows that the OSCAR compiler, used with IBM XL Fortran compiler version 5.1, achieves 1.5 to 3 times the speedup of the native XL Fortran compiler for SPEC 95fp SWIM, TOMCATV, HYDRO2D, and MGRID and the Perfect Benchmarks ARC2D.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Kasahara, H., Obata, M., Ishizaka, K. (2001). Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP. In: Midkiff, S.P., et al. Languages and Compilers for Parallel Computing. LCPC 2000. Lecture Notes in Computer Science, vol 2017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45574-4_13
Print ISBN: 978-3-540-42862-6
Online ISBN: 978-3-540-45574-5