Abstract
This paper proposes a simple and efficient implementation of a hierarchical coarse grain task parallel processing scheme on an SMP machine. The OSCAR multigrain parallelizing compiler automatically generates parallelized code containing OpenMP directives, and its performance is evaluated on a commercial SMP machine. Coarse grain task parallel processing is important for improving the effective performance of a wide range of multiprocessor systems, from single-chip multiprocessors to high-performance computers, beyond the limit of loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks, analyzes the parallelism among tasks by "Earliest Executable Condition Analysis," which considers both control and data dependences, either statically schedules the coarse grain tasks to threads or generates dynamic task scheduling code that assigns tasks to threads at run time, and generates OpenMP Fortran source code for an SMP machine. The OpenMP thread-parallel code generated by the OSCAR compiler forks threads only once at the beginning of the program and joins them only once at the end, even though the program is processed in parallel under the hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on an 8-processor SMP machine, an IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of the OSCAR multigrain compiler. The evaluation shows that the OSCAR compiler, used with IBM XL Fortran compiler version 5.1, achieves 1.5 to 3 times the speedup of the native XL Fortran compiler for SPEC 95fp SWIM, TOMCATV, HYDRO2D, and MGRID and the Perfect Benchmarks ARC2D.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Kasahara, H., Obata, M., Ishizaka, K. (2001). Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP. In: Midkiff, S.P., et al. Languages and Compilers for Parallel Computing. LCPC 2000. Lecture Notes in Computer Science, vol 2017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45574-4_13
Print ISBN: 978-3-540-42862-6
Online ISBN: 978-3-540-45574-5