Abstract
One of the major challenges facing high-performance computing is producing programs that achieve acceptable performance on parallel architectures. Although many organizations have been working in this area for some time, many programs have yet to be parallelized, and some that were parallelized target systems that are now obsolete; such programs may run poorly, if at all, on the current generation of parallel computers. What is needed is a straightforward approach to parallelizing vectorizable codes that introduces no changes to the algorithm or to the codes' convergence properties. The combination of loop-level parallelism and RISC-based shared-memory SMPs has proven to be a successful approach to this problem.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Pressel, D.M., Sahu, J., Heavey, K.R. (2001). Using Loop-Level Parallelism to Parallelize Vectorizable Programs. In: Mueller, F. (ed.) High-Level Parallel Programming Models and Supportive Environments. HIPS 2001. Lecture Notes in Computer Science, vol. 2026. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45401-2_3
Print ISBN: 978-3-540-41944-0
Online ISBN: 978-3-540-45401-4