Abstract
Parallelism remains one of the most prominent techniques for improving the performance of large application programs. It can be detected and exploited on several different levels, including instruction-level parallelism, data parallelism, functional parallelism, and loop parallelism. A suitable mixture of these levels can often improve performance significantly, and the task of parallel programming is to find and code the corresponding programs.
We discuss the potential of using multiple levels of parallelism in applications from scientific computing and specifically consider programming with hierarchically structured multiprocessor tasks. A multiprocessor task can be mapped onto a group of processors and executed concurrently with other independent tasks. Internally, a multiprocessor task can consist of a hierarchical composition of smaller tasks or can incorporate any kind of data, thread, or SPMD parallelism. Such a programming model is suitable for applications with an inherent modular structure. Examples include environmental models combining atmospheric, surface-water, and ground-water models, or aircraft simulations combining models for fluid dynamics, structural mechanics, and surface heating. Methods such as specific ODE solvers or hierarchical matrix computations also benefit from multiple levels of parallelism. Examples from both areas are discussed.
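The hierarchical task structure described above can be sketched in miniature. The following is an illustrative example, not code from the paper: it simulates multiprocessor tasks with Python thread pools, where each M-task receives a group of "processors" and either decomposes into concurrent subtasks or runs a data-parallel kernel on its group. The function names and group-splitting policy are assumptions made for this sketch.

```python
# Illustrative sketch (not from the paper): hierarchically structured
# multiprocessor tasks, simulated with Python thread pools.
from concurrent.futures import ThreadPoolExecutor

def data_parallel_task(group_size, data):
    # Innermost level: data parallelism inside one M-task --
    # the data is partitioned across the processors of the group.
    chunk = max(1, len(data) // group_size)
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=group_size) as pool:
        return sum(pool.map(sum, parts))

def hierarchical_mtask(group_size, subtasks):
    # An M-task mapped onto `group_size` processors; its independent
    # subtasks run concurrently, each on a subgroup of the processors.
    sub_group = max(1, group_size // len(subtasks))
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = [pool.submit(data_parallel_task, sub_group, d)
                   for d in subtasks]
        return [f.result() for f in futures]

# Two independent M-tasks (e.g., an atmospheric and a surface-water
# model) executed concurrently on disjoint processor groups.
results = hierarchical_mtask(8, [list(range(100)), list(range(50))])
print(results)  # [4950, 1225]
```

In a real message-passing setting the processor groups would correspond to disjoint MPI communicators (e.g., created with `MPI_Comm_split`) rather than thread pools; the sketch only mirrors the task hierarchy, not the communication structure.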
© 2005 International Federation for Information Processing
Rauber, T., Rünger, G. (2005). Exploiting Multiple Levels of Parallelism in Scientific Computing. In: Ng, M.K., Doncescu, A., Yang, L.T., Leng, T. (eds) High Performance Computational Science and Engineering. IFIP — The International Federation for Information Processing, vol 172. Springer, Boston, MA. https://doi.org/10.1007/0-387-24049-7_1
Print ISBN: 978-0-387-24048-0
Online ISBN: 978-0-387-24049-7