
Performance Analysis of Parallel Programs


The most important motivation for using a parallel system is the reduction of the execution time of computation-intensive application programs. The execution time of a parallel program depends on many factors, including the architecture of the execution platform, the compiler and operating system used, the parallel programming environment and the parallel programming model on which the environment is based, as well as properties of the application program such as locality of memory references or dependencies between the computations to be performed. In principle, all these factors have to be taken into consideration when developing a parallel program. However, there may be complex interactions between these factors, and it is therefore difficult to consider them all.
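The reduction of execution time mentioned above is commonly quantified as speedup, i.e., the ratio of sequential to parallel execution time. As an illustrative sketch (not taken from this chapter), Amdahl's law bounds the achievable speedup when a fraction f of the work is inherently sequential; the function name `amdahl_speedup` is a hypothetical helper chosen here:

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup on p processors when a fraction f
    of the execution time is inherently sequential (Amdahl's law):
    S(p) = 1 / (f + (1 - f) / p)."""
    return 1.0 / (f + (1.0 - f) / p)

# Even a small sequential fraction limits scalability: with f = 0.05
# the speedup can never exceed 1/f = 20, regardless of p.
for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
```

Note how the speedup saturates as p grows, which is one reason the platform and program properties listed above must be analyzed together rather than in isolation.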





Correspondence to Thomas Rauber.



© 2010 Springer-Verlag Berlin Heidelberg


Cite this chapter

Rauber, T., Rünger, G. (2010). Performance Analysis of Parallel Programs. In: Parallel Programming. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04818-0_4


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04817-3

  • Online ISBN: 978-3-642-04818-0
