The most important motivation for using a parallel system is the reduction of the execution time of computation-intensive application programs. The execution time of a parallel program depends on many factors, including the architecture of the execution platform, the compiler and operating system used, the parallel programming environment and the parallel programming model on which the environment is based, as well as properties of the application program, such as the locality of memory references or dependencies between the computations to be performed. In principle, all these factors have to be taken into consideration when developing a parallel program. However, these factors may interact in complex ways, which makes it difficult to consider them all at once.
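How much the execution time can actually be reduced is bounded by the program's inherently sequential work, as captured by Amdahl's law. The following minimal sketch (not from this chapter; the function name and the chosen sequential fraction are illustrative assumptions) shows how a fixed sequential fraction f limits the achievable speedup to 1/f, no matter how many processors are used.

```python
def amdahl_speedup(f, p):
    """Speedup on p processors of a program whose sequential
    fraction is f (Amdahl's law): S(p) = 1 / (f + (1 - f) / p)."""
    return 1.0 / (f + (1.0 - f) / p)

# With a 10% sequential fraction, the speedup saturates well below p:
print(round(amdahl_speedup(0.1, 16), 2))    # ≈ 6.4 on 16 processors
print(round(amdahl_speedup(0.1, 1024), 2))  # ≈ 9.91, approaching the limit 1/f = 10
```

Even 1024 processors yield less than a tenfold speedup here, which is why reducing the sequential fraction of a program is often more effective than adding processors.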
© 2010 Springer-Verlag Berlin Heidelberg
Rauber, T., Rünger, G. (2010). Performance Analysis of Parallel Programs. In: Parallel Programming. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04818-0_4
Print ISBN: 978-3-642-04817-3
Online ISBN: 978-3-642-04818-0