
Performance Analysis of Parallel Programs


The most important motivation for using a parallel system is the reduction of the execution time of computation-intensive application programs. The execution time of a parallel program depends on many factors, including the architecture of the execution platform, the compiler and operating system used, the parallel programming environment and the parallel programming model on which the environment is based, as well as properties of the application program such as locality of memory references or dependencies between the computations to be performed. In principle, all these factors have to be taken into consideration when developing a parallel program. However, there may be complex interactions between these factors, and it is therefore difficult to consider them all.
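The reduction of execution time mentioned above is commonly quantified as speedup, i.e., the ratio of sequential to parallel execution time. As an illustrative sketch (not taken from this chapter), Amdahl's law bounds the achievable speedup when a fraction f of the work is inherently sequential; the function name `amdahl_speedup` is a hypothetical helper chosen here:

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup on p processors when a fraction f
    of the execution time is inherently sequential (Amdahl's law):
    S(p) = 1 / (f + (1 - f) / p)."""
    return 1.0 / (f + (1.0 - f) / p)

# Even a small sequential fraction limits scalability: with f = 0.05
# the speedup can never exceed 1/f = 20, regardless of p.
for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
```

Note how the speedup saturates as p grows, which is one reason the platform and program properties listed above must be analyzed together rather than in isolation.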





Correspondence to Thomas Rauber.



© 2010 Springer-Verlag Berlin Heidelberg


Cite this chapter

Rauber, T., Rünger, G. (2010). Performance Analysis of Parallel Programs. In: Parallel Programming. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04818-0_4


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04817-3

  • Online ISBN: 978-3-642-04818-0
