Algorithm Engineering for Parallel Computation

Experimental Algorithmics

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2547)

Abstract

The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well-tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bader, D.A., Moret, B.M.E., Sanders, P. (2002). Algorithm Engineering for Parallel Computation. In: Fleischer, R., Moret, B., Schmidt, E.M. (eds) Experimental Algorithmics. Lecture Notes in Computer Science, vol 2547. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36383-1_1

  • DOI: https://doi.org/10.1007/3-540-36383-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00346-5

  • Online ISBN: 978-3-540-36383-5

  • eBook Packages: Springer Book Archive
