Algorithm Engineering for Parallel Computation

Bader, David A.; Moret, Bernard M. E.; Sanders, Peter

doi:10.1007/3-540-36383-1_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2547)

Abstract

The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.

Author information

Authors and Affiliations

Departments of Electrical and Computer Engineering, and Computer Science, University of New Mexico, 87131, Albuquerque, NM, USA
David A. Bader & Bernard M. E. Moret
Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123, Saarbrücken, Germany
Peter Sanders

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Rudolf Fleischer
Department of Computer Science, University of New Mexico, Farris Engineering Bldg, 87131-1386, Albuquerque, NM, USA
Bernard Moret
Department of Computer Science, University of Aarhus, Bld. 540, Ny Munkegade, 8000, Aarhus C, Denmark
Erik Meineche Schmidt

About this chapter

Cite this chapter

Bader, D.A., Moret, B.M.E., Sanders, P. (2002). Algorithm Engineering for Parallel Computation. In: Fleischer, R., Moret, B., Schmidt, E.M. (eds) Experimental Algorithmics. Lecture Notes in Computer Science, vol 2547. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36383-1_1

DOI: https://doi.org/10.1007/3-540-36383-1_1
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00346-5
Online ISBN: 978-3-540-36383-5
eBook Packages: Springer Book Archive
