Asynchronous Random Polling Dynamic Load Balancing

Peter Sanders

Conference paper, part of the Lecture Notes in Computer Science book series (LNCS, volume 1741).


Many applications in parallel processing have to traverse large, implicitly defined trees with irregular shape. The receiver-initiated load balancing algorithm random polling has long been known to be very efficient for these problems in practice. For any \(\epsilon > 0\), we prove that its parallel execution time is at most \((1+\epsilon)\,T_{\mathrm{seq}}/P + \mathcal{O}\bigl(T_{\mathrm{atomic}} + h(\tfrac{1}{\epsilon} + T_{\mathrm{rout}} + T_{\mathrm{split}})\bigr)\) with high probability, where \(T_{\mathrm{rout}}\), \(T_{\mathrm{split}}\), and \(T_{\mathrm{atomic}}\) bound the time for sending a message, splitting a subproblem, and finishing a small unsplittable subproblem, respectively. The maximum splitting depth \(h\) is related to the depth of the computation tree. Previous work did not prove efficiency close to one and used less accurate models. In particular, our machine model allows asynchronous communication with nonconstant message delays and does not assume that communication takes place in rounds. This model is compatible with the LogP model.
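The scheme the abstract analyzes can be illustrated with a minimal sequential simulation: idle processing elements poll a randomly chosen PE for work, and a polled PE that holds a splittable subproblem gives away half of it. The sketch below is only illustrative, not the paper's implementation; the half-open range model of a subproblem, the halving split rule, and all names are assumptions made here for demonstration.

```python
import random

def random_polling(total_tasks, P, seed=0):
    """Toy simulation of receiver-initiated random polling.

    A subproblem is modeled as a half-open range (lo, hi) of atomic tasks
    (an assumption for illustration). PE 0 starts with the whole problem;
    idle PEs poll random PEs, and a donor with more than one task splits
    its range in half. Returns the number of poll attempts issued.
    """
    rng = random.Random(seed)
    work = [(0, total_tasks)] + [None] * (P - 1)  # PE 0 owns everything
    done, polls = 0, 0
    while done < total_tasks:
        for i in range(P):
            if work[i] is None:
                # Idle PE: send a work request to a uniformly random PE.
                polls += 1
                j = rng.randrange(P)
                if j != i and work[j] is not None:
                    lo, hi = work[j]
                    if hi - lo > 1:  # splittable: donor keeps one half
                        mid = (lo + hi) // 2
                        work[j] = (lo, mid)
                        work[i] = (mid, hi)
            else:
                # Busy PE: finish one atomic task from its range.
                lo, hi = work[i]
                done += 1
                work[i] = (lo + 1, hi) if lo + 1 < hi else None
    return polls

# With a single PE there is never anyone to poll.
print(random_polling(100, 1))   # 0
print(random_polling(1000, 8))  # some positive poll count
```

In the paper's terms, each split here costs \(T_{split}\), each request/reply a message of cost \(T_{rout}\), and each loop body iteration of a busy PE finishes one atomic piece of cost \(T_{atomic}\); the halving rule keeps the splitting depth \(h\) logarithmic in the problem size for this range model.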






Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

Peter Sanders, Max-Planck-Institut für Informatik, Im Stadtwald, Saarbrücken, Germany
