Abstract
In the past, parallel algorithms were developed, for the most part, under the assumption that the number of processors is Θ(n) (where n is the size of the input) and that, if the actual number was smaller in practice, this could be resolved using Brent's Lemma to simulate the highly parallel solution on a lower-degree parallel architecture. In this paper, however, we argue that the design and implementation issues of algorithms and architectures differ significantly, both in theory and in practice, between computational models with high and low degrees of parallelism. We report an observed gap in the behavior of a parallel architecture depending on the number of processors. This gap appears repeatedly, both in empirical cases, when studying practical aspects of architecture design and program implementation, and in theoretical instances, when studying the behavior of various parallel algorithms. It separates the performance, design, and analysis of systems with a sublinear number of processors from those of systems with linearly many processors. More specifically, we observe that systems with either logarithmically many cores or with O(n^α) cores (for α < 1) exhibit qualitatively different behavior than a system whose number of cores is linear in the size of the input, i.e., Θ(n). The evidence we present suggests the existence of a sharp theoretical gap between the classes of problems that can be efficiently parallelized with o(n) processors and with Θ(n) processors, unless P = NC.
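The simulation argument the abstract alludes to can be illustrated concretely. Brent's Lemma says that a parallel computation with total work W and critical-path length (span) T_∞ can be greedily scheduled on p processors in at most W/p + T_∞ steps. The sketch below, with a hypothetical level-by-level DAG profile chosen purely for illustration, checks this bound for several processor counts:

```python
import math

def greedy_schedule_steps(level_sizes, p):
    """Steps to run a level-synchronous task DAG on p processors.

    level_sizes[i] is the number of unit tasks at depth i; tasks within
    a level are independent, so a level of m tasks takes ceil(m/p) steps.
    """
    return sum(math.ceil(m / p) for m in level_sizes)

# Hypothetical DAG profile (an illustrative assumption, not from the paper).
levels = [1, 4, 16, 64, 16, 4, 1]
work = sum(levels)    # W: total number of operations (106 here)
span = len(levels)    # T_inf: critical-path length (7 here)

for p in (2, 8, 64):
    steps = greedy_schedule_steps(levels, p)
    # Brent's bound: T_p <= W/p + T_inf
    assert steps <= work / p + span
```

The bound guarantees asymptotic optimality of the simulation, but, as the paper argues, it says nothing about the qualitative design differences between architectures with o(n) and Θ(n) processors.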
References
Ajwani, D., Sitchinava, N., Zeh, N.: Geometric algorithms for private-cache chip multiprocessors. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part II. LNCS, vol. 6347, pp. 75–86. Springer, Heidelberg (2010)
Ajwani, D., Sitchinava, N., Zeh, N.: I/O-optimal distribution sweeping on private-cache chip multiprocessors. In: IPDPS, pp. 1114–1123. IEEE (2011)
Arge, L., Goodrich, M.T., Nelson, M.J., Sitchinava, N.: Fundamental parallel algorithms for private-cache chip multiprocessors. In: SPAA, pp. 197–206 (2008)
Arge, L., Goodrich, M.T., Sitchinava, N.: Parallel external memory graph algorithms. In: IPDPS, pp. 1–11. IEEE (2010)
Arlazarov, V., Dinic, E., Kronrod, M., Faradzev, I.: On economic construction of the transitive closure of a directed graph. Dokl. Akad. Nauk SSSR 194, 487–488 (1970) (in Russian); English translation in Soviet Math. Dokl. 11, 1209–1210 (1975)
Bender, M.A., Phillips, C.A.: Scheduling DAGs on asynchronous processors. In: SPAA, pp. 35–45. ACM (2007)
Blelloch, G.E., Chowdhury, R.A., Gibbons, P.B., Ramachandran, V., Chen, S., Kozuch, M.: Provably good multicore cache performance for divide-and-conquer algorithms. In: SODA. ACM (2008)
Blelloch, G.E., Gibbons, P.B.: Effectively sharing a cache among threads. In: SPAA, pp. 235–244. ACM (2004)
Blelloch, G.E., Gibbons, P.B., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46, 281–321 (1999)
Bose, P., Chen, E.Y., He, M., Maheshwari, A., Morin, P.: Succinct geometric indexes supporting point location queries. In: SODA, pp. 635–644. SIAM (2009)
Brent, R.P.: The parallel evaluation of general arithmetic expressions. J. ACM 21(2), 201–206 (1974)
Burton, F.W., Sleep, M.R.: Executing functional programs on a virtual tree of processors. In: FPCA, pp. 187–194. ACM (1981)
Chowdhury, R.A., Ramachandran, V.: Cache-efficient dynamic programming algorithms for multicores. In: SPAA, pp. 207–216. ACM (2008)
Dorrigiv, R., López-Ortiz, A., Salinger, A.: Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM). In: SPAA, pp. 185–187. ACM (2008)
Dymond, P.W., Tompa, M.: Speedups of deterministic machines by synchronous parallel machines. In: STOC, pp. 336–343. ACM (1983)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 1. Wiley (1968)
Fujiwara, A., Inoue, M., Masuzawa, T.: Parallelizability of some P-complete problems. In: Rolim, J.D.P. (ed.) IPDPS 2000 Workshops. LNCS, vol. 1800, pp. 116–122. Springer, Heidelberg (2000)
Greenlaw, R., Hoover, H.J., Ruzzo, W.L.: Limits to parallel computation: P-completeness theory. Oxford University Press, Inc., New York (1995)
Hopcroft, J.E., Paul, W.J., Valiant, L.G.: On time versus space and related problems. In: FOCS, pp. 57–64. IEEE (1975)
Kruskal, C.P., Rudolph, L., Snir, M.: A complexity theory of efficient parallel algorithms. Theor. Comput. Sci. 71(1), 95–132 (1990)
Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Raab, M., Steger, A.: “Balls into Bins” - a simple and tight analysis. In: Rolim, J.D.P., Serna, M., Luby, M. (eds.) RANDOM 1998. LNCS, vol. 1518, pp. 159–170. Springer, Heidelberg (1998)
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
López-Ortiz, A., Salinger, A. (2013). On the Sublinear Processor Gap for Parallel Architectures. In: Chan, TH.H., Lau, L.C., Trevisan, L. (eds) Theory and Applications of Models of Computation. TAMC 2013. Lecture Notes in Computer Science, vol 7876. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38236-9_18
DOI: https://doi.org/10.1007/978-3-642-38236-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38235-2
Online ISBN: 978-3-642-38236-9