Abstract
This paper presents provably work-optimal parallelizations of STL (Standard Template Library) algorithms based on the work-stealing technique. Previous approaches typically give each processor a deque in which it stores its ready tasks locally; a processor that runs out of work steals a ready task from the deque of a randomly selected processor. This paper instead presents an original implementation of work-stealing that uses no deque, relying on a distributed list to bound the overhead of task creation. The paper contains both theoretical and experimental results bounding the work and the running time.
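To make the idea of deque-free work-stealing concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes that each worker owns a single mutex-protected index interval, which stands in for the paper's distributed list, and that an idle worker steals the back half of a randomly chosen victim's interval. All names (`WorkInterval`, `adaptive_for_each_index`, `parallel_sum`, `grain`) are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical per-worker descriptor: the half-open index range still to do.
// One interval per worker replaces a per-worker deque of task objects.
struct WorkInterval {
    std::mutex lock;
    std::size_t begin = 0;
    std::size_t end = 0;
};

// Applies f(i) once for every i in [0, n), using `workers` threads.
template <class F>
void adaptive_for_each_index(std::size_t n, unsigned workers, F f,
                             std::size_t grain = 64) {
    if (workers == 0) workers = 1;
    if (grain == 0) grain = 1;
    std::vector<WorkInterval> intervals(workers);
    intervals[0].end = n;  // all work starts on worker 0; the others steal
    std::atomic<std::size_t> remaining{n};

    auto run = [&](unsigned self) {
        std::mt19937 rng(self + 1);
        std::uniform_int_distribution<unsigned> pick(0, workers - 1);
        while (remaining.load(std::memory_order_acquire) > 0) {
            std::size_t lo = 0, hi = 0;
            {   // Extract a small chunk from the front of the local interval.
                std::lock_guard<std::mutex> g(intervals[self].lock);
                lo = intervals[self].begin;
                hi = std::min(intervals[self].end, lo + grain);
                intervals[self].begin = hi;
            }
            if (lo < hi) {
                for (std::size_t i = lo; i < hi; ++i) f(i);
                remaining.fetch_sub(hi - lo, std::memory_order_acq_rel);
                continue;
            }
            // Local interval is empty: pick a random victim.
            unsigned victim = pick(rng);
            if (victim == self) { std::this_thread::yield(); continue; }
            std::size_t sb = 0, se = 0;
            {   // Steal the back half of the victim's remaining interval.
                std::lock_guard<std::mutex> g(intervals[victim].lock);
                std::size_t len = intervals[victim].end - intervals[victim].begin;
                se = intervals[victim].end;
                sb = se - len / 2;
                intervals[victim].end = sb;
            }
            if (sb < se) {  // Install the stolen range as the new local work.
                std::lock_guard<std::mutex> g(intervals[self].lock);
                intervals[self].begin = sb;
                intervals[self].end = se;
            } else {
                std::this_thread::yield();
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 1; w < workers; ++w) pool.emplace_back(run, w);
    run(0);  // the calling thread acts as worker 0
    for (auto& t : pool) t.join();
}

// Usage sketch: sum a vector of ints with the loop above.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    std::atomic<long long> total{0};
    adaptive_for_each_index(data.size(), workers, [&](std::size_t i) {
        total.fetch_add(data[i], std::memory_order_relaxed);
    });
    return total.load();
}
```

In this sketch no task object is created until a steal actually happens, so the bookkeeping cost grows with the number of steals rather than with the number of elements. The mutex and the fixed `grain` are simplifications chosen for readability; a tuned implementation would likely use a cheaper synchronization scheme.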
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Traoré, D., Roch, JL., Maillard, N., Gautier, T., Bernard, J. (2008). Deque-Free Work-Optimal Parallel STL Algorithms. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_95
DOI: https://doi.org/10.1007/978-3-540-85451-7_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science (R0)