Abstract
We present a novel hardware algorithm for scheduling tasks with dependency constraints on multicore architectures. The algorithm provides deadlock-free scheduling over a large class of architectures by generalizing a fundamental algorithm by Tomasulo. Performance measurements show that the proposed algorithm can deliver higher performance than a large increase in the number of processing cores. Several authors have already pointed out how the “threads” model of computation can lead to a painstaking and error-prone programming process. Our approach does not preclude backward compatibility or the use of traditional techniques, yet it supports a different and more advanced programming model that is generally better suited to many complex embedded multicore systems.
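As an illustration of what scheduling tasks with dependency constraints involves, the following is a minimal software sketch in the spirit of Tomasulo-style operand tracking. It is not the hardware algorithm of this chapter: the function `schedule`, the task tuples, and the notion of "waves" are assumptions introduced here for illustration only. Each task waits for the most recent producer of every input it reads. Only true (read-after-write) dependencies are tracked, under the assumption that every output receives fresh storage, as register renaming would provide.

```python
def schedule(tasks):
    """Group tasks into waves that could run concurrently.

    tasks: list of (name, inputs, outputs) tuples in program order.
    Hypothetical helper for illustration; not the chapter's algorithm.
    """
    last_writer = {}   # data item -> task that most recently produces it
    waiting_on = {}    # task -> set of producer tasks not yet finished
    consumers = {}     # producer task -> tasks waiting for its result

    for name, inputs, outputs in tasks:
        # A task depends on the latest writer of each input (if any).
        producers = {last_writer[d] for d in inputs if d in last_writer}
        waiting_on[name] = producers
        for p in producers:
            consumers.setdefault(p, []).append(name)
        # Later readers of these items will wait on this task instead.
        for d in outputs:
            last_writer[d] = name

    ready = [name for name, _, _ in tasks if not waiting_on[name]]
    waves = []
    while ready:
        waves.append(ready)
        released = []
        for done in ready:
            # "Broadcast" completion to every task waiting on this result.
            for c in consumers.get(done, []):
                waiting_on[c].discard(done)
                if not waiting_on[c]:
                    released.append(c)
        ready = released
    return waves


if __name__ == "__main__":
    demo = [
        ("a", [], ["x"]),
        ("b", [], ["y"]),
        ("c", ["x", "y"], ["z"]),
        ("d", ["z"], ["w"]),
        ("e", ["x"], ["v"]),
    ]
    for i, wave in enumerate(schedule(demo)):
        print(i, sorted(wave))
```

Because dependencies in this sketch always point from a later task to an earlier one in program order, the dependency graph is acyclic and every task is eventually released. A hardware scheduler has to guarantee the same property with bounded resources, which this sketch does not model.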
References
Lee EA (2006) The problem with threads. Computer 39(5):33–42
Ayguadé E, Copty N, Duran A, Hoeflinger J, Lin Y, Massaioli F, Teruel X, Unnikrishnan P, Zhang G (2009) The design of OpenMP tasks. IEEE Trans Parallel Distributed Syst 20(3):404–418
Bellens P, Perez JM, Badia RM, Labarta J (2006) CellSs: a programming model for the Cell BE architecture. In: SC ’06: Proceedings of the 2006 ACM/IEEE conference on supercomputing. ACM, New York
Tomasulo RM (1967) An efficient algorithm for exploiting multiple arithmetic units. IBM J Res Dev 11(1):25–33
Tomasulo RM, Anderson DW, Powers DM (1969) Execution unit with a common operand and resulting bussing system. United States Patent, August, number US3462744
Duran A, Pérez JM, Ayguadé E, Badia RM, Labarta J (2008) Extending the OpenMP tasking model to allow dependent tasks. In: International workshop on OpenMP ’08, pp 111–122
Perez J, Badia R, Labarta J (2008) A dependency-aware task-based programming environment for multi-core architectures. In: IEEE international conference on cluster computing, October 2008, pp 142–151
Stensland HK, Griwodz C, Halvorsen P (2008) Evaluation of multicore scheduling mechanisms for heterogeneous processing architectures. In: NOSSDAV ’08: Proceedings of the 18th international workshop on network and operating systems support for digital audio and video. ACM, New York, pp 33–38
Frigo M, Leiserson CE, Randall KH (1998) The implementation of the Cilk-5 multithreaded language. In: Proceedings of the ACM SIGPLAN ’98 conference on programming language design and implementation, Montreal, Quebec, Canada, June 1998, pp 212–223 (proceedings published in ACM SIGPLAN Notices, vol 33(5), May 1998)
OpenMP Architecture Review Board (2008) OpenMP application program interface, version 3.0. Available online: http://www.openmp.org/mp-documents/spec30.pdf
Salverda P, Zilles C (2008) Fundamental performance constraints in horizontal fusion of in-order cores. In: 14th international symposium on high performance computer architecture (HPCA), pp 252–263
Acknowledgments
This work has been partially supported by the German Federal Ministry of Education and Research (BMBF) under the project RapidMPSoC, grant number BMBF-01M3085B.
Copyright information
© 2011 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Di Gregorio, L. (2011). A Distributed Hardware Algorithm for Scheduling Dependent Tasks on Multicore Architectures. In: Conti, M., Orcioni, S., Martínez Madrid, N., Seepold, R. (eds) Solutions on Embedded Systems. Lecture Notes in Electrical Engineering, vol 81. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0638-5_10
DOI: https://doi.org/10.1007/978-94-007-0638-5_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-0637-8
Online ISBN: 978-94-007-0638-5
eBook Packages: Engineering (R0)