Abstract
This paper presents the design and implementation of a runtime system, named GodRunner, for the Godson-T many-core processor that supports task-level parallelism efficiently and flexibly. GodRunner abstracts the underlying hardware resources and provides an easy-to-use programming interface. A two-grade task management mechanism is proposed to support both coarse-grained and fine-grained multithreading efficiently. Two load-balancing scheduling policies are combined flexibly in GodRunner. Because task management is software-controlled, GodRunner is more configurable and extensible than hard-wired approaches. Experiments show that the tasking overhead in GodRunner is as small as hundreds of cycles, which is hundreds of times faster than conventional Pthread-based multithreading on an SMP machine. Furthermore, our approach scales well and supports fine-grained tasks as small as 20k cycles optimally.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nan, Y., Lei, Y., Dong-rui, F. (2011). An Efficient and Flexible Task Management for Many Cores. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_15
DOI: https://doi.org/10.1007/978-3-642-24568-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24567-1
Online ISBN: 978-3-642-24568-8
eBook Packages: Computer Science, Computer Science (R0)