
Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6760)


Abstract

This paper presents the design and implementation of a runtime system, named "GodRunner", for the Godson-T many-core processor that supports task-level parallelism efficiently and flexibly. GodRunner abstracts the underlying hardware resources and provides an easy-to-use programming interface. A two-grade task management mechanism is proposed to support both coarse-grained and fine-grained multithreading efficiently, and GodRunner flexibly combines two load-balancing scheduling policies. Because task management is controlled in software, GodRunner is more configurable and extensible than hard-wired (hardware-based) task managers. Experiments show that the tasking overhead in GodRunner is only hundreds of cycles, roughly hundreds of times lower than that of conventional Pthread-based multithreading on an SMP machine. Furthermore, our approach scales well and optimally supports fine-grained tasks as small as 20k cycles.
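The abstract does not spell out how the two grades of task management or the two scheduling policies work. As a rough illustration only, the Python sketch below simulates a generic two-level tasking scheme under our own assumptions: a shared global FIFO queue holds coarse-grained tasks, per-worker local queues hold the fine-grained tasks they spawn, and an idle worker steals from another worker's queue when both of its sources are empty. The names `Runtime`, `Task`, `spawn_global`, `spawn_local`, and `NUM_WORKERS` are hypothetical; this is not GodRunner's API or implementation, and the simulation runs sequentially (round-robin over workers) rather than truly in parallel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

NUM_WORKERS = 4  # hypothetical core count for the simulation


@dataclass
class Task:
    fn: Callable[["Runtime", int, int], None]  # called as fn(runtime, worker, arg)
    arg: int


class Runtime:
    """Sequential simulation of a generic two-level task queue (hypothetical, not GodRunner)."""

    def __init__(self, num_workers=NUM_WORKERS):
        self.global_q = deque()                                # grade 1: shared coarse-grained tasks
        self.local_q = [deque() for _ in range(num_workers)]  # grade 2: per-worker fine-grained tasks
        self.executed = []                                     # task ids in execution order

    def spawn_global(self, task):
        self.global_q.append(task)

    def spawn_local(self, worker, task):
        self.local_q[worker].append(task)

    def _next_task(self, worker):
        own = self.local_q[worker]
        if own:
            return own.pop()                 # policy 1a: newest local task first (LIFO)
        if self.global_q:
            return self.global_q.popleft()   # policy 1b: oldest coarse task from the shared queue
        n = len(self.local_q)
        for i in range(1, n):                # policy 2: work stealing, scanning victims in order
            victim = self.local_q[(worker + i) % n]
            if victim:
                return victim.popleft()      # steal the victim's oldest task
        return None

    def run(self):
        """Round-robin over workers until one full round finds no work anywhere."""
        progress = True
        while progress:
            progress = False
            for w in range(len(self.local_q)):
                task = self._next_task(w)
                if task is not None:
                    task.fn(self, w, task.arg)
                    progress = True
        return len(self.executed)


def fine_task(rt, worker, tid):
    rt.executed.append(tid)


def coarse_task(rt, worker, tid):
    rt.executed.append(tid)
    for k in range(4):  # split into 4 fine-grained children on this worker's local queue
        rt.spawn_local(worker, Task(fine_task, 100 + 4 * tid + k))


rt = Runtime()
for tid in range(8):
    rt.spawn_global(Task(coarse_task, tid))
print(rt.run())  # 40 = 8 coarse + 32 fine tasks
```

Pairing a shared queue with stealing is one common way to balance load: the global queue hands out coarse work as workers become free, while stealing redistributes fine-grained children that pile up on a single worker. Whether GodRunner's two policies resemble this is not stated in the abstract.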

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nan, Y., Lei, Y., Dong-rui, F. (2011). An Efficient and Flexible Task Management for Many Cores. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_15

  • DOI: https://doi.org/10.1007/978-3-642-24568-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24567-1

  • Online ISBN: 978-3-642-24568-8

  • eBook Packages: Computer Science, Computer Science (R0)