
How Many Threads to Spawn during Program Multithreading?

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2010)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6548)

Abstract

Thread-level program parallelization is key to exploiting the hardware parallelism of emerging multi-core systems. Several techniques have been proposed for program multithreading. However, the existing techniques do not address two key issues associated with multithreaded execution of a given program: (a) whether multithreaded execution is faster than sequential execution; and (b) how many threads to spawn during program multithreading. In this paper, we address the above limitations. Specifically, we propose a novel approach, T-OPT, to determine how many threads to spawn during multithreaded execution of a given program region. This helps avoid both under-subscription and over-subscription of the hardware resources, which in turn facilitates exploitation of a higher level of thread-level parallelism (TLP) than can be achieved using the state of the art. We show that, from a program-dependence standpoint, using more threads than the proposed approach advocates does not yield a higher degree of TLP. We present a couple of case studies and results using kernels extracted from open-source codes to demonstrate the efficacy of our techniques on a real machine.
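The core intuition behind dependence-limited thread counts can be illustrated with a minimal sketch. This is not the paper's T-OPT algorithm (which is not reproduced on this page); it only shows the general principle that for a loop whose sole loop-carried dependence has distance d, at most d consecutive iterations are independent, so spawning more than min(d, number of cores) threads cannot increase TLP. The function name `threads_to_spawn` and the example loop are illustrative assumptions, not taken from the paper.

```python
def threads_to_spawn(dependence_distance: int, num_cores: int) -> int:
    """Upper bound on the number of useful threads for a loop region
    whose only loop-carried dependence has the given distance.

    Illustrative sketch only -- not the paper's T-OPT approach.
    A dependence distance d means iteration i must wait for
    iteration i - d, so at most d iterations can run concurrently;
    the hardware further caps concurrency at num_cores.
    """
    return min(dependence_distance, num_cores)


# Hypothetical example: the loop
#     for i in range(4, n): a[i] = a[i - 4] + 1
# has a loop-carried dependence of distance 4. On an 8-core machine,
# spawning more than 4 threads yields no additional parallelism.
print(threads_to_spawn(4, 8))   # 4: dependence-limited
print(threads_to_spawn(16, 8))  # 8: hardware-limited
```

In this framing, over-subscription corresponds to spawning more threads than this bound (threads idle on dependences), and under-subscription to spawning fewer (cores left unused).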






Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolau, A., Kejariwal, A. (2011). How Many Threads to Spawn during Program Multithreading?. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-19595-2_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19594-5

  • Online ISBN: 978-3-642-19595-2

  • eBook Packages: Computer Science (R0)
