How Many Threads to Spawn during Program Multithreading?

Nicolau, Alexandru; Kejariwal, Arun

doi:10.1007/978-3-642-19595-2_12


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6548)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing


Abstract

Thread-level program parallelization is key to exploiting the hardware parallelism of emerging multi-core systems. Several techniques have been proposed for program multithreading. However, the existing techniques do not address two key issues associated with multithreaded execution of a given program: (a) whether multithreaded execution is faster than sequential execution; (b) how many threads to spawn during program multithreading. In this paper, we address these limitations. Specifically, we propose a novel approach, T-OPT, to determine how many threads to spawn during multithreaded execution of a given program region. This helps to avoid both under-subscribing and over-subscribing the hardware resources, which in turn facilitates exploitation of a higher level of thread-level parallelism (TLP) than can be achieved using the state of the art. We show that, from a program dependence standpoint, using a larger number of threads than advocated by the proposed approach does not yield a higher degree of TLP. We present a couple of case studies and results using kernels extracted from open source codes to demonstrate the efficacy of our techniques on a real machine.
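The abstract does not describe how T-OPT computes its thread count, so the sketch below is not the authors' method. It only illustrates, under a simplified model, why program dependences can cap the number of useful threads. The assumptions are ours: a program region is modeled as a task graph with known positive costs, threads are idealized (no contention), and the names `asap_schedule`, `dependence_thread_cap`, `worth_threading`, and `spawn_overhead` are hypothetical. With as many threads as the peak concurrency of the as-soon-as-possible schedule, the region already finishes in critical-path time, which no larger thread count can improve on in this model.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def asap_schedule(costs, deps):
    """Return task -> (start, finish) when every task starts as early as
    its dependences allow (unlimited threads assumed)."""
    graph = {task: deps.get(task, set()) for task in costs}
    intervals = {}
    # static_order() yields predecessors before their successors.
    for task in TopologicalSorter(graph).static_order():
        start = max((intervals[p][1] for p in graph[task]), default=0)
        intervals[task] = (start, start + costs[task])
    return intervals


def dependence_thread_cap(costs, deps, hardware_threads):
    """Peak number of tasks running at once in the ASAP schedule,
    capped by the available hardware threads (hypothetical heuristic)."""
    intervals = asap_schedule(costs, deps)
    events = [(s, +1) for s, _ in intervals.values()]
    events += [(e, -1) for _, e in intervals.values()]
    running = peak = 0
    # Tuples sort so that a finish (-1) at time t precedes a start (+1) at t.
    for _, delta in sorted(events):
        running += delta
        peak = max(peak, running)
    return max(1, min(peak, hardware_threads))


def worth_threading(costs, deps, spawn_overhead):
    """Crude check for question (a): is critical-path time plus an assumed
    one-off spawn overhead below the sequential time (sum of all costs)?"""
    span = max(finish for _, finish in asap_schedule(costs, deps).values())
    return span + spawn_overhead < sum(costs.values())


# Illustrative region: a must finish before b and c; d needs both b and c.
costs = {"a": 4, "b": 3, "c": 3, "d": 2}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(dependence_thread_cap(costs, deps, hardware_threads=8))
print(worth_threading(costs, deps, spawn_overhead=2))
```

In this toy region only b and c can ever overlap, so spawning eight threads on an eight-way machine would over-subscribe without shortening the run. A real decision would also need measured costs, synchronization overhead, and memory effects, none of which this sketch models.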



Author information

Authors and Affiliations

University of California, Irvine, Irvine, CA, 98612, USA
Alexandru Nicolau
Yahoo! Inc., Sunnyvale, CA, 94089, USA
Arun Kejariwal

Editor information

Editors and Affiliations

Department of Computer Science, Rice University, 6100 Main Street, 77005-1892, Houston, TX, USA
Keith Cooper, John Mellor-Crummey & Vivek Sarkar

About this paper

Cite this paper

Nicolau, A., Kejariwal, A. (2011). How Many Threads to Spawn during Program Multithreading?. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_12

DOI: https://doi.org/10.1007/978-3-642-19595-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19594-5
Online ISBN: 978-3-642-19595-2
eBook Packages: Computer Science, Computer Science (R0)
