Abstract
Thread-level program parallelization is key to exploiting the hardware parallelism of emerging multi-core systems. Several techniques have been proposed for program multithreading. However, existing techniques do not address two key issues associated with multithreaded execution of a given program: (a) whether multithreaded execution is faster than sequential execution; and (b) how many threads to spawn during program multithreading. In this paper, we address these limitations. Specifically, we propose a novel approach, T-OPT, to determine how many threads to spawn during multithreaded execution of a given program region. This helps to curb under-subscription and over-subscription of the hardware resources, which in turn facilitates exploitation of a higher level of thread-level parallelism (TLP) than can be achieved using the state of the art. We show that, from a program dependence standpoint, using a larger number of threads than advocated by the proposed approach does not yield a higher degree of TLP. We present a couple of case studies and results using kernels extracted from open source codes to demonstrate the efficacy of our techniques on a real machine.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nicolau, A., Kejariwal, A. (2011). How Many Threads to Spawn during Program Multithreading? In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_12
DOI: https://doi.org/10.1007/978-3-642-19595-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19594-5
Online ISBN: 978-3-642-19595-2
eBook Packages: Computer Science, Computer Science (R0)