
Speculative Optimizations for Parallel Programs on Multicores

Conference paper
Languages and Compilers for Parallel Computing (LCPC 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5898)

Abstract

The advent of multicores presents a promising opportunity for exploiting the fine-grained parallelism present in programs. Programs parallelized in this fashion typically involve threads that communicate via shared memory and synchronize with each other frequently to ensure that shared-memory dependences between threads are correctly enforced. Such frequent synchronization operations, although required for correctness, can greatly degrade program performance: besides forcing threads to idle while waiting for one another, they also force the compiler to make conservative assumptions when generating code.

We analyzed a set of parallel programs with fine-grained barrier synchronization and observed that the barriers in these programs enforce interprocessor dependences that arise relatively infrequently. Motivated by this observation, our approach creates two versions of each section of code between consecutive synchronization operations: a highly optimized version built under the optimistic assumption that the interprocessor dependences enforced by the synchronization will not actually arise, and an unoptimized version built under the pessimistic assumption that they will. At runtime, we first speculatively execute the optimistic version; if misspeculation occurs, its results are discarded and the non-speculative version is executed instead. Since interprocessor dependences arise infrequently, the misspeculation rate remains low. To detect misspeculation efficiently, we modify existing architectural support for data speculation and adapt it for multicores. We use this scheme to perform two speculative optimizations that improve parallel program performance. First, by speculatively executing past barrier synchronizations, we reduce the time threads spend idling at barriers, yielding a 12% performance improvement. Second, by promoting shared variables to registers in the presence of synchronization, we eliminate a significant number of redundant loads, yielding an additional 2.5% performance improvement.
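To make the scheme concrete, below is a minimal software sketch of the two-version approach, assuming POSIX threads. The paper detects misspeculation with modified architectural support for data speculation; here a per-variable write counter stands in for that hardware, and every identifier (worker, shared_x, version_x, and so on) is hypothetical rather than taken from the paper. Each thread promotes the shared variable into a local and performs its post-barrier computation speculatively before waiting at the barrier; afterwards it validates the speculation and re-executes non-speculatively only if a cross-thread write actually occurred.

/*
 * Minimal sketch of the two-version speculation scheme (POSIX threads).
 * The version counter is a software stand-in for the paper's hardware
 * misspeculation detection; all names here are hypothetical.
 * Build: cc -O2 -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   256

static _Atomic int shared_x  = 0;  /* shared variable the barrier protects */
static _Atomic int version_x = 0;  /* bumped on each write to shared_x     */
static pthread_barrier_t barrier;

static void *worker(void *arg) {
    int  tid = (int)(long)arg;
    long sum = 0;

    for (int i = 0; i < NITERS; i++) {
        /* The interprocessor dependence the barrier enforces arises
         * only rarely: thread 0 writes shared_x every 64th iteration. */
        if (tid == 0 && i % 64 == 0) {
            atomic_store(&shared_x, i);       /* write the data first ...   */
            atomic_fetch_add(&version_x, 1);  /* ... then publish a version */
        }

        /* Optimistic version: promote shared_x into a local and perform
         * the post-barrier computation speculatively, before waiting.   */
        int v_before = atomic_load(&version_x);
        int local_x  = atomic_load(&shared_x);  /* speculative register promotion */
        int result   = 2 * local_x + i;         /* work that logically follows the barrier */

        pthread_barrier_wait(&barrier);

        /* Validate the speculation: if a cross-thread write occurred,
         * discard the result and rerun the pessimistic version, which
         * reloads shared_x after the barrier.                          */
        if (atomic_load(&version_x) != v_before) {
            local_x = atomic_load(&shared_x);
            result  = 2 * local_x + i;
        }
        sum += result;
    }
    printf("thread %d: sum = %ld\n", tid, sum);
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

Because the enforced dependence fires only once every 64 iterations in this toy program, the fallback path runs rarely, mirroring the low misspeculation rates the abstract describes; hardware detection along the lines the paper proposes would also remove the explicit version-counter loads from the common path.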




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nagarajan, V., Gupta, R. (2010). Speculative Optimizations for Parallel Programs on Multicores. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_22


  • DOI: https://doi.org/10.1007/978-3-642-13374-9_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13373-2

  • Online ISBN: 978-3-642-13374-9

  • eBook Packages: Computer Science, Computer Science (R0)
