Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture

Tan, Guangming; Sreedhar, Vugranam C.; Gao, Guang R.

doi:10.1007/978-3-540-89740-8_23

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5335)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

This paper presents a new technique for optimizing the locality of irregular programs by leveraging parallelism on a massive many-core architecture, the IBM Cyclops64 (C64). The key idea is Just-In-Time Locality, which ensures that data are available locally by the time the computation needs them. The proposed percolation model for Just-In-Time Locality proactively moves data close to the computation and organizes the data layout so that locality is exploited effectively. The percolation model thus opens the door to exploiting locality through parallelism, which is an advantage of future many-core architectures. We implemented the percolation strategy in the context of two irregular applications on C64. Our experimental results are very encouraging: we obtain an order-of-magnitude improvement in the performance of these irregular applications and drastically improve their scalability.
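
The percolation idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`percolate`, `compute`, `run_tasks`), the squared-sum kernel, and the representation of a task as a list of indices are all assumptions made for illustration. Each task names an irregular set of indices; a staging step gathers those elements into a dense local buffer before the kernel runs, and the buffer for the next task is staged before the current one is computed.

```python
from typing import List, Sequence


def percolate(data: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Gather irregularly indexed elements into a dense local buffer (hypothetical helper)."""
    return [data[i] for i in indices]


def compute(local: Sequence[float]) -> float:
    """Stand-in kernel (an assumption): it only reads the contiguous local buffer."""
    return sum(x * x for x in local)


def run_tasks(data: Sequence[float], tasks: Sequence[Sequence[int]]) -> List[float]:
    """Run every task, staging the data for task k+1 before computing task k."""
    results: List[float] = []
    if not tasks:
        return results
    staged = percolate(data, tasks[0])
    for k in range(len(tasks)):
        current = staged
        if k + 1 < len(tasks):
            # On a many-core chip a helper thread could do this staging while
            # the kernel below runs; this sequential sketch only shows the order.
            staged = percolate(data, tasks[k + 1])
        results.append(compute(current))
    return results


if __name__ == "__main__":
    print(run_tasks([1.0, 2.0, 3.0, 4.0], [[3, 0], [1, 1, 2]]))
```

In this sketch the kernel never indexes the global array directly, so the irregular accesses are confined to the staging step. That separation is what would allow the staging to be handed to other threads on a real many-core system.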

This work was performed while the first author was a visiting scholar at the Computer Architecture and Parallel Systems Laboratory (CAPSL) of the University of Delaware. He is currently affiliated with the Institute of Computing Technology.



Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Delaware, USA
Guangming Tan
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China
Guangming Tan & Guang R. Gao
IBM T. J. Watson Research Center, USA
Vugranam C. Sreedhar

Editor information

Editors and Affiliations

Department of Computing Science, University of Alberta, T6G-2E8, Edmonton, AB, Canada
José Nelson Amaral

About this paper

Cite this paper

Tan, G., Sreedhar, V.C., Gao, G.R. (2008). Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_23

DOI: https://doi.org/10.1007/978-3-540-89740-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89739-2
Online ISBN: 978-3-540-89740-8
eBook Packages: Computer Science, Computer Science (R0)
