
Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5335)

Abstract

This paper presents a new technique for optimizing the locality of irregular programs by leveraging the parallelism of a massive many-core architecture, IBM Cyclops64 (C64). The key idea is Just-In-Time Locality, which ensures that data are available locally by the time the computation needs them. The proposed percolation model for Just-In-Time Locality proactively moves data close to the computation and organizes the data layout so that locality is exploited effectively. The percolation model opens the door to exploiting locality through parallelism, an advantage of future many-core architectures. We implemented the percolation strategy in two irregular applications on C64. Our experimental results are encouraging: we obtain an order-of-magnitude performance improvement on these irregular applications and drastically improve their scalability.
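The abstract describes percolation only at a high level, and the paper targets IBM Cyclops64, so the following does not reproduce the authors' implementation. It is a minimal Python sketch of the general producer/consumer idea, under stated assumptions: a hypothetical helper thread (`percolate`) gathers irregularly indexed operands `data[index[i]]` into contiguous blocks ahead of use (moving data close to the computation and reorganizing its layout), while the computing thread works only on blocks that are already staged. A bounded queue of `depth` blocks stands in for a limited local memory. All names and parameters (`percolate`, `compute_with_percolation`, `block_size`, `depth`) are illustrative and not taken from the paper.

```python
import queue
import threading


def percolate(data, index, block_size, out_queue):
    """Hypothetical helper ("percolation") thread: stage operands ahead of use."""
    for start in range(0, len(index), block_size):
        # Gather the irregular accesses data[index[i]] into a dense,
        # contiguous block, i.e. reorganize the layout for the consumer.
        block = [data[j] for j in index[start:start + block_size]]
        # put() waits while the queue already holds `depth` staged blocks.
        out_queue.put(block)
    out_queue.put(None)  # sentinel: nothing left to stage


def compute_with_percolation(data, index, block_size=4, depth=2):
    """Sum data[index[i]] while touching only locally staged blocks."""
    staged = queue.Queue(maxsize=depth)  # stand-in for limited local memory
    helper = threading.Thread(
        target=percolate, args=(data, index, block_size, staged)
    )
    helper.start()
    total = 0
    while True:
        block = staged.get()
        if block is None:
            break
        # Every operand here is already local and contiguous.
        total += sum(block)
    helper.join()
    return total


if __name__ == "__main__":
    data = [i * i for i in range(16)]
    index = [15, 3, 7, 0, 12, 3, 9, 1]  # irregular access pattern
    print(compute_with_percolation(data, index, block_size=3))
```

The bounded queue is the design choice that matters: it lets staging run ahead of the computation by at most `depth` blocks, much as a limited local store would. Because of CPython's global interpreter lock, this sketch shows the structure of the overlap rather than a real speedup.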

This work was performed while the first author was a visiting scholar at the Computer Architecture and Parallel Systems Laboratory (CAPSL) of the University of Delaware. He is currently affiliated with the Institute of Computing Technology.



Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, G., Sreedhar, V.C., Gao, G.R. (2008). Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_23


  • DOI: https://doi.org/10.1007/978-3-540-89740-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89739-2

  • Online ISBN: 978-3-540-89740-8

  • eBook Packages: Computer Science, Computer Science (R0)
