Reconfigurable Preexecution in Data Parallel Applications on Multicore Systems

Electrical Engineering and Intelligent Systems

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 130))

Abstract

The performance of data-intensive applications is often limited not only by the computational power of current computers but also by the performance gap between the CPU and main memory. Data prefetch mechanisms mask this latency by automatically moving data closer to the CPU. These methods rely on predicting future memory addresses, so they are not suited to applications with random memory access patterns. Preexecution is a prefetch method that executes a slice of the original algorithm in parallel with the main thread to calculate memory addresses and issue loads early. In this paper we propose a lightweight software preexecution strategy for data parallel applications that accelerates the main working thread with an adaptive preexecution helper thread, which acts as a perfect predictor and absorbs the cache misses. With automatic parameter tuning, the helper thread adapts to the application and to the system it runs on. The method achieved an average speedup of 10–30% in a real-life data parallel application.
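
To make the idea concrete, the following is a minimal C++ sketch of software preexecution, not the authors' implementation. It assumes a hypothetical workload (summing table entries selected by a hash of each key) and a hypothetical tuning parameter `lookahead` that bounds how far the helper thread may run ahead of the main thread. The names `slot_of`, `touch`, `sum_plain`, and `sum_with_helper` are invented for illustration. The helper executes only the address-computing slice of the loop and prefetches the computed address; any benefit depends on the two threads sharing a cache level.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical address computation: a hash scatters keys over the table,
// so the access pattern looks random to an address-predicting prefetcher.
inline std::size_t slot_of(std::uint64_t key, std::size_t table_size) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return static_cast<std::size_t>(key % table_size);
}

// Ask the memory system to bring the cache line holding *p closer.
inline void touch(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p);  // compiler builtin; a hint, never faults
#else
    (void)*static_cast<const volatile char*>(p);  // fallback: a real load
#endif
}

// Reference version: the main thread alone.
std::uint64_t sum_plain(const std::vector<std::uint64_t>& keys,
                        const std::vector<std::uint64_t>& table) {
    std::uint64_t sum = 0;
    for (std::uint64_t k : keys) sum += table[slot_of(k, table.size())];
    return sum;
}

// Same computation with a preexecution helper thread.
// `lookahead` (assumed tuning knob) is the maximum number of items the
// helper may be ahead of the main thread; 0 disables prefetching.
std::uint64_t sum_with_helper(const std::vector<std::uint64_t>& keys,
                              const std::vector<std::uint64_t>& table,
                              std::size_t lookahead) {
    assert(!table.empty());
    const std::size_t n = keys.size();
    std::atomic<std::size_t> main_pos{0};  // items the main thread has finished

    std::thread helper([&] {
        std::size_t i = 0;
        for (;;) {
            const std::size_t m = main_pos.load(std::memory_order_relaxed);
            if (i < m) i = m;      // fell behind: skip items already processed
            if (i >= n) break;     // nothing left to prefetch
            if (i - m >= lookahead) {  // far enough ahead: wait for main thread
                std::this_thread::yield();
                continue;
            }
            // The "slice": compute the address only, then prefetch it.
            touch(&table[slot_of(keys[i], table.size())]);
            ++i;
        }
    });

    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += table[slot_of(keys[i], table.size())];
        main_pos.store(i + 1, std::memory_order_relaxed);
    }
    helper.join();
    return sum;
}
```

Because the helper only reads shared data and never writes results, `sum_with_helper(keys, table, lookahead)` returns the same value as `sum_plain(keys, table)` for any `lookahead`; only the timing changes. An adaptive variant in the spirit of the abstract would time a few batches and adjust `lookahead` at run time, which this sketch leaves out. With GCC or Clang, a build line such as `g++ -std=c++17 -O2 -pthread` should suffice.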

Acknowledgment

This project is supported by the New Hungary Development Plan (Project ID: TÁMOP-4.2.1/B-09/1/KMR-2010-0002).

Author information

Corresponding author

Correspondence to Ákos Dudás.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Dudás, Á., Juhász, S. (2013). Reconfigurable Preexecution in Data Parallel Applications on Multicore Systems. In: Ao, SI., Gelman, L. (eds) Electrical Engineering and Intelligent Systems. Lecture Notes in Electrical Engineering, vol 130. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2317-1_3

  • DOI: https://doi.org/10.1007/978-1-4614-2317-1_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-2316-4

  • Online ISBN: 978-1-4614-2317-1

  • eBook Packages: Engineering, Engineering (R0)
