Abstract
The performance of data-intensive applications is often limited not only by the computational power of current computers but also by the performance gap between the CPU and main system memory. Data prefetch mechanisms mask memory latency by automatically moving data closer to the CPU. These methods rely on predicting future memory addresses, so they are poorly suited to applications with random memory access patterns. Preexecution is a prefetch method that executes a slice of the original algorithm in parallel with the main thread to calculate memory addresses and issue loads early. In this paper we propose a lightweight software preexecution strategy for data parallel applications that accelerates the main working thread with an adaptive preexecution helper thread, which acts as a perfect predictor and absorbs cache misses on behalf of the main thread. With automatic parameter tuning, the helper thread adapts to the application and to the system it is executed on. This method achieved an average speedup of 10–30% in a real-life data parallel application.
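The helper-thread idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the workload (summing table entries at data-dependent indices), the function name `sum_with_helper`, and the `lookahead` parameter are all hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin assumed to be available. The helper runs only the address-computing slice of the loop and issues prefetches a bounded distance ahead of the main thread. Any benefit depends on the two threads sharing a cache level.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical workload: sum table entries at data-dependent indices.
// `lookahead` is an assumed tunable: the maximum number of elements the
// helper may run ahead of the main thread.
std::uint64_t sum_with_helper(const std::vector<std::uint64_t>& table,
                              const std::vector<std::size_t>& indices,
                              std::size_t lookahead) {
    std::atomic<std::size_t> main_pos{0};  // progress of the main thread
    std::atomic<bool> done{false};

    // Helper thread: executes only the address computation and touches
    // the data early. It never changes the result of the main loop.
    std::thread helper([&] {
        std::size_t i = 0;
        while (!done.load(std::memory_order_relaxed)) {
            const std::size_t m = main_pos.load(std::memory_order_relaxed);
            if (i < m) i = m;                  // fell behind: skip forward
            if (i >= indices.size()) break;    // nothing left to prefetch
            if (i >= m + lookahead) {          // far enough ahead: wait
                std::this_thread::yield();
                continue;
            }
            __builtin_prefetch(&table[indices[i]], 0, 1);  // read hint
            ++i;
        }
    });

    // Main working thread: the original, unmodified computation.
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] < table.size());     // precondition of the sketch
        sum += table[indices[i]];
        main_pos.store(i + 1, std::memory_order_relaxed);
    }

    done.store(true, std::memory_order_relaxed);
    helper.join();
    return sum;
}
```

In this sketch, `lookahead` is the kind of parameter that automatic tuning could adjust. If it is too small, the prefetched data may not arrive before the main thread needs it. If it is too large, the data may be evicted before it is used.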
Acknowledgment
This project is supported by the New Hungary Development Plan (Project ID: TÁMOP-4.2.1/B-09/1/KMR-2010-0002).
Copyright information
© 2013 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Dudás, Á., Juhász, S. (2013). Reconfigurable Preexecution in Data Parallel Applications on Multicore Systems. In: Ao, SI., Gelman, L. (eds) Electrical Engineering and Intelligent Systems. Lecture Notes in Electrical Engineering, vol 130. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2317-1_3
DOI: https://doi.org/10.1007/978-1-4614-2317-1_3
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-2316-4
Online ISBN: 978-1-4614-2317-1
eBook Packages: Engineering, Engineering (R0)