Reconfigurable Preexecution in Data Parallel Applications on Multicore Systems

Dudás, Ákos; Juhász, Sándor

doi:10.1007/978-1-4614-2317-1_3

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 130)

Abstract

The performance of data-intensive applications is often limited not only by the computational power of current computers but also by the performance gap between the CPU and main memory. Data prefetch mechanisms mask memory latency by moving data closer to the CPU automatically. These methods rely on predicting future memory addresses, so they are poorly suited to applications with random memory access patterns. Preexecution is a prefetch method that executes a slice of the original algorithm in parallel with the main thread to calculate memory addresses and issue loads early. In this paper we propose a lightweight software preexecution strategy for data parallel applications that accelerates the main working thread with an adaptive preexecution helper thread, which acts as a perfect predictor and absorbs the cache misses. With automatic parameter tuning, the helper thread adapts to the application and the system it runs on. The method achieved an average speedup of 10–30% in a real-life data parallel application.
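
The helper-thread idea in the abstract can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the workload (summing table entries at hashed positions), the names `slot`, `sum_with_helper`, and `max_lead`, and the use of the GCC/Clang builtin `__builtin_prefetch` are all assumptions made for illustration. The helper runs only the address-computing slice of the loop, stays at most `max_lead` items ahead of the main thread, and issues prefetches for the lines the main thread is about to load.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical "slice": the cheap address computation of the main loop.
inline std::size_t slot(std::uint64_t key, std::size_t table_size) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return static_cast<std::size_t>(key % table_size);
}

// Sums table[slot(key)] over all keys while a helper thread prefetches ahead.
// max_lead (an assumed tuning parameter) bounds how far the helper may run
// ahead of the main thread, measured in items.
std::uint64_t sum_with_helper(const std::vector<std::uint64_t>& keys,
                              const std::vector<std::uint64_t>& table,
                              std::size_t max_lead) {
    assert(!table.empty());
    std::atomic<std::size_t> main_pos{0};  // next item the main thread will process
    std::atomic<bool> done{false};

    std::thread helper([&] {
        std::size_t i = 0;
        while (i < keys.size() && !done.load(std::memory_order_relaxed)) {
            const std::size_t m = main_pos.load(std::memory_order_relaxed);
            if (i < m) i = m;              // fell behind: jump to the main thread
            if (i >= keys.size()) break;
            if (i - m >= max_lead) {       // too far ahead: wait for the main thread
                std::this_thread::yield();
                continue;
            }
            // Issue the load early; the result itself is never used here.
            __builtin_prefetch(&table[slot(keys[i], table.size())]);
            ++i;
        }
    });

    // Main working thread: the unmodified computation plus a progress update.
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        sum += table[slot(keys[i], table.size())];
        main_pos.store(i + 1, std::memory_order_relaxed);
    }
    done.store(true, std::memory_order_relaxed);
    helper.join();
    return sum;
}
```

A call such as `sum_with_helper(keys, table, 16)` returns the same sum as the plain loop, because the helper only reads shared data and never changes the result. Whether it runs faster depends on the cache hierarchy, on where the two threads are scheduled, and on the value of `max_lead`, which is the kind of parameter the abstract's automatic tuning would adjust at run time.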

Acknowledgment

This project is supported by the New Hungary Development Plan (Project ID: TÁMOP-4.2.1/B-09/1/KMR-2010-0002).

Author information

Authors and Affiliations

Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Magyar Tudósok körútja 2, 1117, Budapest, Hungary
Ákos Dudás & Sándor Juhász

Corresponding author

Correspondence to Ákos Dudás.

Editor information

Editors and Affiliations

International Association of Engineers, Unit 1, 1/F, 37-39 Hung To Road, Hong Kong, China
Sio-Iong Ao
School of Engineering, Applied Mathematics and Computing, Cranfield University, College Road, Cranfield, MK43 0AL, Bedfordshire, United Kingdom
Len Gelman

About this chapter

Cite this chapter

Dudás, Á., Juhász, S. (2013). Reconfigurable Preexecution in Data Parallel Applications on Multicore Systems. In: Ao, SI., Gelman, L. (eds) Electrical Engineering and Intelligent Systems. Lecture Notes in Electrical Engineering, vol 130. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2317-1_3

DOI: https://doi.org/10.1007/978-1-4614-2317-1_3
Published: 02 May 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-2316-4
Online ISBN: 978-1-4614-2317-1
eBook Packages: Engineering, Engineering (R0)
