A Code-Based Analytical Approach for Using Separate Device Coprocessors in Computing Systems

Hampel, Volker; Goronzy, Grigori; Maehle, Erik

doi:10.1007/978-3-642-19137-4_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6566)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

Special hardware accelerators such as FPGAs and GPUs are commonly introduced into a computing system as separate devices. Consequently, the accelerator and the host system do not share a common memory, and offloading data to the additional hardware introduces a communication penalty. Based on a combination of a program's source code and execution profiling, we perform an analysis that uses arithmetic intensity as a cost function to identify the parts of the program that are most reasonable to offload to the accelerating hardware. The basic principles of this analysis are introduced and tested with a sample application. The concrete results are discussed and evaluated based on the performance of an FPGA-based and a GPU-based implementation.
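
The idea of using arithmetic intensity as a cost function can be illustrated with a minimal sketch. It assumes the common definition of arithmetic intensity as operations performed per byte transferred between host and device; the abstract does not state the exact formula the authors use. The `Region` class, the function `rank_offload_candidates`, the region names, and all numbers below are hypothetical and stand in for values that would come from source-code analysis and execution profiling.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A candidate code region with illustrative (made-up) profile data."""
    name: str
    operations: int  # arithmetic operations executed in the region
    bytes_in: int    # bytes that would be copied host -> device
    bytes_out: int   # bytes that would be copied device -> host

    def arithmetic_intensity(self) -> float:
        # Assumed definition: operations per transferred byte.
        transferred = self.bytes_in + self.bytes_out
        if transferred == 0:
            return float("inf")  # no transfer cost at all
        return self.operations / transferred


def rank_offload_candidates(regions):
    """Sort regions so the highest arithmetic intensity comes first."""
    return sorted(regions, key=lambda r: r.arithmetic_intensity(), reverse=True)


# Hypothetical profile of three regions of a sample application.
regions = [
    Region("shade_pixel", operations=4_000_000, bytes_in=64_000, bytes_out=16_000),
    Region("copy_buffer", operations=10_000, bytes_in=40_000, bytes_out=40_000),
    Region("intersect", operations=9_000_000, bytes_in=30_000, bytes_out=30_000),
]

ranked = rank_offload_candidates(regions)
for region in ranked:
    print(f"{region.name}: {region.arithmetic_intensity():.3f} ops/byte")
```

Under this assumed definition, a region that does a lot of computation on little transferred data ranks high, while a region dominated by data movement ranks low and is a poor candidate for a separate-device accelerator.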

Author information

Authors and Affiliations

Institute of Computer Engineering, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
Volker Hampel, Grigori Goronzy & Erik Maehle

Editor information

Editors and Affiliations

Institut für Datentechnik und Kommunikationsnetze, Hans-Sommer-Straße 66, 38106, Braunschweig, Germany
Mladen Berekovic
Dipartimento di elettronica e informazione, Via Ponzio 34/5, 20133, Milano, Italy
William Fornaciari & Cristina Silvano
Johann Wolfgang Goethe-Universität Frankfurt, Robert-Mayer-Straße 11-15, 60325, Frankfurt am Main, Germany
Uwe Brinkschulte

About this paper

Cite this paper

Hampel, V., Goronzy, G., Maehle, E. (2011). A Code-Based Analytical Approach for Using Separate Device Coprocessors in Computing Systems. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_1

DOI: https://doi.org/10.1007/978-3-642-19137-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19136-7
Online ISBN: 978-3-642-19137-4
eBook Packages: Computer Science, Computer Science (R0)
