Abstract
FPGAs and GPUs are increasingly used in a range of high performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most needed. Support for half-precision and double-double floating-point representations provides additional flexibility to achieve this. The aim of this work is to study the language and hardware support for non-standard precisions on a GPU and an FPGA, and the peak performance achievable with each. A compute-intensive program, matrix-matrix multiplication, is selected as a benchmark and implemented for a range of matrix sizes. The results show that for sufficiently large matrices, GPUs outperform FPGA-based implementations, but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than a GPU implementation.
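The double-double format mentioned above represents a value as the unevaluated sum of two native doubles, giving roughly twice the significand bits of double precision using only standard double operations. The sketch below illustrates the core idea, an addition built from Knuth's error-free TwoSum transformation; it is a minimal illustration, and the names dd, twoSum and ddAdd are assumptions for this sketch, not taken from the paper's implementation.

    struct dd { double hi, lo; };            // value = hi + lo, with |lo| << |hi|

    // Error-free transformation (Knuth's TwoSum): s + e equals a + b exactly.
    __host__ __device__ inline dd twoSum(double a, double b) {
        double s = a + b;
        double v = s - a;
        double e = (a - (s - v)) + (b - v);
        return dd{s, e};
    }

    // Double-double addition: roughly 106 significand bits, built only
    // from native double adds and subtracts.
    __host__ __device__ inline dd ddAdd(dd a, dd b) {
        dd s = twoSum(a.hi, b.hi);           // exact sum of the high parts
        double lo = s.lo + a.lo + b.lo;      // accumulate the low-order terms
        return twoSum(s.hi, lo);             // renormalize into (hi, lo) form
    }

Because every step is an ordinary double-precision operation, such operators run on GPU cores without special hardware support, whereas on an FPGA the same data path can instead be fused into a single dedicated pipelined operator.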