Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs

  • Daichi Mukunoki
  • Daisuke Takahashi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7133)


We implemented the quadruple precision Basic Linear Algebra Subprograms (BLAS) functions AXPY, GEMV, and GEMM on graphics processing units (GPUs) and evaluated their performance. We used DD-type (double-double) quadruple precision operations, which represent a quadruple precision value as the sum of two double precision values. On an NVIDIA Tesla C1060, our BLAS functions are up to approximately 30 times faster than the existing quadruple precision BLAS on an Intel Core i7 920. Additionally, on the Tesla C1060, quadruple precision AXPY takes only approximately 2.7 times longer than double precision AXPY. These results show that quadruple precision BLAS operations are suitable for GPUs.


Keywords: quadruple precision, BLAS, double-double precision, GPU


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Daichi Mukunoki (1)
  • Daisuke Takahashi (1)
  1. Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
