GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11501)

Abstract

We present GPUMixer, a tool to perform mixed-precision floating-point tuning on scientific GPU applications. While precision tuning techniques are available, they are designed for serial programs and are accuracy-driven, i.e., they consider configurations that satisfy accuracy constraints, but these configurations may degrade performance. GPUMixer, in contrast, presents a performance-driven approach for tuning. We introduce a novel static analysis that finds Fast Imprecise Sets (FISets), sets of low-precision operations that minimize type conversions and often yield performance speedups. To estimate the relative error introduced by mixed precision on GPUs, we propose shadow computations analysis for GPUs, the first analysis of this class for multi-threaded applications. GPUMixer obtains performance improvements of up to \(46.4\%\) of the ideal speedup in comparison to only \(20.7\%\) found by state-of-the-art methods.

This work was performed when P. C. Wood and R. Singh were at Purdue University.
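The shadow-computation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not GPUMixer's implementation: it is a serial, CPU-only emulation in Python, and the names `to_fp32` and `Shadowed` are hypothetical. Each value is computed in emulated FP32 while an FP64 "shadow" of the same computation is carried alongside it, so the relative error of the low-precision result can be estimated at any point.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class Shadowed:
    """Hypothetical helper: an emulated FP32 value plus its FP64 shadow."""

    def __init__(self, value: float, shadow: float = None):
        # Low-precision result, rounded to FP32 after every operation.
        self.low = to_fp32(value)
        # High-precision shadow, kept in FP64 throughout.
        self.shadow = float(value) if shadow is None else shadow

    def __add__(self, other: "Shadowed") -> "Shadowed":
        return Shadowed(self.low + other.low, self.shadow + other.shadow)

    def __mul__(self, other: "Shadowed") -> "Shadowed":
        return Shadowed(self.low * other.low, self.shadow * other.shadow)

    def relative_error(self) -> float:
        """Relative error of the FP32 value with respect to the shadow."""
        if self.shadow == 0.0:
            return abs(self.low)
        return abs(self.low - self.shadow) / abs(self.shadow)


# Usage: accumulate 0.1 a thousand times and inspect the error that
# the low-precision version has picked up relative to its shadow.
step = Shadowed(0.1)
total = Shadowed(0.0)
for _ in range(1000):
    total = total + step
print("FP32 result:", total.low)
print("FP64 shadow:", total.shadow)
print("relative error:", total.relative_error())
```

In a real GPU setting the shadow values would have to be tracked per thread, which is the multi-threaded aspect the abstract refers to; this sketch ignores that entirely.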

Acknowledgments

We thank the anonymous reviewers for their suggestions and comments on the paper. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 (LLNL-CONF-748618).

Author information

Corresponding author

Correspondence to Ignacio Laguna.

Copyright information

© 2019 This is a U.S. government work and not under copyright protection in the United States; foreign copyright protection may apply

About this paper

Cite this paper

Laguna, I., Wood, P.C., Singh, R., Bagchi, S. (2019). GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_12

  • DOI: https://doi.org/10.1007/978-3-030-20656-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20655-0

  • Online ISBN: 978-3-030-20656-7

  • eBook Packages: Computer Science (R0)
