
Simulation of bevel gear cutting with GPGPUs—performance and productivity


The demand for general-purpose computation on graphics processing units (GPGPU) has driven the development of new programming paradigms, e.g. OpenCL C/C++, CUDA C, and the PGI Accelerator Model. In this paper, we apply these programming approaches to KegelSpan, a software package for simulating bevel gear cutting, which is an important manufacturing process in the automotive industry. We compare the results with an OpenMP implementation on various hardware configurations. The discussion covers not only performance results but also the productivity of code development experienced in this effort.




Author information

Correspondence to Sandra Wienke.


About this article

Cite this article

Wienke, S., Plotnikov, D., an Mey, D. et al. Simulation of bevel gear cutting with GPGPUs—performance and productivity. Comput Sci Res Dev 26, 165–174 (2011). https://doi.org/10.1007/s00450-011-0158-0



  • OpenCL
  • CUDA
  • PGI Accelerator
  • OpenMP
  • Productivity
  • Parallelization