Computational Mechanics, Volume 63, Issue 5, pp 805–819

The spectral cell method for wave propagation in heterogeneous materials simulated on multiple GPUs and CPUs

  • Farshid Mossaiby
  • Meysam Joulaian
  • Alexander Düster
Original Paper


Efficient simulation of wave propagation in heterogeneous materials remains a challenging task. The spectral cell method, which combines spectral elements with the fictitious domain concept, has proven to be an efficient approach for wave propagation analysis in materials with complicated microstructure. In this paper, we report details of a parallel implementation of the spectral cell method on multi-core CPUs as well as GPUs. In our CPU implementation, we employ OpenMP directives to parallelize the loops. On GPUs, we use the OpenCL framework to develop single- and multi-GPU versions of the code. In all of our implementations, the core operation is a sparse matrix-vector multiplication (SpMV) kernel. We analyze each implementation to determine its features and bottlenecks. The results show that the multi-GPU code achieves speedups of up to 128 relative to the serial CPU code.
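To make the core operation concrete, the following is a minimal sketch of a sparse matrix-vector product in plain Python. It assumes the compressed sparse row (CSR) storage format, which is a common choice but is not stated in the abstract; the function name `csr_spmv` and the small 3×3 matrix are hypothetical and serve only as an illustration, not as the authors' kernel.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR form (assumed format).

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (matrix entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate only the stored (nonzero) entries of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

Each iteration of the outer loop writes only its own entry `y[i]`, so the rows can be processed independently. That independence is what makes a row-wise loop a natural candidate for an OpenMP parallel loop on CPUs or a one-row-per-work-item mapping in OpenCL, although the actual kernels in the paper may be organized differently.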


Keywords: Spectral cell method, Parallel implementation, SpMV kernel, Multi-GPU, OpenCL, OpenMP



The authors would like to acknowledge Prof. Dr.-Ing. Thomas Rung and Dr.-Ing. Christian Janßen from Hamburg University of Technology (TUHH) for kindly providing access to HPC facilities. The first author would like to thank the Deutscher Akademischer Austauschdienst (DAAD) for partially supporting this work during his visit at TUHH in 2016. Also, the first author would like to thank Dr.-Ing. Karl Rupp for helpful discussions on the matter.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Civil Engineering, University of Isfahan, Isfahan, Iran
  2. Numerical Structural Analysis with Application in Ship Technology (M-10), Hamburg University of Technology, Hamburg, Germany
