OpenMP* SIMD Vectorization and Threading of the Elmer Finite Element Software

  • Mikko Byckling
  • Juhani Kataja
  • Michael Klemm (Email author)
  • Thomas Zwinger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10468)


We describe the design and implementation of hierarchical high-order basis functions with OpenMP* SIMD constructs in the Elmer Finite Element software. We give the rationale for our design decisions and present some of the key challenges encountered during the implementation. Our numerical results on a platform supporting Intel® AVX2 show that the new basis function implementation is 3x to 4x faster than the same code without OpenMP SIMD in use, and 5x to 10x faster than the original Elmer implementation. In addition, our numerical results show similar speedups for the entire finite element assembly process.


Keywords: Finite elements, Basis functions, Implementation, OpenMP, SIMD



Thomas Zwinger was supported by the Nordic Centre of Excellence, eSTICC.

Intel and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other brands and names are the property of their respective owners.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to

Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Mikko Byckling (1)
  • Juhani Kataja (2)
  • Michael Klemm (3), Email author
  • Thomas Zwinger (2)
  1. Intel Finland, Tampere, Finland
  2. CSC - IT Center for Science, Espoo, Finland
  3. Intel Deutschland GmbH, Feldkirchen, Germany
