Computational Efficiency of Parallel Unstructured Finite Element Simulations

  • Malte Neumann
  • Ulrich Küttler
  • Sunil Reddy Tiyyagura
  • Wolfgang A. Wall
  • Ekkehard Ramm
Conference paper


In this paper we address various efficiency aspects of finite element (FE) simulations on vector computers. For the numerical simulation of large-scale Computational Fluid Dynamics (CFD) and Fluid-Structure Interaction (FSI) problems in particular, efficiency and robustness of the algorithms are two key requirements.

In the first part of this paper a straightforward concept is described that increases the performance of the integration of finite elements in arbitrary, unstructured meshes by allowing for vectorization. In addition, the effect of different programming languages and different array management techniques on performance is investigated.
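The paper's own implementation is not reproduced here, but the core idea behind vectorizing unstructured element integration can be sketched as a loop interchange: instead of looping over the few Gauss points inside each element, a whole block of elements is pushed through each Gauss point, so the innermost loop becomes long and dependence-free. The array names, sizes, and the simple volume integral below are illustrative only.

```c
#include <stddef.h>

#define NELE 1000  /* elements per block (illustrative size) */
#define NGP  8     /* Gauss points per element (illustrative) */

/* Conventional ordering: outer loop over elements, innermost loop over
 * the few Gauss points -> vector length of only NGP. */
void integrate_by_element(const double w[NGP],
                          const double detj[NGP][NELE],
                          double vol[NELE])
{
    for (int e = 0; e < NELE; ++e) {
        vol[e] = 0.0;
        for (int g = 0; g < NGP; ++g)
            vol[e] += w[g] * detj[g][e];
    }
}

/* Vector-friendly ordering: loops interchanged so the innermost loop
 * runs over the whole element block with unit stride -> vector length
 * NELE and no dependence between iterations. */
void integrate_by_block(const double w[NGP],
                        const double detj[NGP][NELE],
                        double vol[NELE])
{
    for (int e = 0; e < NELE; ++e)
        vol[e] = 0.0;
    for (int g = 0; g < NGP; ++g)
        for (int e = 0; e < NELE; ++e)  /* long, unit-stride loop */
            vol[e] += w[g] * detj[g][e];
}
```

Both orderings compute identical results; only the memory access pattern and achievable vector length differ, which is what makes the scheme attractive on vector hardware even for arbitrary, unstructured meshes.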

Besides the element calculation, the solution of the linear system of equations accounts for a considerable part of the computation time. Storing the sparse matrix in the jagged diagonal format (JAD) increases the average vector length. Block-oriented computation schemes require considerably less indirect addressing and at the same time pack more instructions into each loop iteration. Thus, the overall performance of the iterative solver can be improved.
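As a minimal sketch of how the JAD format lengthens the vectorizable loop in a sparse matrix-vector product: rows are sorted by decreasing nonzero count, and jagged diagonal j collects the (j+1)-th nonzero of every row that has at least j+1 entries, so the inner loop runs across many rows at once. The function and array names below are illustrative, not the paper's code.

```c
#include <stddef.h>

/* SpMV y = A*x with A stored in jagged diagonal (JAD) format.
 * jd_ptr[j]..jd_ptr[j+1] delimits jagged diagonal j in val[]/col[];
 * perm[i] maps sorted row position i back to the original row index. */
void spmv_jad(int n, int njd,
              const int jd_ptr[], const int col[], const double val[],
              const int perm[], const double x[], double y[])
{
    for (int r = 0; r < n; ++r)
        y[r] = 0.0;
    for (int j = 0; j < njd; ++j) {
        int start = jd_ptr[j];
        int len   = jd_ptr[j + 1] - start;
        for (int i = 0; i < len; ++i)  /* long, vectorizable loop */
            y[perm[i]] += val[start + i] * x[col[start + i]];
    }
}
```

Compared with compressed row storage, where the inner loop length equals the (often small) per-row nonzero count, the first jagged diagonal here is as long as the number of nonempty rows.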

The last part discusses the input and output facilities of parallel scientific software. Besides efficiency, the crucial requirements for the I/O subsystem in a parallel setting are scalability, flexibility, and long-term reliability.
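One ingredient of long-term reliability is a self-describing file layout: a header carrying a magic tag, an endianness marker, and a format version lets future tools recognize and validate old result files. The following sketch of such a header is a hypothetical illustration (the field names, magic string, and functions are invented here, not taken from the paper); real projects typically delegate this to libraries such as HDF5 or NetCDF.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Illustrative self-describing header for a binary result file. */
typedef struct {
    char     magic[8];     /* file type tag, e.g. "FEOUT" (assumed name) */
    uint32_t endian_mark;  /* 0x01020304 written natively; a reader can
                              detect a byte-order mismatch from it */
    uint32_t version;      /* format version for long-term readability */
    uint64_t n_values;     /* payload size in doubles */
} FileHeader;

int write_results(const char *path, const double *vals, uint64_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    FileHeader h;
    memset(&h, 0, sizeof h);
    memcpy(h.magic, "FEOUT", 5);
    h.endian_mark = 0x01020304u;
    h.version     = 1;
    h.n_values    = n;
    int ok = fwrite(&h, sizeof h, 1, f) == 1
          && fwrite(vals, sizeof(double), n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}

int read_results(const char *path, double *vals, uint64_t max_n,
                 uint64_t *n_out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    FileHeader h;
    if (fread(&h, sizeof h, 1, f) != 1 ||
        memcmp(h.magic, "FEOUT", 5) != 0 ||
        h.endian_mark != 0x01020304u ||
        h.n_values > max_n) {
        fclose(f);          /* unknown, foreign-endian, or oversized file */
        return -1;
    }
    size_t got = fread(vals, sizeof(double), h.n_values, f);
    fclose(f);
    if (got != h.n_values) return -1;
    *n_out = h.n_values;
    return 0;
}
```

In a parallel setting the same header idea carries over, with the payload written collectively (e.g. via MPI-IO) rather than by a single process.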


Keywords: Computational Fluid Dynamics · Vector Length · Iterative Solver · Matrix Vector Multiplication · Innermost Loop





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Malte Neumann (1)
  • Ulrich Küttler (2)
  • Sunil Reddy Tiyyagura (3)
  • Wolfgang A. Wall (2)
  • Ekkehard Ramm (1)
  1. Institute of Structural Mechanics, University of Stuttgart, Stuttgart, Germany
  2. Computational Mechanics, Technical University of Munich, Garching, Germany
  3. High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany
