# Parallel Shared-Memory Isogeometric Residual Minimization (iGRM) for Three-Dimensional Advection-Diffusion Problems

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12143)

## Abstract

In this paper, we present a residual minimization method for three-dimensional isogeometric analysis simulations of advection-diffusion equations. First, we apply an implicit time integration scheme for the three-dimensional advection-diffusion equation. Namely, we utilize the Douglas-Gunn time integration scheme. Second, in every time step, we apply the residual minimization method for stabilization of the numerical solution. Third, we use isogeometric analysis with B-spline basis functions for the numerical discretization. We perform alternating directions splitting of the resulting system of linear equations, so the computational cost of the sequential LU factorization is linear $$\mathcal{O}(N)$$. We test our method on the three-dimensional simulation of the advection-diffusion problem. We parallelize the solver for shared-memory machines using the GALOIS framework.

## Keywords

Isogeometric analysis · Implicit dynamics · Advection-diffusion problems · Linear computational cost · Direct solvers · GALOIS framework

## 1 Introduction

The alternating direction implicit method (ADI) is a popular method for performing finite difference simulations on regular grids. The first papers concerning the ADI method were published in 1960 [1, 3, 5, 19]. This method is still popular for fast solutions of different classes of problems with the finite difference method [8, 9]. In its basic version, the method introduces intermediate time steps, and the differential operator splits into the x, y (and z in 3D) components. As a result of this operation, on the left-hand side, we only deal with derivatives in one direction, while the rest of the operator is on the right-hand side. The resulting system of linear equations has a multi-diagonal form, so the factorization of this system is possible with a linear $$\mathcal{O}(N)$$ computational cost. It is a common misunderstanding that the direction splitting solvers are limited to simple geometries. They can also be applied to discretizations in extremely complicated geometries, as described in .
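The linear cost of factorizing a multi-diagonal system can be illustrated in the simplest, tridiagonal case. The sketch below is a minimal illustration and not the solver used in this paper: it implements the classical Thomas algorithm (Gaussian elimination restricted to the band) in plain Python. The function name `solve_tridiagonal` and the storage of the three diagonals as lists are our assumptions, and no pivoting is performed, so a diagonally dominant matrix is assumed.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with O(n) operations (Thomas algorithm).

    lower[i] is the entry left of the diagonal in row i (lower[0] is unused),
    upper[i] is the entry right of the diagonal in row i (upper[-1] is unused).
    No pivoting: the matrix is assumed to be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: one pass over the rows.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Backward substitution: one pass in the opposite direction.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Rows of the matrix: [4 -1 0], [-1 4 -1], [0 -1 4].
    solution = solve_tridiagonal([0.0, -1.0, -1.0], [4.0, 4.0, 4.0],
                                 [-1.0, -1.0, 0.0], [2.0, 4.0, 10.0])
    print(solution)
```

Both loops visit every row once, which is where the linear cost comes from. For wider bands the same idea applies, with a cost proportional to N times the square of the bandwidth.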

In this paper, we generalize this method for three-dimensional simulations of the time-dependent advection-diffusion problem with the residual minimization method. We use the basic version of the direction splitting algorithm, working on a regular computational cube, since this approach is straightforward and it is enough to prove our claim that the residual minimization stabilizes the advection-diffusion simulations. In particular, we apply the residual minimization method with isogeometric finite element method simulations over three-dimensional cube-shaped computational grids with tensor product B-spline basis functions. The resulting system of linear equations can be factorized in a linear $$\mathcal{O}(N)$$ computational cost when executed in sequential mode.

We use the finite element method discretizations with B-spline basis functions. This setup, as opposed to the traditional finite difference discretization, allows us to apply the residual minimization method to stabilize our simulations.

Isogeometric analysis (IGA) is a modern method for performing finite element method (FEM) simulations with B-splines and NURBS. It enables higher order and higher continuity B-spline based approximations of the modeled phenomena. The direction splitting method has been rediscovered to solve the isogeometric $$L^2$$ projection problem over regular grids with tensor product B-spline basis functions [6, 7]. The direction splitting, in this case, is performed with respect to space, and the splitting is possible by exploiting the Kronecker product structure of the Gram matrix with tensor product structure of the B-spline basis functions. The $$L^2$$ projections with IGA-FEM were applied for performing fast and smooth simulations of explicit dynamics [11, 12, 13, 14, 15, 16, 20]. This is because the explicit dynamics with isogeometric discretization is equivalent to the solution of a sequence of isogeometric $$L^2$$ projections.

In this paper, we focus on the advection-diffusion equation used for simulation of the propagation of a pollutant from a chimney. We introduce an implicit time integration scheme that allows for the alternating direction splitting of the advection-diffusion equation. We discover that the numerical simulations are unstable, and deliver some unexpected oscillations and reflections. Next, we utilize the residual minimization method in a way that preserves the Kronecker product structure of the matrix and enables stabilized linear computational cost solutions.

The mathematical theory concerning the stability of numerical methods for weak formulations is based on the famous “Babuška-Brezzi condition” (BBC), developed concurrently in the years 1971–1974 by Ivo Babuška and Franco Brezzi [25, 26, 27]. The condition states that a weak problem is stable when
\begin{aligned} \sup _{v \in V} {{|b(u,v)|}\over {\Vert v\Vert _V}} \ge \gamma \Vert u\Vert _U, \forall u \in U. \end{aligned}
(1)
However, the inf-sup condition in the above form concerns the abstract formulation where we consider all the test functions $$v \in V$$ and look for the solution $$u \in U$$ (e.g. $$U=V$$). The above condition is also satisfied if we restrict to the space of trial functions $$u_h \in U_h\subset U$$
\begin{aligned} \sup _{v \in V} {{|b(u_h , v)|} \over {\Vert v\Vert _V}} \ge \gamma \Vert u_h\Vert _{U_h}. \end{aligned}
(2)
However, if we use test functions from the finite dimensional test space $$V_h = {\text {span}} \{ v_h \} \subset V$$
\begin{aligned} \sup _{v_h \in V_h} {{|b(u_h,v_h)|} \over {\Vert v_h\Vert _{V_h}}} \ge \gamma _h \Vert u_h \Vert _{U_h}, \end{aligned}
(3)
we do not have a guarantee that the supremum (3) will be equal to the original supremum (1), since we have restricted V to $$V_h$$. The optimality of the method depends on the quality of the polynomial test functions defining the space $$V_h = {\text {span}}\{ v_h \}$$ and how far they are from realizing the supremum defined in (1). There are many methods for stabilization of different PDEs [28, 29, 30, 31]. In 2010, the Discontinuous Petrov Galerkin (DPG) method was proposed, with the modern summary of the method described in .
The DPG method utilizes the residual minimization with broken test spaces. In other words, it first generates a system of linear equations
\begin{aligned} \begin{bmatrix} G &{} -B \\ B^T &{} 0 \\ \end{bmatrix} \begin{bmatrix} r \\ u \end{bmatrix} = \begin{bmatrix} l \\ 0 \end{bmatrix}. \end{aligned}
(4)
This system of linear equations has the inner product block G over the test space, the two blocks with the actual weak form B and $$B^T$$, and the zero block 0. The test space is larger than the trial space, and the inner product and the weak form blocks are rather sparse matrices. Therefore, the dimension of the system of linear equations is at least two times larger than the original system of equations arising from standard Galerkin method. In the DPG method, the test space is broken in order to obtain a block-diagonal matrix G and the Schur complements can be locally computed over each finite element. The price to pay is the presence of the additional fluxes on the element interfaces, resulting from breaking the test spaces, so the system over each finite element looks like
\begin{aligned} \begin{bmatrix} G &{} -B_1 &{} -B_2 \\ B_1^T &{} 0 &{} 0 \\ B_2^T &{} 0 &{} 0 \end{bmatrix} \begin{bmatrix} r \\ u \\ t \end{bmatrix} = \begin{bmatrix} l \\ 0 \\ 0 \end{bmatrix}. \end{aligned}
(5)
We do not know of any reason for breaking the test spaces in the DPG method other than the reduction of the computational cost of the solver.

In this paper, we want to avoid dealing with fluxes and broken spaces since it is technically very complicated. Thus, we stay with the unbroken global system (4), and then we have to choose one of two possible methods. The first one would be to apply the adaptive finite element method, but then the factorization in 3D would be up to four times more expensive than in the standard finite element method and broken DPG (without the static condensation). This is because, depending on the structure of the refined mesh, the computational cost of the multi-frontal solver varies between $$\mathcal{O}(N)$$ and $$\mathcal{O}(N^2)$$, and our N is two times bigger than in the original weak problem, so in the worst case the cost grows by a factor of $$2^2=4$$. This could be an option that we will discuss in a future paper.

Another method that we exploit in this paper is to keep a tensor product structure of the computational patch of elements with tensor product B-spline basis functions, decompose the system matrix into a Kronecker product structure, and utilize a linear computational cost alternating directions solver. Even for the system (4) resulting from the residual minimization we successfully perform direction splitting to obtain a Kronecker product structure of the matrix to maintain the linear computational cost of the alternating directions method.

In order to stabilize the time-dependent advection-diffusion simulations, we perform the following steps. First, we apply the time integration scheme. We use the Douglas-Gunn second order time integration scheme . Second, we stabilize the system from every time step by employing the residual minimization method [34, 35, 36]. Finally, we perform numerical discretization with isogeometric analysis , using tensor product B-spline basis functions over a three-dimensional cube-shaped patch of elements.

The novelties of this paper with regard to our previous work are the following. In , we described a parallel object-oriented JAVA-based implementation of the explicit dynamics version of the alternating directions solver, without any residual minimization stabilization, and for two-dimensional problems only. In , we described a sequential Fortran-based implementation of the explicit dynamics solver, with applications to elastic wave propagation, without implicit time integration schemes or any residual minimization stabilization. In , we described the parallel distributed memory implementation of the explicit dynamics solver, again without an implicit time integration scheme or the residual minimization method. In , we described the parallel shared-memory implementation of the explicit dynamics solver, with the same restrictions as before. In [13, 17] we applied the explicit dynamics solver to two- and three-dimensional tumor growth simulations. In all of these papers, we did not use implicit time integration schemes, and we did not perform operator splitting on top of the residual minimization method. In , we investigated different time integration schemes for the two-dimensional residual minimization method for advection-diffusion problems. We did not go for three-dimensional computations, and we did not apply parallel computations there.

In this paper, we apply the residual minimization with direction splitting for the first time in three dimensions. We also investigate the parallel scalability of our solver, using the GALOIS framework for parallelization. For more details on the GALOIS framework itself, we refer to [21, 22, 23, 24].

The structure of this paper is the following. We start in Sect. 2 with the derivation of the isogeometric alternating direction implicit method for the advection-diffusion problem. The following Sect. 3 derives the residual minimization method formulation of the advection-diffusion problem in three-dimensions. Next, in Sect. 4, we present the linear computational cost numerical results. We summarize the paper with conclusions in Sect. 5.

## 2 Model Problem of Three-Dimensional Advection-Diffusion

Let $$\varOmega =\varOmega _{x}\times \varOmega _{y}\times \varOmega _{z}\subset \mathbb R^{3}$$ be an open bounded domain and $$I=(0,T]\subset \mathbb R$$. We consider the three-dimensional linear advection-diffusion equation
\begin{aligned} \displaystyle { \left\{ \begin{aligned} u_{t}-\nabla \cdot (\alpha \nabla u)+\beta \cdot \nabla u=&\;f&\text{ in }&\;\varOmega \times I,\\ u=&\;0&\text{ on }&\;\varGamma \times I,\\ u(0)=&\;u_{0}\;&\text{ in }&\;\varOmega ,\\ \end{aligned} \right. } \end{aligned}
(6)
where $$\varOmega _{x}$$, $$\varOmega _{y}$$ and $$\varOmega _{z}$$ are intervals in $$\mathbb R$$. Here, $$u_{t}:=\partial u/\partial t$$, $$\varGamma =\partial \varOmega$$ denotes the boundary of the spatial domain $$\varOmega$$, $$f:\varOmega \times I\longrightarrow \mathbb R$$ is a given source and $$u_{0}:\varOmega \longrightarrow \mathbb R$$ is a given initial condition. We consider constant diffusivity $$\alpha$$ and constant velocity field $$\beta =[\beta _{x}\;\;\beta _{y}\;\;\beta _{z}]$$.
We split the advection-diffusion operator $$\mathcal {L}u=-\nabla \cdot (\alpha \nabla u)+\beta \cdot \nabla u$$ as $$\mathcal {L}u=\mathcal {L}_{1}u+\mathcal {L}_{2}u+\mathcal {L}_{3}u$$ where
$$\mathcal {L}_{1}u:=-\alpha \frac{\partial ^{2} u}{\partial x^{2}}+\beta _{x}\frac{\partial u}{\partial x},\;\;\mathcal {L}_{2}u:=-\alpha \frac{\partial ^{2} u}{\partial y^{2}}+\beta _{y}\frac{\partial u}{\partial y},\;\;\mathcal {L}_{3}u:=-\alpha \frac{\partial ^{2} u}{\partial z^{2}}+\beta _{z}\frac{\partial u}{\partial z}.$$
Based on this operator splitting, we consider different Alternating Direction Implicit (ADI) schemes to discretize problem (6).
First, we perform a uniform partition of the time interval $$\bar{I}=[0,T]$$ as
$$0=t_{0}<t_{1}<\ldots<t_{N-1}<t_{N}=T,$$
and denote $$\tau :=t_{n+1}-t_{n},\;\forall n=0,\ldots ,N-1$$.
In the Douglas-Gunn scheme, we integrate the solution from time step $$t_{n}$$ to $$t_{n+1}$$ in three substeps as follows:
\begin{aligned} \displaystyle {\left\{ \begin{aligned} (1+\frac{\tau }{2}\mathcal {L}_{1})u^{n+1/3}&=\tau f^{n+1/2}+(1-\frac{\tau }{2}\mathcal {L}_{1}-\tau \mathcal {L}_{2}-\tau \mathcal {L}_{3})u^{n},\\ (1+\frac{\tau }{2}\mathcal {L}_{2})u^{n+2/3}&=u^{n+1/3}+\frac{\tau }{2}\mathcal {L}_{2}u^{n},\\ (1+\frac{\tau }{2}\mathcal {L}_{3})u^{n+1}&=u^{n+2/3}+\frac{\tau }{2}\mathcal {L}_{3}u^{n}.\\ \end{aligned} \right. } \end{aligned}
(7)
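The second-order accuracy of scheme (7) can be checked on a scalar model problem. The sketch below is our illustration under a simplifying assumption: the operators $$\mathcal {L}_{1}$$, $$\mathcal {L}_{2}$$, $$\mathcal {L}_{3}$$ are replaced by positive numbers, so that (7) integrates $$u_t+(\lambda _1+\lambda _2+\lambda _3)u=0$$ with the exact solution $$u(t)=u_0 e^{-(\lambda _1+\lambda _2+\lambda _3)t}$$. The function names and the chosen values of $$\lambda _i$$ are hypothetical.

```python
import math


def douglas_gunn_step(u, tau, lam, f_half=0.0):
    """One step of scheme (7) with scalars lam = (l1, l2, l3) replacing L1, L2, L3."""
    l1, l2, l3 = lam
    # First sub-step: implicit in direction 1, explicit in directions 2 and 3.
    u1 = (tau * f_half + (1.0 - 0.5 * tau * l1 - tau * l2 - tau * l3) * u) / (1.0 + 0.5 * tau * l1)
    # Second sub-step: implicit correction in direction 2.
    u2 = (u1 + 0.5 * tau * l2 * u) / (1.0 + 0.5 * tau * l2)
    # Third sub-step: implicit correction in direction 3.
    u3 = (u2 + 0.5 * tau * l3 * u) / (1.0 + 0.5 * tau * l3)
    return u3


def integrate(tau, lam, t_end=1.0, u0=1.0):
    """Advance from t = 0 to t = t_end with a fixed time step tau."""
    u = u0
    for _ in range(round(t_end / tau)):
        u = douglas_gunn_step(u, tau, lam)
    return u


def observed_order(lam=(1.0, 2.0, 3.0), tau=0.01):
    """Estimate the convergence order from the errors at tau and tau / 2."""
    exact = math.exp(-sum(lam))
    error_coarse = abs(integrate(tau, lam) - exact)
    error_fine = abs(integrate(tau / 2.0, lam) - exact)
    return math.log2(error_coarse / error_fine)


if __name__ == "__main__":
    print(observed_order())
```

Halving the time step should reduce the error by a factor close to four, that is, the estimated order should be close to two. This scalar test says nothing about the spatial discretization, which is treated separately in the remainder of the paper.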
The variational formulation of scheme (7) is
\begin{aligned} \displaystyle {\left\{ \begin{aligned}&(u^{n+1/3},v)+\frac{\tau }{2}\left( \alpha \frac{\partial u^{n+1/3}}{\partial x},\frac{\partial v}{\partial x}\right) +\frac{\tau }{2}\left( \beta _{x}\frac{\partial u^{n+1/3}}{\partial x},v\right) \\&\quad =(u^{n},v)-\frac{\tau }{2}\left( \alpha \frac{\partial u^{n}}{\partial x},\frac{\partial v}{\partial x}\right) -\frac{\tau }{2}\left( \beta _{x}\frac{\partial u^{n}}{\partial x},v\right) -\tau \left( \alpha \frac{\partial u^{n}}{\partial y},\frac{\partial v}{\partial y}\right) -\tau \left( \beta _{y}\frac{\partial u^{n}}{\partial y},v\right) \\&\qquad -\tau \left( \alpha \frac{\partial u^{n}}{\partial z},\frac{\partial v}{\partial z}\right) -\tau \left( \beta _{z}\frac{\partial u^{n}}{\partial z},v\right) +\tau (f^{n+1/2},v),\\&(u^{n+2/3},v)+\frac{\tau }{2}\left( \alpha \frac{\partial u^{n+2/3}}{\partial y},\frac{\partial v}{\partial y}\right) +\frac{\tau }{2}\left( \beta _{y}\frac{\partial u^{n+2/3}}{\partial y},v\right) \\&\quad =(u^{n+1/3},v)+\frac{\tau }{2}\left( \alpha \frac{\partial u^{n}}{\partial y},\frac{\partial v}{\partial y}\right) +\frac{\tau }{2}\left( \beta _{y}\frac{\partial u^{n}}{\partial y},v\right) ,\\&(u^{n+1},v)+\frac{\tau }{2}\left( \alpha \frac{\partial u^{n+1}}{\partial z},\frac{\partial v}{\partial z}\right) +\frac{\tau }{2}\left( \beta _{z}\frac{\partial u^{n+1}}{\partial z},v\right) \\&\quad =(u^{n+2/3},v)+\frac{\tau }{2}\left( \alpha \frac{\partial u^{n}}{\partial z},\frac{\partial v}{\partial z}\right) +\frac{\tau }{2}\left( \beta _{z}\frac{\partial u^{n}}{\partial z},v\right) , \end{aligned} \right. } \end{aligned}
(8)
where $$(\cdot ,\cdot )$$ denotes the inner product of $$L^{2}(\varOmega )$$. Finally, expressing problem (8) in matrix form we have
\begin{aligned} \displaystyle {\left\{ \begin{aligned}&\left[ M^{x}+\frac{\tau }{2}(K^{x}+G^{x})\right] \otimes M^{y}\otimes M^{z}u^{n+1/3}\\&=\left[ M^{x}-\frac{\tau }{2}(K^{x}+G^{x})\right] \otimes M^{y}\otimes M^{z}u^{n}\\&-\tau M^{x}\otimes (K^{y}+G^{y})\otimes M^{z}u^{n}-\tau M^{x}\otimes M^{y}\otimes (K^{z}+G^{z})u^{n}+\tau F^{n+1/2}\\&M^{x}\otimes \left[ M^{y}+\frac{\tau }{2}(K^{y}+G^{y})\right] \otimes M^{z}u^{n+2/3}\\&=M^{x}\otimes M^{y}\otimes M^{z}u^{n+1/3}+M^{x}\otimes \frac{\tau }{2}(K^{y}+G^{y})\otimes M^{z}u^{n},\\&M^{x}\otimes M^{y}\otimes \left[ M^{z}+\frac{\tau }{2}(K^{z}+G^{z})\right] u^{n+1} \\&=M^{x}\otimes M^{y}\otimes M^{z}u^{n+2/3}+M^{x}\otimes M^{y}\otimes \frac{\tau }{2}(K^{z}+G^{z})u^{n},\\\end{aligned} \right. } \end{aligned}
(9)
where $$M^{x,y,z}$$, $$K^{x,y,z}$$ and $$G^{x,y,z}$$ are the 1D mass, stiffness and advection matrices, respectively.

## 3 Isogeometric Residual Minimization Method

In our method, in every time step we solve the problem with identical left-hand-side: Find $$u \in U$$ such that
\begin{aligned} b\left( u,v\right) =l\left( v\right) \quad \forall v\in V, \end{aligned}
(10)
\begin{aligned} b\left( u,v\right) = \left( u,v\right) + \tau /2 \left( \left( \beta _{\underline{i}} \frac{\partial u}{\partial x_{\underline{i}}}, v\right) + \alpha _{\underline{i}}\left( \frac{\partial u}{\partial x_{\underline{i}}}, \frac{\partial v}{\partial x_{\underline{i}}}\right) \right) , \end{aligned}
(11)
Here $$i \in \{1,2,3\}$$, so we have denoted here $$(x_1,x_2,x_3)=(x,y,z)$$, and $$\underline{i}$$ means that we are not using the Einstein summation here. The right-hand-side depends on the sub-step and the time integration scheme used. In the Douglas-Gunn time integration scheme, in the first, second and third sub-step the right-hand side is defined as:
\begin{aligned} \displaystyle {\left\{ \begin{aligned} l\left( w,v\right)&=(w,v)-\frac{\tau }{2}\left( \alpha \frac{\partial w}{\partial x},\frac{\partial v}{\partial x}\right) -\frac{\tau }{2}\left( \beta _{x}\frac{\partial w}{\partial x},v\right) -\tau \left( \alpha \frac{\partial w}{\partial y},\frac{\partial v}{\partial y}\right) -\tau \left( \beta _{y}\frac{\partial w}{\partial y},v\right) \\&-\tau \left( \alpha \frac{\partial w}{\partial z},\frac{\partial v}{\partial z}\right) -\tau \left( \beta _{z}\frac{\partial w}{\partial z},v\right) +\tau (f^{n+1/2},v),\\ l\left( w,v\right)&=(w,v)+\frac{\tau }{2}\left( \alpha \frac{\partial w}{\partial y},\frac{\partial v}{\partial y}\right) +\frac{\tau }{2}\left( \beta _{y}\frac{\partial w}{\partial y},v\right) ,\\ l\left( w,v\right)&=(w,v)+\frac{\tau }{2}\left( \alpha \frac{\partial w}{\partial z},\frac{\partial v}{\partial z}\right) +\frac{\tau }{2}\left( \beta _{z}\frac{\partial w}{\partial z},v\right) .\\ \end{aligned} \right. } \end{aligned}
(12)
In our advection-diffusion problem we seek the solution in space
\begin{aligned} U = V =\left\{ v: \int _{\varOmega } \left( v^2+\left( \frac{\partial v}{\partial x_{\underline{i}}}\right) ^2 \right) <\infty \right\} . \end{aligned}
(13)
where $$i=1,2,3$$ denotes the spatial directions. The inner product in V is defined as
\begin{aligned} \left( u,v\right) _V=\left( u,v\right) _{L_2}+\left( \frac{\partial u}{\partial x_{\underline{i}}},\frac{\partial v}{\partial x_{\underline{i}}}\right) _{L_2}, \end{aligned}
(14)
where $$i=1,2,3$$ depending on the sub-step index in the alternating directions method, and we do not use here the Einstein convention. For a weak problem, we define the operator
\begin{aligned} B: U \rightarrow V', \end{aligned}
(15)
such that
\begin{aligned} \langle Bu , v \rangle _{V' \times V} = b(u,v), \end{aligned}
(16)
so we can reformulate the problem as
\begin{aligned} Bu - l = 0. \end{aligned}
(17)
We wish to minimize the residual
\begin{aligned} u_h = \mathrm {argmin}_{w_h \in U_h} {1 \over 2} \Vert Bw_h - l \Vert _{V'}^2. \end{aligned}
(18)
We introduce the Riesz operator being the isometric isomorphism
\begin{aligned} R_V :V \ni v \rightarrow (v,.) \in V'. \end{aligned}
(19)
We can project the problem back to V
\begin{aligned} u_h = \mathrm {argmin}_{w_h \in U_h} {1 \over 2} \Vert R_V^{-1} (Bw_h - l) \Vert _V^2. \end{aligned}
(20)
The minimum is attained at $$u_h$$ when the Gâteaux derivative is equal to 0 in all directions:
\begin{aligned} \langle R_V^{-1} (Bu_h - l), R_V^{-1}(B\, w_h) \rangle _V = 0, \quad \forall \, w_h \in U_h. \end{aligned}
(21)
We define the error representation function $$r=R_V^{-1}(Bu_h-l)$$ and our problem is reduced to
\begin{aligned} \langle r , R_V^{-1} (B\, w_h ) \rangle = 0, \quad \forall \, w_h \in U_h, \end{aligned}
(22)
which is equivalent to
\begin{aligned} \langle Bw_h, r \rangle = 0, \quad \forall w_h \in U_h. \end{aligned}
(23)
From the definition of the residual we have
\begin{aligned} (r,v)_V=\langle B u_h-l,v \rangle , \quad \forall v\in V. \end{aligned}
(24)
Our problem reduces to the following semi-infinite problem: Find $$(r,u_h) \in V\times U_h$$ such that
\begin{aligned} \begin{aligned} (r,v)_V - \langle B u_h,v \rangle&= \langle l,v \rangle , \quad \forall v\in V, \\ \langle Bw_h,r\rangle&= 0 \quad \forall w_h \in U_h. \end{aligned} \end{aligned}
(25)
We discretize the test space $$V_m \subset V$$ to get the discrete problem: Find $$(r_m,u_h) \in V_m\times U_h$$ such that
\begin{aligned} \begin{aligned} (r_m,v_m)_{V_m} - \langle B u_h,v_m \rangle&= \langle l,v_m\rangle \quad \forall v_m\in V_m, \\ \langle Bw_h,r_m\rangle&= 0 \quad \forall w_h \in U_h. \end{aligned} \end{aligned}
(26)
Note that the residual minimization method is a Petrov-Galerkin method (test and trial spaces are different). We stabilize the problem by increasing the dimension of the test space. Notice that the residual minimization system here is of the following form
\begin{aligned} \begin{bmatrix} G &{} -B \\ B^T &{} 0 \\ \end{bmatrix} \begin{bmatrix} r \\ u \end{bmatrix} = \begin{bmatrix} l \\ 0 \end{bmatrix}, \end{aligned}
(27)
where the right-top and left-bottom matrices B and $$B^T$$ can be split according to (9), and the inner product (14) part G can be split in the following way:
\begin{aligned} G=\displaystyle {\left\{ \begin{aligned}&[\tilde{M}^{x}+\tilde{K}^{x}]\otimes \tilde{M}^{y} \otimes \tilde{M}^{z}, \\&\tilde{M}^{x}\otimes [\tilde{M}^{y}+\tilde{K}^{y}] \otimes \tilde{M}^{z} , \\&\tilde{M}^{x}\otimes \tilde{M}^{y} \otimes [\tilde{M}^{z}+\tilde{K}^{z}], \end{aligned} \right. } \end{aligned}
(28)
where we consider three different splittings for three sub-steps, and $$\tilde{M}^{x,y,z}$$ and $$\tilde{K}^{x,y,z}$$ are the 1D mass and stiffness matrices over the test space in direction x, y, or z, respectively.

Now, in the first sub-step, we approximate the solution with tensor products of one-dimensional B-spline basis functions of order p, $$u_h = \sum _{i,j,k} u_{i,j,k} B^x_{i;p}(x)B^y_{j;p}(y)B^z_{k;p}(z)$$. We test with tensor products of one-dimensional B-spline basis functions, where we enrich the order in the direction of the x axis from p to $$o \ge p$$, that is, we enrich the test space only in the direction of the alternating splitting, $$v_m \leftarrow B^x_{i;o}(x)B^y_{j;p}(y)B^z_{k;p}(z)$$. We approximate the residual with tensor products of one-dimensional B-spline basis functions of order o in the x direction and order p in the other directions, $$r_m = \sum _{s,t,q} r_{s,t,q} B^x_{s;o}(x)B^y_{t;p}(y)B^z_{q;p}(z)$$, and we test with tensor products of 1D B-spline basis functions of order o and p in the corresponding directions, $$w_h \leftarrow B^x_{k;o}(x)B^y_{l;p}(y)B^z_{m;p}(z)$$.
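The one-dimensional B-spline basis functions entering these tensor products can be evaluated with the Cox-de Boor recursion. The following sketch is an illustration under our own assumptions: a uniform mesh on $$[0,1]$$, open knot vectors, and maximal continuity, so that a space of order p over n elements has n + p basis functions per direction. The helper names are hypothetical. It also compares the dimension of the trial space with the dimension of a test space enriched to order o in the x direction only.

```python
def open_uniform_knots(n_elements, p):
    """Open knot vector on [0, 1] with n_elements uniform elements and order p."""
    breakpoints = [k / n_elements for k in range(n_elements + 1)]
    return [0.0] * p + breakpoints + [1.0] * p


def bspline_basis(i, p, knots, x):
    """Value of the i-th B-spline of order p at x (Cox-de Boor recursion)."""
    if p == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + p] - knots[i]
    if left_span > 0.0:  # skip terms with a zero denominator (repeated knots)
        value += (x - knots[i]) / left_span * bspline_basis(i, p - 1, knots, x)
    right_span = knots[i + p + 1] - knots[i + 1]
    if right_span > 0.0:
        value += (knots[i + p + 1] - x) / right_span * bspline_basis(i + 1, p - 1, knots, x)
    return value


def space_dimensions(n_elements, p, o):
    """Sizes of the trial space and of the test space enriched in x only."""
    trial = (n_elements + p) ** 3
    test = (n_elements + o) * (n_elements + p) ** 2
    return trial, test


if __name__ == "__main__":
    knots = open_uniform_knots(4, 2)
    n_basis = len(knots) - 2 - 1
    # The basis functions sum to one at any point of [0, 1).
    print(sum(bspline_basis(i, 2, knots, 0.37) for i in range(n_basis)))
    print(space_dimensions(50, 2, 3))
```

Because the half-open interval convention is used for order zero, the evaluation is valid for points in $$[0,1)$$; the right end point would need a separate treatment.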

Notice that we stabilize the problem by enriching the test space with respect to the trial space in the alternating direction manner. In the first sub-step we have $$M^{y}=\tilde{M}^y$$, $$M^{z}=\tilde{M}^z$$ and $$M^{x} \ne \tilde{M}^x$$, and $$M^{yT}=M^{y}, M^{zT}=M^{z}$$. Consequently, in the first sub-step the system matrix factors as
\begin{aligned} \begin{pmatrix} G &{} B \\ B^T &{} 0 \\ \end{pmatrix} = \begin{pmatrix} [\tilde{M}^{x}+\tilde{K}^x] &{} \left[ M^{x}+\frac{\tau }{2}(K^{x}+G^{x})\right] \\ \left[ M^{x}+\frac{\tau }{2}(K^{x}+G^{x})\right] ^T &{} 0 \\ \end{pmatrix}\otimes M^{y} \otimes M^{z} \end{aligned}
(29)
in the second sub-step
\begin{aligned} \begin{pmatrix} G &{} B \\ B^T &{} 0 \\ \end{pmatrix} = M^{x}\otimes \begin{pmatrix} [\tilde{M}^{y}+\tilde{K}^y] &{} \left[ M^{y}+\frac{\tau }{2}(K^{y}+G^{y})\right] \\ \left[ M^{y}+\frac{\tau }{2}(K^{y}+G^{y})\right] ^T&{} 0\\ \end{pmatrix} \otimes M^{z} \end{aligned}
(30)
and in the third sub-step
\begin{aligned} \begin{pmatrix} G &{} B \\ B^T &{} 0 \\ \end{pmatrix} = M^{x}\otimes M^{y} \otimes \begin{pmatrix} [\tilde{M}^{z}+\tilde{K}^z] &{} \left[ M^{z}+\frac{\tau }{2}(K^{z}+G^{z})\right] \\ \left[ M^{z}+\frac{\tau }{2}(K^{z}+G^{z})\right] ^T &{} 0 \\ \end{pmatrix} \end{aligned}
(31)
All these matrices are the Kronecker products of three multi-diagonal sub-matrices, and they can be factorized in a linear $$\mathcal{O}(N)$$ computational cost.
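The way a Kronecker product structure reduces a three-dimensional solve to families of one-dimensional solves can be sketched as follows. This is a minimal illustration and not the implementation used in this paper: the matrices are small and dense, the right-hand side is stored as a nested list `F[i][j][k]`, and the helper `solve_dense` refactorizes its matrix at every call for brevity. A linear cost solver would instead factorize each banded one-dimensional matrix once and reuse the factors for all right-hand sides. All function names are hypothetical.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def kron_apply(A, B, C, X):
    """Reference product Y = (A kron B kron C) X, written entry by entry."""
    nx, ny, nz = len(A), len(B), len(C)
    return [[[sum(A[i][a] * B[j][b] * C[k][c] * X[a][b][c]
                  for a in range(nx) for b in range(ny) for c in range(nz))
              for k in range(nz)] for j in range(ny)] for i in range(nx)]


def kron_solve(A, B, C, F):
    """Solve (A kron B kron C) X = F using 1D solves in x, then y, then z."""
    nx, ny, nz = len(A), len(B), len(C)
    X = [[[F[i][j][k] for k in range(nz)] for j in range(ny)] for i in range(nx)]
    for j in range(ny):          # x direction: one 1D solve per (j, k) line
        for k in range(nz):
            line = solve_dense(A, [X[i][j][k] for i in range(nx)])
            for i in range(nx):
                X[i][j][k] = line[i]
    for i in range(nx):          # y direction: one 1D solve per (i, k) line
        for k in range(nz):
            line = solve_dense(B, [X[i][j][k] for j in range(ny)])
            for j in range(ny):
                X[i][j][k] = line[j]
    for i in range(nx):          # z direction: one 1D solve per (i, j) line
        for j in range(ny):
            X[i][j] = solve_dense(C, X[i][j])
    return X
```

The sketch relies on the identity $$(A\otimes B\otimes C)^{-1}=A^{-1}\otimes B^{-1}\otimes C^{-1}$$, so the three-dimensional matrix is never assembled. In (29)-(31) one of the three factors is the one-dimensional saddle-point block, and the remaining two are one-dimensional mass matrices.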

## 4 Numerical Results

### 4.1 Manufactured Solution Problem

In order to verify the order and accuracy of the Douglas-Gunn time-integration schemes with IGA-FEM discretization and the direction splitting solver, we construct a time-dependent advection-diffusion problem with manufactured solution.
\begin{aligned} \frac{d u}{dt} -\nabla \cdot \left( \alpha \nabla u\right) + \beta \cdot \nabla u = f, \end{aligned}
with $$\alpha =10^{-2}$$, $$\beta =(1,0,0)$$, and zero Dirichlet boundary conditions, solved on the cube $$[0,1]^3$$. We set up the forcing function $$f(x,y,z,t)$$ in such a way that it delivers the manufactured solution of the form $$u_{exact}(x,y,z;t)=\sin (\pi x)\sin (\pi y)\sin (\pi z)\sin (\pi t)$$ on the time interval [0, 2].
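The forcing function follows from substituting the manufactured solution into the equation. The sketch below is our illustration of this step, assuming the constant coefficients stated above; the function names are hypothetical. It evaluates $$f=u_t-\alpha \varDelta u+\beta \cdot \nabla u$$ in closed form and provides a finite difference evaluation of the same expression that can be used as a consistency check.

```python
import math

ALPHA = 1.0e-2          # diffusion coefficient of the manufactured problem
BETA = (1.0, 0.0, 0.0)  # advection velocity of the manufactured problem


def u_exact(x, y, z, t):
    """Manufactured solution sin(pi x) sin(pi y) sin(pi z) sin(pi t)."""
    return (math.sin(math.pi * x) * math.sin(math.pi * y)
            * math.sin(math.pi * z) * math.sin(math.pi * t))


def forcing(x, y, z, t):
    """Closed form of f = u_t - alpha * Laplace(u) + beta . grad(u)."""
    pi = math.pi
    sx, sy, sz, st = (math.sin(pi * s) for s in (x, y, z, t))
    cx, cy, cz, ct = (math.cos(pi * s) for s in (x, y, z, t))
    u_t = pi * sx * sy * sz * ct
    laplace_u = -3.0 * pi ** 2 * sx * sy * sz * st
    grad_u = (pi * cx * sy * sz * st, pi * sx * cy * sz * st, pi * sx * sy * cz * st)
    advection = sum(b * g for b, g in zip(BETA, grad_u))
    return u_t - ALPHA * laplace_u + advection


def finite_difference_forcing(x, y, z, t, h=1.0e-3):
    """The same expression approximated by central differences of u_exact."""
    u = u_exact
    u_t = (u(x, y, z, t + h) - u(x, y, z, t - h)) / (2.0 * h)
    u_x = (u(x + h, y, z, t) - u(x - h, y, z, t)) / (2.0 * h)
    u_y = (u(x, y + h, z, t) - u(x, y - h, z, t)) / (2.0 * h)
    u_z = (u(x, y, z + h, t) - u(x, y, z - h, t)) / (2.0 * h)
    laplace_u = (u(x + h, y, z, t) + u(x - h, y, z, t)
                 + u(x, y + h, z, t) + u(x, y - h, z, t)
                 + u(x, y, z + h, t) + u(x, y, z - h, t)
                 - 6.0 * u(x, y, z, t)) / h ** 2
    return u_t - ALPHA * laplace_u + BETA[0] * u_x + BETA[1] * u_y + BETA[2] * u_z
```

The manufactured solution vanishes on the boundary of the cube, so it is compatible with the zero Dirichlet boundary conditions.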

We solve the problem with residual minimization method on $$32 \times 32 \times 32$$ mesh with different time steps, as presented in Fig. 1, using the Douglas-Gunn time integration scheme and the direction splitting solver using the Kronecker product structure of the matrices.

We compute the error between the exact solution $$u_{exact}$$ and the numerical solution $$u_h$$ for different time step sizes $$\tau$$. We compute the relative error $${\Vert }u_{\text {exact}}(t) - u_{\text {h}}(t){\Vert }_{L^2} / {\Vert }u_{\text {exact}}(t){\Vert }_{L^2} \cdot 100 \%$$ and plot it in Fig. 1. The horizontal axis represents the time step size selected for the entire simulation, and the vertical axis presents the numerical error with respect to the known exact solution.

The Douglas-Gunn scheme is second-order accurate, down to the accuracy of $$10^{-5}$$.

### 4.2 Pollution Propagation Simulations

In this section, we describe the numerical simulation of three-dimensional model advection-diffusion problem over a 3D cube shape domain with dimensions $$5000\times 5000\times 5000$$ m.
\begin{aligned} \frac{d u}{dt} -\nabla \cdot \left( \alpha \nabla u\right) + \beta \cdot \nabla u = f, \end{aligned}
(32)
In our equation we have the diffusion coefficients $$\alpha =(50,50,0.5)$$. We utilize tensor products of 1D B-splines along the x, y, and z axes. We apply the alternating direction implicit solver with three intermediate time steps. The velocity field is $$\beta =(\beta ^x(t),\beta ^y(t),\beta ^z(t))=(\cos a(t),\sin a(t),v(t))$$ where $$a(t)=\frac{\pi }{3} (\sin (s) + \frac{1}{2} \sin (2.3 s)) + \frac{3}{8} \pi$$, $$v(t)=\frac{1}{3} \sin (s)$$, $$s=\frac{t}{150}$$. The source is given by $$f(p) = (r - 1)^2 (r + 1)^2$$, where $$r = \min (1, (|p - p_0| / 25)^2)$$, $$|p - p_0|$$ represents the distance from the source, and $$p_0=(3,3,2)$$ is the location of the source. The initial state is defined as the constant concentration of the order of $$10^{-6}$$ in the entire domain (numerical zero).

Fig. 1. Numerical error for the Douglas-Gunn time integration scheme on a $$32\times 32\times 32$$ mesh for different time steps.

The physical meaning of this setup is the following. We model the propagation of the pollutant generated by a single source modeled by the f function, distributed by the wind blowing with changing directions, modeled by the $$\beta$$ function, and the diffusion phenomena modeled by the coefficients $$\alpha$$. The computational domain unit is meter [m], the wind velocity $$\beta$$ is given in meters per second $$[\frac{m}{s}]$$, and the diffusion coefficient $$\alpha$$ is given in square meters per second $$[\frac{m^2}{s}]$$. The units for the solution are then kilograms per cubic meter $$[\frac{kg}{m^3}]$$. We expect the numerical results to show the propagation of the pollutant as distributed by the wind and the diffusion process.
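The time-dependent wind and the source term of this setup translate directly into code. The sketch below is our transcription of the formulas of this section into plain Python; the function names `wind` and `source` are hypothetical, and the points are passed as tuples of three coordinates.

```python
import math

P0 = (3.0, 3.0, 2.0)  # location of the source, as given in the text


def wind(t):
    """Velocity field beta(t) = (cos a(t), sin a(t), v(t)) with s = t / 150."""
    s = t / 150.0
    a = math.pi / 3.0 * (math.sin(s) + 0.5 * math.sin(2.3 * s)) + 3.0 / 8.0 * math.pi
    v = math.sin(s) / 3.0
    return (math.cos(a), math.sin(a), v)


def source(p, p0=P0):
    """Source f(p) = (r - 1)^2 (r + 1)^2 with r = min(1, (|p - p0| / 25)^2)."""
    distance = math.sqrt(sum((pc - qc) ** 2 for pc, qc in zip(p, p0)))
    r = min(1.0, (distance / 25.0) ** 2)
    return (r - 1.0) ** 2 * (r + 1.0) ** 2


if __name__ == "__main__":
    print(wind(0.0))
    print(source(P0), source((100.0, 3.0, 2.0)))
```

By construction, the horizontal component of the wind has unit magnitude at all times and only its direction changes, while the source equals one at $$p_0$$ and vanishes at distances of 25 or more from it.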

Our first numerical results concern the computational mesh with a size of $$50\times 50 \times 50$$ elements with quadratic B-splines. We employ standard Galerkin formulation here with direction splitting, without the residual minimization method. We perform 300 time steps of the numerical simulation. The snapshots presented in Fig. 2 represent time steps 100, 200 and 300. We observe unexpected “oscillations” and “reflections”. Since the simulation is supposed to model the propagation of the pollutant from a chimney by means of the advection (wind) and diffusion phenomena, the oscillations and reflections on the boundary are not expected there. Both these phenomena appear and disappear during the entire simulation; they do not cause a blowup of the entire simulations, just unexpected local behavior.

To improve the spatial stability of the simulation, we now add the residual minimization method on top of the Galerkin setup. Thus, the second simulation was performed again on the mesh with $$50\times 50 \times 50$$ elements, with quadratic B-splines for trial and cubic B-splines for test. The snapshots from the numerical results are presented in Fig. 2. We perform 300 time steps, and we present the snapshots in time steps 100, 200, and 300.

Fig. 2. Top panel: simulation with Galerkin method. Snapshots from the time steps 10, 20, and 30 with quadratic C1 B-splines over $$50\times 50\times 50$$ mesh, without stabilization. Bottom panel: simulation with residual minimization method. Snapshots from the time steps 100, 200 and 300 of the first problem simulation with quadratic C1 B-splines for trial and cubic C2 B-splines for test over $$50\times 50\times 50$$ mesh.

Fig. 3. Speedup (top row) and efficiency (bottom row) for $$p = 1$$ for trial, $$p=2$$ for testing (first column), $$p = 2$$ for trial, $$p=3$$ for testing (second column), and $$p = 3$$ for trial, $$p=4$$ for testing (third column).

We use the implicit extension of the parallel code  for shared memory Linux cluster nodes. The total simulation time was 100 min on a laptop with i7 6700Q processor 2.6 GHz (8 cores with HT) and 16 GB of RAM. We emphasize that ADI is not an iterative solver. It is just a linear $$\mathcal{O}(N)$$ computational cost solver that performs Gaussian elimination for matrices having Kronecker product structure. Thus, the solution obtained by the solver is exact (up to the round-off errors). In this sense, we do not present the iterations or convergence of the ADI solver, since it is executed once per time step. In other words, we can perform 300 Gaussian eliminations, each with 1,000,000 unknowns, with the high accuracy resulting from the ADI direct solver (only round-off errors are involved), on a laptop with eight cores, with the implicit method, within 1.5 h.

## 5 Parallel Scalability

Implementation of the ideas described in the preceding sections has been created in C++ and parallelized using our code for IGA-FEM simulations with ADI solver , extended to the implicit method. We use the GALOIS framework for parallelization [21, 22, 23, 24].

The parallelization concerns mainly the algorithm of the integration of the right-hand-side vector. The cost of generation of the one-dimensional matrices and the factorization with multiple right-hand sides is negligible in comparison to the integration of the higher-order B-splines over the three-dimensional mesh. The parallel implementation in GALOIS involves the usage of Galois::for_each and Galois::Runtime::LL::SimpleLock.
The speedup and efficiency of the code are presented in Fig. 3. We can draw the following conclusions from the presented plots:
• For linear B-splines for trial and quadratic B-splines for testing on the large grids $$32\times 32\times 32$$ and $$64\times 64\times 64$$, the speedup grows up to 16 cores, where it is around 10–11. The corresponding efficiency for 16 cores is around 0.7. For 32 cores the speedup decreases, since hyperthreading is used beyond 20 cores.

• For quadratic B-splines for trial and cubic B-splines for testing on the large grids $$32\times 32\times 32$$ and $$64\times 64\times 64$$, the speedup grows up to 16 cores, where it is around 12–14. The corresponding efficiency for 16 cores is around 0.8–0.9. For 32 cores, the speedup on the $$32\times 32\times 32$$ mesh grows up to 17, while on the $$64\times 64\times 64$$ mesh it decreases slightly, since hyperthreading is used beyond 20 cores.

• For cubic B-splines for trial and quartic B-splines for testing on the large grids $$32\times 32\times 32$$ and $$64\times 64\times 64$$, the speedup grows up to 32 cores. It is around 15 for 16 cores (near-perfect speedup) and around 20 for 32 cores, where hyperthreading is used (more than 20 cores). The corresponding efficiency for 16 cores is around 0.9–1.0. For 32 cores the efficiency decreases to 0.6–0.7.

• Increasing the mesh size improves the parallel scalability up to the $$32\times 32 \times 32$$ mesh. The larger $$64\times 64\times 64$$ mesh performs slightly worse than the $$32\times 32 \times 32$$ mesh.

• The most interesting observation is that increasing the B-spline order improves the parallel scalability. This is important from the point of view of stabilization with the residual minimization method. The order of the B-splines in the test space is increased to enforce stabilization, and this increase also improves the parallel scalability.
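
The speedup and efficiency figures quoted in the list above are related by the usual definitions, which the following short sketch makes explicit. It assumes that efficiency is the speedup divided by the number of cores used, with hyperthreads counted as cores. The function names are illustrative and the inputs are the approximate values read from the text.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-core to multi-core execution time."""
    return t_serial / t_parallel


def parallel_efficiency(s, cores):
    """Speedup divided by the number of cores used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return s / cores


# Cubic trial / quartic test case, values quoted in the text.
eff_16 = parallel_efficiency(15, 16)   # 0.9375
eff_32 = parallel_efficiency(20, 32)   # 0.625
```

Both values fall in the ranges reported above (0.9–1.0 for 16 cores and 0.6–0.7 for 32 cores).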

## 6 Conclusions

We introduced an isogeometric finite element method for implicit simulations of the advection-diffusion problem with the Douglas-Gunn time-integration scheme, which results in a Kronecker product structure of the matrix in every time step. The application of B-spline basis functions for the approximation of the numerical solution results in a smooth, higher-order approximation. It also enables residual minimization stabilization with a linear computational cost $$\mathcal{O}(N)$$ of the direct solver. The method has been verified on a three-dimensional advection-diffusion problem. Our future work will involve the extension of the model to more complicated equations and geometries. In particular, we plan to use the isogeometric alternating direction implicit solver for tumor growth simulations in two and three dimensions [13, 17]. Our equations can also be extended to model a pollution problem, with different chemical components propagating and reacting together through space and time, as described in .

## References

1. Peaceman, D.W., Rachford Jr., H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3, 28–41 (1955)
2. Douglas, J., Gunn, J.E.: A general formulation of alternating direction methods. Numer. Math. 6(1), 428–453 (1964)
3. Birkhoff, G., Varga, R.S., Young, D.: Alternating direction implicit methods. Adv. Comput. 3, 189–273 (1962)
4. Cottrell, J.A., Hughes, T.J.R., Bazilevs, Y.: Isogeometric Analysis: Towards Unification of CAD and FEA. Wiley, Hoboken (2009)
5. Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
6. Gao, L., Calo, V.M.: Fast isogeometric solvers for explicit dynamics. Comput. Methods Appl. Mech. Eng. 274(1), 19–41 (2014)
7. Gao, L., Calo, V.M.: Preconditioners based on the alternating-direction-implicit algorithm for the 2D steady-state diffusion equation with orthotropic heterogeneous coefficients. J. Comput. Appl. Math. 273(1), 274–295 (2015)
8. Guermond, J.L., Minev, P.: A new class of fractional step techniques for the incompressible Navier-Stokes equations using direction splitting. C.R. Math. 348(9–10), 581–585 (2010)
9. Guermond, J.L., Minev, P., Shen, J.: An overview of projection methods for incompressible flows. Comput. Methods Appl. Mech. Eng. 195, 6011–6054 (2006)
10. Keating, J., Minev, P.: A fast algorithm for direct simulation of particulate flows using conforming grids. J. Comput. Phys. 255, 486–501 (2013)
11. Gurgul, G., Woźniak, M., Łoś, M., Szeliga, D., Paszyński, M.: Open source JAVA implementation of the parallel multi-thread alternating direction isogeometric L2 projections solver for material science simulations. Comput. Methods Mater. Sci. 17, 1–11 (2017)
12. Łoś, M., Woźniak, M., Paszyński, M., Dalcin, L., Calo, V.M.: Dynamics with matrices possessing Kronecker product structure. Procedia Comput. Sci. 51, 286–295 (2015)
13. Łoś, M., Paszyński, M., Kłusek, A., Dzwinel, W.: Application of fast isogeometric L2 projection solver for tumor growth simulations. Comput. Methods Appl. Mech. Eng. 316, 1257–1269 (2017)
14. Łoś, M., Woźniak, M., Paszyński, M., Lenharth, A., Pingali, K.: IGA-ADS: isogeometric analysis FEM using ADS solver. Comput. Phys. Commun. 217, 99–116 (2017)
15. Łoś, M., Paszyński, M.: Applications of alternating direction solver for simulations of time-dependent problems. Comput. Sci. 18(2), 117–128 (2017)
16. Woźniak, M., Łoś, M., Paszyński, M., Dalcin, L., Calo, V.M.: Parallel fast isogeometric solvers for explicit dynamics. Comput. Inform. 36(2), 423–448 (2017)
17. Łoś, M., Kłusek, A., Hassaan, M.A., Pingali, K., Dzwinel, W., Paszyński, M.: Parallel fast isogeometric L2 projection solver with GALOIS system for 3D tumor growth simulations. Comput. Methods Appl. Mech. Eng. 343, 1–22 (2019)
18. Oliver, A., Montero, G., Montenegro, R., Rodríguez, E., Escobar, J.M., Pérez-Foguet, A.: Adaptive finite element simulation of stack pollutant emissions over complex terrain. Energy 49, 47–60 (2013)
19. Wachspress, E.L., Habetler, G.: An alternating-direction-implicit iteration technique. J. Soc. Ind. Appl. Math. 8, 403–423 (1960)
20. Łoś, M., Muñoz-Matute, J., Muga, I., Paszyński, M.: Isogeometric residual minimization method (iGRM) with direction splitting for non-stationary advection-diffusion problems. Comput. Math. Appl. (2019, in press)
21. Pingali, K., et al.: The tao of parallelism in algorithms. SIGPLAN Not. 46(6), 12–25 (2011)
22. Hassaan, M.A., Burtscher, M., Pingali, K.: Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP 2011 (2011)
23. Lenharth, A., Nguyen, D., Pingali, K.: Priority queues are not good concurrent priority schedulers. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015. LNCS, vol. 9233, pp. 209–221. Springer, Heidelberg (2015)
24. Kulkarni, M., Pingali, K., Walter, B., Ramanarayanan, G., Bala, K., Chew, L.P.: Optimistic parallelism requires abstractions. ACM SIGPLAN Not. 42(6), 211–222 (2007)
25. Demkowicz, L.: Babuška $$<=>$$ Brezzi, ICES-Report 0608, 2006, The University of Texas at Austin, USA. https://www.ices.utexas.edu/media/reports/2006/0608.pdf
26. Babuška, I.: Error bounds for finite element method. Numer. Math. 16, 322–333 (1971)
27. Brezzi, F.: On the existence, uniqueness and approximation of saddle-point problems arising from Lagrange multiplier. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique 8.R2, pp. 129–151 (1974)
28. Hughes, T.J.R., Scovazzi, G., Tezduyar, T.E.: Stabilized methods for compressible flows. J. Sci. Comput. 43(3), 343–368 (2010)
29. Franca, L.P., Frey, S.L., Hughes, T.J.R.: Stabilized finite element methods: I. Application to the advective-diffusive model. Comput. Methods Appl. Mech. Eng. 95(2), 253–276 (1992)
30. Franca, L.P., Frey, S.L.: Stabilized finite element methods: II. The incompressible Navier-Stokes equations. Comput. Methods Appl. Mech. Eng. 99(2–3), 209–233 (1992)
31. Brezzi, F., Bristeau, M.-O., Franca, L.P., Mallet, M., Rogé, G.: A relationship between stabilized finite element methods and the Galerkin method with bubble functions. Comput. Methods Appl. Mech. Eng. 96(1), 117–129 (1992)
32. Demkowicz, L., Gopalakrishnan, J.: Recent developments in discontinuous Galerkin finite element methods for partial differential equations. In: Feng, X., Karakashian, O., Xing, Y. (eds.) An Overview of the DPG Method. IMA Volumes in Mathematics and its Applications, vol. 157, pp. 149–180 (2014)
33. Paszyński, M., Pardo, D., Calo, V.M.: Direct solvers performance on $$h$$-adapted grids. Comput. Math. Appl. 70(3), 282–295 (2015)
34. Chan, J., Evans, J.A.: A Minimum-residual finite element method for the convection-diffusion equations. ICES-Report 13-12 (2013)
35. Broersen, D., Dahmen, W., Stevenson, R.P.: On the stability of DPG formulations of transport equations. Math. Comput. 87, 1051–1082 (2018)
36. Broersen, D., Stevenson, R.: A robust Petrov-Galerkin discretisation of convection-diffusion equations. Comput. Math. Appl. 68(11), 1605–1618 (2014)

© Springer Nature Switzerland AG 2020

## Authors and Affiliations

• Marcin Łoś (1)
• Judit Muñoz-Matute (2)