# Parallel Numerical Solution of a 2D Chemotaxis-Stokes System on GPUs Technology

## Abstract

The aim of this paper is the numerical solution of a 2D chemotaxis model by a parallel numerical scheme implemented on GPU technology. The numerical discretization relies on a finite difference scheme for the spatial part and the explicit Euler method for the time integration. Accuracy and stability properties are provided. The effectiveness of the approach, as well as the coherence of the results with respect to the modeled phenomenon, is shown through numerical evidence, together with a performance analysis of the serial and parallel implementations.

## Keywords

Chemotaxis · GPU computing · Parallel numerical method

## 1 Introduction

Chemotaxis [3, 10, 11, 14, 18, 23, 24] is a very common phenomenon consisting of the movement of an organism in response to a chemical stimulus. For example, in order to find food, bacteria swim toward the highest concentration of food molecules [10]. Another example is the motion of sperm towards the egg during fertilization, in which chemotaxis is crucial. Sometimes, as discussed in [23], the mechanism that allows chemotaxis in animals can be subverted; this is the case, for example, of cancer metastasis.

The model we deal with was first derived in [24] in order to describe the swimming of bacteria and oxygen transport near contact lines. Subsequently, this model was modified and completed by Cao in [3], where he described the motion of oxygen consumed by bacteria in a drop of water. The model is given by the chemotaxis-Stokes system with a rotational flux term, in a three-dimensional domain. The equations for an incompressible Navier-Stokes fluid are coupled with two parabolic equations, the first of which contains a chemotactic term. In [3], the existence and uniqueness of a classical solution is proved, in both the two- and three-dimensional cases, under a smallness assumption on the initial concentration. We recall these results and give a wider description of this model in Sect. 2.

Numerical analysis plays a crucial role in computing solutions of PDE systems, especially when the analytical solution is difficult to find or its existence is proved only under restrictive assumptions on the data of the problem. For this reason, our goal is to develop a numerical scheme to compute the solution of this system and to simulate it. In particular, we expect that there exists a time *t* after which the bacteria start the chemotaxis and move toward the oxygen.

Numerical solutions often require high spatial resolution to capture the detailed biophysical phenomena. As a consequence, long computational times are often required when using a serial implementation of a numerical scheme. Parallel computation can strongly improve the time efficiency of some numerical methods, such as finite difference algorithms, which are relatively simple to implement and apply to a generic PDE system. Graphics Processing Units (GPUs) are particularly well suited to executing a numerical code based on a very large number of grid points, since the larger the number of grid points, the higher the accuracy of the numerical solution.

The codes used to study the performance of GPUs presented in this article were programmed using CUDA. The CUDA platform (Compute Unified Device Architecture), introduced by NVIDIA in 2007, was designed to support GPU execution of programs and focuses on data parallelism [12]. With CUDA, graphics cards can be programmed with a medium-level language, which can be seen as an extension of C/C++, without requiring a great deal of hardware expertise. We refer to [15, 19] for a comprehensive introduction to GPU-based parallel computing, including details about the CUDA programming model and the architecture of current-generation NVIDIA GPUs. As regards the application of GPU computing to partial differential equations, see [1, 5, 13] and references therein.

It is important to point out that, although the model is set in the three-dimensional case, we perform our numerical analysis in a two-dimensional setting. This assumption is not too restrictive, since the two-dimensional case is the most treated in the literature on chemotaxis models (see [11] and references therein). Indeed, in many models, because the third dimension is of microscopic size, cells are considered bidimensional without loss of generality.

This paper is organized as follows. In the next section, Sect. 2, a short description of the biological phenomenon and the equations of the model are presented. We present the numerical scheme in Sect. 3.

In Sect. 4, the analysis of consistency and stability of our numerical scheme is given. Moreover, a set of numerical experiments is presented in Sect. 5. Section 6 contains the comparative performance evaluation between the GPU and CPU implementations of the numerical scheme. We summarize our work and outline some possible future developments in the final section, Sect. 7.

## 2 Mathematical Model
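The model couples the bacterial density, the oxygen concentration and the fluid motion. In the form given in [3, 24], and up to the normalization of the physical constants (assumed here for simplicity), the chemotaxis-Stokes system with rotational flux can be sketched as

$$
\begin{aligned}
n_t + u\cdot \nabla n &= \varDelta n - \nabla \cdot \big(n\, S(x,n,c)\cdot \nabla c\big),\\
c_t + u\cdot \nabla c &= \varDelta c - n\,c,\\
u_t + \nabla P &= \varDelta u + n\,\nabla \phi ,\\
\nabla \cdot u &= 0,
\end{aligned}
$$

posed in \(\varOmega \times (0,T)\) with suitable boundary and initial conditions; the first and fourth equations are those referred to as (1)\(_1\) and (1)\(_4\) in what follows.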

Here, *n* is the density of bacteria, *c* the oxygen's concentration, and *u* and *P* are the velocity and the pressure of the fluid, respectively.

Equation (1)\(_1\) describes the density of bacteria. As we can see, it is a parabolic equation containing a diffusion term \(\varDelta n\) and a chemotactic term \(\nabla \cdot (n S(x,n,c)\cdot \nabla c)\), which expresses the fact that bacteria always move towards the areas of higher oxygen concentration.

*S* is a rotational tensor, which takes into account the rotations of bacteria, and \(\phi \) is a potential function that can be associated with an external force; therefore the term \(n\nabla \phi \) can be seen as a buoyant or electric force acting on the bacterial mass. As in [3], we assume the regularity conditions (2) and (3) on the tensor *S*.

### Definition 1 (Stokes operator)

The Stokes operator on \(L^p_\sigma (\varOmega )\) is defined as \(A_p = -\mathcal {P}\varDelta \) with domain \(D(A_p) = W^{2,p}(\varOmega )\cap W_0^{1,p}(\varOmega ) \cap L_\sigma ^p(\varOmega )\), where \(\mathcal {P}\) is the so-called Helmholtz projection. Since \(A_{p_1}\) and \(A_{p_2}\) coincide on the intersection of their domains for \(p_1,p_2 \in (1,\infty )\), we will drop the index *p*.
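We recall that the Helmholtz projection \(\mathcal {P}\) is the projection onto the solenoidal subspace \(L^p_\sigma (\varOmega )\) associated with the Helmholtz decomposition of vector fields,

$$
L^p(\varOmega )^3 = L^p_\sigma (\varOmega ) \oplus \{\nabla \pi \,:\, \pi \in W^{1,p}(\varOmega )\}, \qquad 1<p<\infty ,
$$

which holds on sufficiently regular bounded domains.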

We will denote the first eigenvalue of *A* by \(\lambda _1'\), and by \(\lambda _1\) the first nonzero eigenvalue of \(-\varDelta \) on \(\varOmega \) under Neumann boundary conditions.

### Theorem 1

Assume that *S* fulfills (2) and (3). There is \(\delta _0>0\) with the following property: if the initial data fulfill (4) and (5), and the initial oxygen concentration is small enough (in the sense specified in [3]), then the problem possesses a unique classical solution (*n*, *c*, *u*, *P*) which is bounded and satisfies the decay estimates established in [3].

## 3 Numerical Scheme

In the two-dimensional setting, all the unknowns depend on the spatial variables (*x*, *y*). Moreover, the functions \(s_{ij}\) defined in (2) are given by \(s_{i,j} = s_{i,j}(x,y,t)\), \(i,j = 1,2\). We assume that the domain \(\varOmega \) has the form \(\varOmega = [0,1] \times [0,1]\) and is discretized as follows. Given an integer *N*, we denote by \(h=1/(N+1)\) the spatial stepsize and accordingly define the grid of nodes \((x_i,y_j) = (ih, jh)\), \(i,j = 0,1,\ldots ,N+1\). For any fixed time *t*, we denote by \(u_{ij}\) an approximate value of \(u(x_i,y_j,t)\), with \(i,j = 0,1,\ldots ,N+1\). Then, for \(i,j = 1,2,\ldots ,N\), the spatial derivatives are replaced by central finite difference approximations at the interior nodes.
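As a sketch of the spatial discretization (assuming the standard second-order central differences, consistently with the analysis of Sect. 4.1), the discrete operators at an interior node read

$$
\varDelta u(x_i,y_j) \approx \frac{u_{i+1,j}+u_{i-1,j}+u_{i,j+1}+u_{i,j-1}-4u_{i,j}}{h^2},
\qquad
\frac{\partial u}{\partial x}(x_i,y_j) \approx \frac{u_{i+1,j}-u_{i-1,j}}{2h},
$$

and analogously for the derivative with respect to *y*.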

For the time integration, we divide the interval [0, *T*] into *M* equidistant parts of length \(\varDelta t = T/M\) and advance the semi-discrete system by the explicit Euler method. A separate treatment is required for the pressure *P*. Indeed, by the incompressibility assumption (1)\(_4\), the pressure *P* satisfies, at any time *t*, a Poisson-type equation obtained by taking the divergence of the momentum equation.

We observe that, as regards the boundary conditions, we always use Dirichlet conditions in the remainder of the treatise; they prescribe the values of the unknown functions at the boundary nodes, i.e. for \(i,j \in \{0, N+1\}\).
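In compact form, collecting the unknowns at the interior nodes in a vector \(W^{(k)}\) approximating the solution at time \(t_k = k\,\varDelta t\), and denoting by *f* the semi-discrete right-hand side (the notation anticipates Sect. 4.2), the explicit Euler advancement can be sketched as

$$
W^{(k+1)} = W^{(k)} + \varDelta t\, f\big(t_k, W^{(k)}\big), \qquad k = 0,1,\ldots ,M-1.
$$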

## 4 Consistency and Stability Analysis

In this section, we want to analyze the consistency and stability of the numerical scheme introduced in the previous section. For the sake of clarity, here we distinguish the contribution to the global error arising from the spatial discretization and that coming from the time discretization. We observe that our analysis is given for problems having sufficient regularity in order to make the application of Taylor series arguments possible.

### 4.1 Analysis of the Spatial Discretization

Applying the spatial discretization at a generic point (*x*, *y*) of the grid gives (13), where we have dropped the explicit dependence on *h* and neglected the time dependence for the sake of brevity. Expanding \(c(x+h,y)\), \(c(x-h,y)\), \(c(x,y+h)\) and \(c(x,y-h)\) in Taylor series around (*x*, *y*) and collecting the resulting expressions in (13), we obtain that the local discretization error is of order \(O(h^2)\), i.e. the spatial discretization is second-order accurate.
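As an illustration of the argument, the standard Taylor computation for the second derivative in the *x* direction (a worked instance, assuming *c* of class \(C^4\)) gives

$$
\frac{c(x+h,y) - 2\,c(x,y) + c(x-h,y)}{h^2} = \frac{\partial ^2 c}{\partial x^2}(x,y) + \frac{h^2}{12}\,\frac{\partial ^4 c}{\partial x^4}(x,y) + O(h^4),
$$

and the analogous expansion holds in the *y* direction, so that the leading term of the truncation error of the five-point approximation of \(\varDelta c\) is proportional to \(h^2\).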

### 4.2 Analysis of the Time Discretization

For the analysis of the time discretization, we write the right-hand side *f* of (14) in terms of a map *F*. In this regard, following the lines drawn in [4, 6, 7, 22], the following result holds.

### Theorem 2

The explicit Euler discretization of (14) is stable, the propagation of perturbations being controlled by the factor \(\Vert I + \varDelta t\, \nabla F\Vert \le 1 + \varDelta t\, F_{max}\), with *I* the identity matrix and \(F_{max}\) an upper bound for the norm of the gradient of *F*.

### Proof

We consider a perturbed numerical solution at step *k*, denoted by \(\widetilde{W}^{(k)}\), that is, the sequence generated by the scheme starting from a perturbed initial value.
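As a classical point of reference for the diffusive terms alone (a standard bound recalled here for illustration, distinct from the statement of Theorem 2), the five-point Laplacian advanced by the explicit Euler method is subject to the parabolic restriction

$$
\varDelta t \le \frac{h^2}{4},
$$

which follows from requiring the von Neumann amplification factor \(1-\frac{4\varDelta t}{h^2}\big (\sin ^2\frac{\xi h}{2}+\sin ^2\frac{\eta h}{2}\big )\) to remain in \([-1,1]\) for all wave numbers \(\xi ,\eta \).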

## 5 Simulations and Numerical Results

For the simulations, we have chosen the tensor *S* to be the identity. Moreover, we have supposed the vector field *u* to be null at the initial time and we have prescribed suitable initial data for the functions *n*, *c* and *P*.

## 6 GPU Programming and Performance Evaluation

In this section we describe the basic logical steps required to implement the GPU codes and the metrics used to evaluate the computing performance. As discussed in Sect. 3, the numerical scheme relies on a finite difference method for the spatial discretization and a time integration based on the explicit Euler method. From the programming point of view, this mathematical approach leads to designing a code where a *for loop* defines the time steps and where, at each time iteration, the spatial values are updated in parallel by the GPU using the aforementioned numerical scheme.

The basic idea is to use the CPU (host) as the owner of the time clock activities and the GPU (device) as the owner of the massive computing activities related to the spatial part of the equations. This leads to a *master-slave* model in which the CPU is the master, because it controls the parallel executions on the GPU, while the GPU works only on the spatial part of our scheme. The implemented code employs only the *global memory* in the CUDA kernels; further optimizations, such as the implementation of a code able to use the *shared memory* and/or CUDA dynamic parallelism [12, 16, 17] to reduce the data transfer activity between host and device, will be the subject of future work. The logical steps of the parallel code are the following (a minimal sketch of this workflow is given after the list):

- 1.
the CPU (host) loads the initial data from its memory into the GPU (device) global memory;

- 2.
the GPU performs the massive computing activities, that is, it executes the code related to the spatial discretization, which is the parallelizable part of the code since, at each time step, the values referring to the current step only depend on those already computed in the previous one;

- 3.
the GPU sends back to the CPU the partial/final results;

- 4.
the CPU checks the time step and, according to the maximum time value defined by the user, restarts/stops the parallel computing process.
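A minimal CUDA sketch of this workflow is given below. The kernel name `update_kernel`, the array names and the purely diffusive update are illustrative placeholders (the actual kernels discretize the full system of Sect. 3); only global memory is used, consistently with the implementation described above.

```c
/* Minimal CUDA sketch of the master-slave workflow described above.
   The kernel name, array names and the diffusion-only update are
   illustrative placeholders, not the kernels of the actual code. */
#include <cuda_runtime.h>
#include <stdlib.h>

#define N 256                          /* interior grid points per direction */
#define IDX(i, j) ((i) * (N + 2) + (j))

__global__ void update_kernel(const double *w_old, double *w_new,
                              double h, double dt) {
    int i = blockIdx.y * blockDim.y + threadIdx.y + 1;  /* skip boundary row    */
    int j = blockIdx.x * blockDim.x + threadIdx.x + 1;  /* skip boundary column */
    if (i <= N && j <= N) {
        /* five-point Laplacian + explicit Euler step (diffusive part only) */
        double lap = (w_old[IDX(i + 1, j)] + w_old[IDX(i - 1, j)]
                    + w_old[IDX(i, j + 1)] + w_old[IDX(i, j - 1)]
                    - 4.0 * w_old[IDX(i, j)]) / (h * h);
        w_new[IDX(i, j)] = w_old[IDX(i, j)] + dt * lap;
    }
}

int main(void) {
    const size_t bytes = (N + 2) * (N + 2) * sizeof(double);
    const double h = 1.0 / (N + 1);
    const double dt = 0.25 * h * h;     /* parabolic stability restriction */
    double *h_w = (double *)calloc((N + 2) * (N + 2), sizeof(double));
    double *d_old, *d_new;

    cudaMalloc((void **)&d_old, bytes);
    cudaMalloc((void **)&d_new, bytes);
    /* step 1: the host loads the initial data into the device global memory */
    cudaMemcpy(d_old, h_w, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_new, h_w, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);

    /* step 4: the host owns the time loop and controls the kernel launches */
    for (int k = 0; k < 1000; ++k) {
        /* step 2: the device updates the spatial values in parallel */
        update_kernel<<<grid, block>>>(d_old, d_new, h, dt);
        double *tmp = d_old; d_old = d_new; d_new = tmp;  /* swap time levels */
    }
    /* step 3: the device sends the final results back to the host */
    cudaMemcpy(h_w, d_old, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_old); cudaFree(d_new); free(h_w);
    return 0;
}
```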

We have executed the code on two distinct architectures. The first has the following specifications: HP DL 585 G7 ProLiant, with 4x AMD 6128 processors (8 cores), clock frequency 2.0 GHz and 64 GB of RAM, in which a GeForce GTX 1080 with 8 GB of RAM is integrated. The GPU is the only difference between this architecture and the second one, which hosts 3x GeForce GTX 670 with 4 GB of RAM. The operating system used is Linux CentOS 6.5. Finally, we have compiled the serial code with gcc 4.4.7 and the CUDA-C code with CUDA 9.1 on the first machine and CUDA 8.0 on the second one. In order to evaluate the performance of the two machines, it is reasonable to compute the number of floating point operations executed per unit of time on the GPUs, as a function of the dimension of the grid. Therefore, for any fixed size of the grid, if *n* is the number of floating point operations and *T* is the CUDA code execution time on the GPU, we have computed the number \(f_{op} =n/T\) of floating point operations per unit of time (seconds).

The number *n* of floating point operations has been measured by means of the *nvprof* tool.

The following tables report the computation times, in seconds, for the serial execution, the parallel execution on the GeForce GTX 1080, the parallel execution on the GeForce GTX 670, and the parallel execution using shared memory with OpenMP with different numbers of threads.

| Dim | Serial kernel | GeForce GTX 1080 | GeForce GTX 670 |
|---|---|---|---|
| 32 | 1.156 | 13.920 \(\times \, 10^{-6}\) | 18.677 \(\times \, 10^{-3}\) |
| 64 | 4.134 | 25.120 \(\times \, 10^{-6}\) | 21.203 \(\times \, 10^{-3}\) |
| 128 | 18.629 | 124.06 \(\times \, 10^{-6}\) | 50.671 \(\times \, 10^{-3}\) |
| 256 | 75.635 | 741.16 \(\times \, 10^{-6}\) | 183.45 \(\times \, 10^{-3}\) |

| Dim | OpenMP(8) | OpenMP(16) | OpenMP(32) |
|---|---|---|---|
| 32 | 0.524 | 0.686 | 6.918 |
| 64 | 1.708 | 1.637 | 1.487 |
| 128 | 11.565 | 9.455 | 9.547 |
| 256 | 46.065 | 38.635 | 41.320 |

The tables above collect the execution times of the serial code, of the GPU implementations and of the OpenMP implementation on a *shared memory* architecture with different numbers of threads. We report the corresponding graphs in Fig. 2. In particular, we can observe a good scaling of the code moving from the CPU technology to the GPUs, with the GeForce GTX 670 and the GeForce GTX 1080 providing reduced execution times.
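A standard way to read these figures is through the speedup of a parallel run with respect to the serial one, \(S = T_{serial}/T_{parallel}\) (a derived quantity, computed here from the table for illustration); for instance, for the largest grid the table gives

$$
S_{GTX\,670} = \frac{75.635}{183.45\times 10^{-3}} \approx 4.1\times 10^{2},
$$

consistent with the scaling observed above.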

## 7 Conclusions and Future Works

We have developed a parallel numerical scheme, implemented on GPU, to compute the solution of a chemotaxis system. We have made use of central finite differences to approximate the spatial derivatives and of the explicit Euler method to discretize the time evolution of the system. We have analyzed accuracy and stability issues, implemented the code on CPU and GPU architectures and compared their performances in terms of execution time, obtaining a good scalability for the GPU implementation. For the GPU kernel design, we have used the global memory and implemented a master-slave model, in which the CPU controls the time evolution while the GPU works exclusively on the spatial derivatives of our scheme. Future developments of this research are oriented towards a 3D model with a deeply optimized CUDA kernel code implemented using dynamic parallelism and shared memory.

## References

- 1. Aissa, M., Verstraete, T., Vuik, C.: Toward a GPU-aware comparison of explicit and implicit CFD simulations on structured meshes. Comput. Math. Appl. **74**(1), 201–217 (2017)
- 2. Boyer, F., Fabrie, P.: Mathematical Tools for the Study of the Incompressible Navier-Stokes Equations and Related Models. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-5975-0
- 3. Cao, X.: Global classical solutions in chemotaxis(-Navier)-Stokes system with rotational flux term. J. Differ. Equations **261**(12), 6883–6914 (2016)
- 4. Cardone, A., D'Ambrosio, R., Paternoster, B.: Exponentially fitted IMEX methods for advection-diffusion problems. J. Comput. Appl. Math. **316**, 100–108 (2017)
- 5. Conte, D., D'Ambrosio, R., Paternoster, B.: GPU acceleration of waveform relaxation methods for large differential systems. Numer. Algorithms **71**(2), 293–310 (2016)
- 6. D'Ambrosio, R., Moccaldi, M., Paternoster, B.: Adapted numerical methods for advection-reaction-diffusion problems generating periodic wavefronts. Comput. Math. Appl. **74**(5), 1029–1042 (2017)
- 7. D'Ambrosio, R., Moccaldi, M., Paternoster, B.: Parameter estimation in IMEX-trigonometrically fitted methods for the numerical solution of reaction-diffusion problems. Comput. Phys. Commun. **226**, 55–66 (2018)
- 8. D'Ambrosio, R., Paternoster, B.: Numerical solution of reaction-diffusion systems of lambda-omega type by trigonometrically fitted methods. J. Comput. Appl. Math. **294**(C), 436–445 (2016)
- 9. D'Ambrosio, R., Paternoster, B.: Numerical solution of a diffusion problem by exponentially fitted finite difference methods. SpringerPlus **3**(1), 1–7 (2014). https://doi.org/10.1186/2193-1801-3-425
- 10. de Oliveira, S., Rosowski, E.E., Huttenlocher, A.: Neutrophil migration in infection and wound repair: going forward in reverse. Nat. Rev. Immunol. **16**(6), 378–391 (2016)
- 11. Di Francesco, M., Donatelli, D.: Singular convergence of nonlinear hyperbolic chemotaxis systems to Keller-Segel type models. Discrete Contin. Dyn. Syst. Ser. B **13**(1), 79–100 (2010)
- 12. Kirk, D.B., Hwu, W.M.W.: Programming Massively Parallel Processors: A Hands-on Approach, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
- 13. Magee, D.J., Niemeyer, K.E.: Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time-space decomposition. J. Comput. Phys. **357**, 338–352 (2018)
- 14. Málaga, C., Minzoni, A.A., Plaza, R.G., Simeoni, C.: A chemotactic model for interaction of antagonistic microflora colonies: front asymptotics and numerical simulations. Stud. Appl. Math. **130**(3), 264–294 (2013)
- 15. NVIDIA: CUDA C Programming Guide, Version 9.1. NVIDIA Corporation
- 16. NVIDIA: TechBrief: Dynamic Parallelism in CUDA. NVIDIA Corporation
- 17. Pera, D.: Parallel numerical simulations of anisotropic and heterogeneous diffusion equations with GPGPU. PhD Thesis (2013)
- 18. Pera, D., Málaga, C., Simeoni, C., Plaza, R.G.: On the efficient numerical simulation of heterogeneous anisotropic diffusion models for tumor invasion using GPUs. Rend. Mat. Appl. **7**(40), 233–255 (2019)
- 19. Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley Professional, Boston (2010)
- 20. Schiesser, W.E.: The Numerical Method of Lines: Integration of Partial Differential Equations. Academic Press, San Diego (1991)
- 21. Schiesser, W.E., Griffiths, G.W.: A Compendium of Partial Differential Equation Models: Method of Lines Analysis with Matlab. Cambridge University Press, Cambridge (2009)
- 22. Smith, G.D.: Numerical Solution of Partial Differential Equations: Finite Difference Methods. Clarendon Press, Oxford (1985)
- 23. Stuelten, C.H., Parent, C.A., Montell, D.J.: Cell motility in cancer invasion and metastasis: insights from simple model organisms. Nat. Rev. Cancer **18**(5), 296–312 (2018)
- 24. Tuval, I., Cisneros, L., Dombrowski, C., Wolgemuth, C.W., Kessler, J.O., Goldstein, R.E.: Bacterial swimming and oxygen transport near contact lines. Proc. Natl. Acad. Sci. U.S.A. **102**(7), 2277–2282 (2005)