Abstract
The family of PDE-constrained Large Deformation Diffeomorphic Metric Mapping (LDDMM) methods is emerging as a particularly interesting approach for physically meaningful diffeomorphic transformations. The original combination of Gauss–Newton–Krylov optimization and Runge–Kutta integration shows excellent numerical accuracy and a fast convergence rate. However, its most significant limitation is its huge computational complexity, which hinders its extensive use in applied Computational Anatomy studies. This limitation has been addressed independently by formulating the problem in the space of band-limited vector fields and by semi-Lagrangian integration. The purpose of this work is to combine both approaches in three variants of band-limited PDE-constrained LDDMM to further increase their computational efficiency. The accuracy of the resulting methods is evaluated extensively. For all the variants, the proposed combined approach shows a significant increase in computational efficiency. In addition, the variant based on the deformation state equation consistently ranks as the best performing method across all the evaluation frameworks in terms of accuracy and efficiency.
Introduction
Computational Anatomy is a powerful interdisciplinary field for the analysis of anatomical shape variability [19, 20]. This discipline is based on Sir D’Arcy Thompson’s original ideas for explaining the similarity of the anatomical shape of homologous species using the transformations existing between the anatomical structures [31]. In Computational Anatomy, shape similarity is measured from the diffeomorphic transformations estimated between the anatomies. These transformations yield a generative model for the analysis of shape variability. Diffeomorphisms are computed from the anatomical images using diffeomorphic registration methods [34].
There exists a vast literature on diffeomorphic registration methods with differences in the transformation characterization, regularizers, image similarity metrics, optimization methods, and additional constraints [29]. Although the differentiability and invertibility of the transformations constitute crucial features for Computational Anatomy applications, the diffeomorphic constraint does not necessarily guarantee that a transformation computed with a given method is physically meaningful for the clinical domain of interest. PDE-constrained Large Deformation Diffeomorphic Metric Mapping (PDE-LDDMM) has become relevant in the last decade for the computation of transformations under plausible physical models of interest [1, 7, 10, 13, 14, 18, 32, 33, 36].
Our work focuses on the family of PDE-LDDMM methods pioneered by Hart et al. [7] and leading to the relevant contributions in [9, 13, 14, 17, 22]. In this family of methods, the registration problem is approached from an optimal control perspective, where different physical models are imposed directly through the physical PDEs, which are attached to the LDDMM variational problem as hard constraints. The numerical optimization is approached using gradient descent [7, 13, 22] or second-order optimization in the form of inexact reduced Newton–Krylov methods [9, 13, 14, 17]. In particular, the combination of Gauss–Newton–Krylov for optimization, with sophisticated multilevel preconditioners, spectral methods for differentiation, and Runge–Kutta schemes for PDE integration, shows excellent numerical accuracy and an extraordinarily fast convergence rate. However, the most significant limitation of Gauss–Newton–Krylov PDE-LDDMM is its huge computational complexity, which hinders its extensive use in applied Computational Anatomy studies. This computational complexity is due to:

1. The formulation of the problem in the spatial domain.
2. The large time sampling needed for the stability of Runge–Kutta integration.
Both issues have been addressed independently in the literature, yielding PDE-LDDMM methods with increased efficiency at an acceptable cost in accuracy.
Computational Complexity due to Problem Formulation
The computational complexity due to the formulation of the problem in the spatial domain has been successfully reduced using the band-limited vector field parameterization proposed in [35, 36]. LDDMM methods, and in particular PDE-LDDMM, involve the action of low-pass filters in the optimization update equations of the velocities. Therefore, the computation of the high-frequency components of high-resolution velocity fields can be omitted, since the low-pass filters drive these components to zero or nearly zero. The band-limited vector field parameterization allows a reduction in the dimensionality of the problem that circumvents the high-frequency computations.
The works in [8, 9] formulate three different variants of PDE-LDDMM in the space of band-limited vector fields and perform the computations on the GPU. Some configurations of these variants have been very successful, greatly outperforming state-of-the-art methods in terms of computational complexity while keeping competitive accuracy.
Computational Complexity due to PDE Integration
Runge–Kutta methods are explicit techniques. Hence, they are only conditionally stable. This means that the time sampling must be fine enough to satisfy the Courant–Friedrichs–Lewy (CFL) condition. For PDE-LDDMM, the number of time samples needed to guarantee stability is usually large. As a result, the time and memory requirements of the problem are considerably increased. In particular, the memory requirements of PDE-LDDMM are increased to limits that hinder the execution on limited-memory devices such as the GPU. In addition, in practice the number of time samples needed for the non-stationary parameterization is much higher than for the stationary parameterization, increasing the complexity of a configuration that is already not particularly memory efficient. On the other hand, when stability is satisfied, the accuracy of PDE-LDDMM is high [8, 9, 13, 14].
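To make the dependence of the time sampling on the CFL condition concrete, the following minimal sketch computes the smallest number of time steps for which the Courant number of a pure advection problem stays below a prescribed bound. The function name `min_time_steps` and the parameter `courant_max` are hypothetical, and the actual stability bound depends on the particular Runge–Kutta scheme and spatial discretization, so the numbers are only illustrative.

```python
import math

def min_time_steps(v_max, dx, courant_max=1.0, total_time=1.0):
    """Smallest n_t such that v_max * (total_time / n_t) / dx <= courant_max.

    v_max       : largest velocity magnitude (assumed known)
    dx          : grid spacing
    courant_max : assumed stability bound of the explicit scheme (hypothetical value)
    """
    return max(1, math.ceil(v_max * total_time / (courant_max * dx)))

# Halving the grid spacing doubles the number of time steps required.
print(min_time_steps(0.25, 1.0 / 64))
print(min_time_steps(0.25, 1.0 / 128))
```

Under these assumptions, the required number of time steps grows linearly with both the maximum velocity and the grid resolution, which is why finer grids and larger deformations make explicit integration expensive.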
Semi-Lagrangian methods are semi-implicit techniques that are unconditionally stable. Therefore, the time sampling can be selected according to accuracy rather than stability considerations. Semi-Lagrangian methods were originally proposed in the 1990s in the context of weather prediction modeling [30]. In the context of diffeomorphic registration, the original LDDMM method proposed in [3] already used semi-Lagrangian integration for the solution of the transport equation. The combination of semi-Lagrangian integration with Runge–Kutta has been recently proposed for solving some time-dependent PDEs. Runge–Kutta has been shown to increase the accuracy of first-order schemes in semi-Lagrangian integration [6].
The computational complexity in [13, 14] due to the use of Runge–Kutta schemes for PDE integration has been successfully reduced using semi-Lagrangian Runge–Kutta integration [15, 17] for the stationary parameterization of diffeomorphisms. For PDE-LDDMM, the number of time samples selected is usually much smaller than that typically selected with explicit schemes, yielding a considerable reduction in the computational complexity of the problem. On the other hand, the expected accuracy of PDE-LDDMM is lower than with explicit schemes.
Beyond the computational complexity improvement through numerical schemes, Mang et al. proposed an efficient implementation of PDE-LDDMM that exploits massively parallel CPU-based computing architectures [16]. The source code has been recently released with [17]. A GPU-optimized implementation of the method is proposed in the arXiv paper [4].
Our Contribution
The purpose of this work is to further increase the computational efficiency of BL PDE-LDDMM by combining the two independent methodological approaches to circumventing the huge computational complexity of PDE-LDDMM, and to extensively analyze the accuracy of the resulting methods. We have implemented the band-limited methods in [8, 9] with the semi-Lagrangian Runge–Kutta integration scheme originally proposed in [15] for the stationary and the non-stationary parameterization of diffeomorphisms. The resulting methods have been evaluated on five different datasets following the evaluation frameworks in [9, 12, 25]. To our knowledge, this is the first time that semi-Lagrangian Runge–Kutta integration is implemented in the space of band-limited vector fields. It is also the first time that semi-Lagrangian Runge–Kutta integration is used in PDE-LDDMM with the non-stationary parameterization. Moreover, our work is the first to report the position achieved by benchmark PDE-LDDMM methods in the ranking of the Klein et al. evaluation. The best performing method of our work coincides with the best performing variant in [9], PDE-LDDMM based on the deformation state equation. The semi-Lagrangian Runge–Kutta scheme proposed in this work has been shown to outperform the Runge–Kutta scheme in [9] in terms of computational efficiency and accuracy. Indeed, the best performing PDE-LDDMM variant in this work has recently reached the highest sensitivity (97% vs a baseline of 88%) in the classification of stable versus progressive mild cognitive impairment converters in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using convolutional neural networks [23].
Manuscript Organization
In the following, Sect. 2 reviews the foundations of PDE-LDDMM, with particular emphasis on the band-limited vector field parameterization. Section 3 presents the proposed semi-Lagrangian Runge–Kutta integration method. Next, Sect. 4 details the experimental setup. Section 5 shows the results and Sect. 6 discusses the most important highlights. Finally, Sect. 7 gathers the most remarkable conclusions of our work.
PDE-Constrained LDDMM Methods
Parameterization in the Spatial Domain
Let \(\varOmega \subseteq \mathbb {R}^d\) be the image domain. Let \(\hbox {Diff}(\varOmega )\) be the LDDMM Riemannian manifold of diffeomorphisms and V the tangent space at the identity element. \(\hbox {Diff}(\varOmega )\) is a Lie group, and V is the corresponding Lie algebra [3]. The Riemannian metric of \(\hbox {Diff}(\varOmega )\) is defined from the scalar product in V
where \(L\) is the invertible self-adjoint differential operator associated with the differential structure of \(\hbox {Diff}(\varOmega )\). In traditional LDDMM methods, \(L= (Id - \alpha \varDelta )^s, \alpha >0, s \in \mathbb {N}\) [3]. This is the operator used in this work.
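The smoothing effect of the inverse of this operator can be illustrated numerically. The sketch below evaluates the Fourier multiplier of the inverse of \(L\) under the assumption of the continuous Laplacian symbol on a periodic unit domain (the actual implementation may use a discrete symbol instead), with the values \(\alpha = 0.0025\) and \(s = 2\) reported in the parameter configuration. The function name `inverse_symbol` is hypothetical.

```python
import math

def inverse_symbol(k, alpha=0.0025, s=2):
    """Fourier multiplier of the inverse of L = (Id - alpha * Laplacian)^s at frequency k.

    Assumes the continuous Laplacian symbol -(2*pi)^2 * |k|^2 on a periodic
    unit domain; k is a tuple of integer frequencies, one per dimension.
    """
    k_norm_sq = sum(ki * ki for ki in k)
    return (1.0 + alpha * (2.0 * math.pi) ** 2 * k_norm_sq) ** (-s)

# The multiplier decays quickly with frequency, i.e., it acts as a low-pass filter.
for freq in (0, 4, 16, 64):
    print(freq, inverse_symbol((freq, 0, 0)))
```

Under these assumptions, the multiplier equals 1 at the zero frequency and is already well below 1% at frequency 16 along one axis, which is the behavior that motivates discarding the high-frequency components.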
Let \(I_0\) and \(I_1\) be the source and the target images. LDDMM is formulated from the minimization of the variational problem
The LDDMM variational problem [3] is posed in the space of time-varying smooth flows of velocity fields, \({v} \in L^2([0,1],V)\). Given the smooth flow \({v}:[0,1] \rightarrow V\), \(v_t:\varOmega \rightarrow \mathbb {R}^{d}\), the solution at time \(t=1\) to the evolution equation
with initial condition \((\phi _0^{v})^{-1} = id\) is a diffeomorphism, \((\phi ^{v}_{1})^{-1} \in \hbox {Diff}(\varOmega )\). The transformation \((\phi ^{v}_{1})^{-1}\), computed from the minimum of E(v), is the diffeomorphism that solves the LDDMM registration problem between \(I_0\) and \(I_1\). The problem can be straightforwardly restricted to the space of steady flows of velocity fields [11].
LDDMM can be formulated from a dynamical systems point of view. Thus, PDE-LDDMM arises from LDDMM as an optimal control approach to diffeomorphic registration [7, 13]. PDE-LDDMM is formulated from the minimization of the PDE-constrained variational problem
subject to the state equation with state variable m(t)
with initial condition \(m(0) = I_0\).
Using optimal control terminology, the flow v is the control of the dynamical system. The system dynamics are driven by the state equation which determines the evolution of the state variable m of the dynamical system. The minimization of Eq. 4 aims at finding the optimal control subject to the system dynamics on the initial state \(I_0\).
The state equation constraint in Eq. 5 can be imposed in two other ways, yielding three different variants of PDE-LDDMM [9]. Variant I corresponds to the variational formulation presented above and proposed in [7, 13]. The second variant (Variant II) is formulated from the minimization of Eq. 4, where
and \(\phi \) is computed from the deformation state equation
with initial condition \(\phi (0)=id\). The third variant (Variant III) is formulated from the minimization of Eq. 4 subject to the deformation state equation, Eq. 7. It should be noted that \(\phi (t)\) was used in Hart et al. [7] to refer to the diffeomorphism path \(\phi _{t,0}\) of Beg et al. [3], which corresponds to \((\phi _t^{v})^{-1}\) in Eqs. 2 and 3.
The advantage of the optimal control approach is that the complex dependence between \(I_0 \circ (\phi ^{v}_{1})^{-1}\) and v is removed from E(v) and translated to the system dynamics. Thus, the original LDDMM variational formulation is transformed into a PDE-constrained formulation. The optimization is approached using the adjoint method, where the computation of the optimality conditions of the system is performed using the method of Lagrange multipliers, yielding a set of state and adjoint differential equations. The solutions to these equations arise in the expressions of the gradient and the Hessian of the augmented energy used for the update of the control variable. Indeed, the optimal control approach allows imposing different system dynamics in the set of state equations, providing a straightforward approach to obtain different families of physically meaningful diffeomorphisms [13, 14].
The best optimization method among the algorithms tested for PDE-LDDMM is Gauss–Newton–Krylov [9, 13]. The expressions of the gradient and the Hessian vector product are derived from the augmented Lagrangian of the energy functional subject to the state or the deformation state equations, respectively. The expressions of the augmented Lagrangian, the gradient \(\nabla _v E_{\mathrm{aug}}(v)\), and the Hessian vector product \(H_v E_{\mathrm{aug}}( v ) \delta v\) for each variant are found in the appendix.
The update equation has the form
where \(\epsilon \) is the update length and \(\delta v^n\) is computed from preconditioned conjugate gradient (PCG) on the system
with preconditioner \(L^{-1}\).
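The inner PCG solve can be sketched generically as follows. This is a textbook matrix-free preconditioned conjugate gradient on a small symmetric positive definite system, not the authors' implementation: in the registration setting, `apply_A` would stand for the Gauss–Newton Hessian vector product, `b` for the right-hand side built from the gradient (whose sign convention is an assumption here), and `apply_M_inv` for the action of \(L^{-1}\). All function names are hypothetical, and the default of five iterations mirrors the maximum number of PCG iterations used in the experiments.

```python
def dot(a, b):
    """Euclidean inner product of two equally sized lists."""
    return sum(x * y for x, y in zip(a, b))

def pcg(apply_A, b, apply_M_inv, max_iter=5, tol=1e-10):
    """Preconditioned conjugate gradient for A x = b, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - A x for the zero initial guess
    z = apply_M_inv(r)        # preconditioned residual
    p = list(z)               # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break             # converged: avoid dividing by a vanishing quantity
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_M_inv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Toy system standing in for the Newton system: A = [[4, 1], [1, 3]], b = [1, 2].
apply_A = lambda p: [4.0 * p[0] + p[1], p[0] + 3.0 * p[1]]
apply_M_inv = lambda r: [r[0] / 4.0, r[1] / 3.0]   # diagonal (Jacobi) preconditioner
solution = pcg(apply_A, [1.0, 2.0], apply_M_inv)
print(solution)
```

Because only operator applications are needed, the Hessian never has to be assembled, which is the property that makes Newton–Krylov methods practical for problems of this size.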
Parameterization in the Space of Band-Limited Vector Fields
Let \(\widetilde{\varOmega }\) be the discrete Fourier domain truncated with frequency bounds \(K_1, \dots , K_d\). We denote by \(\widetilde{V}\) the space of discretized band-limited vector fields on \(\varOmega \) with these frequency bounds. The elements in \(\widetilde{V}\) are represented in the Fourier domain as \(\tilde{v}: \widetilde{\varOmega } \rightarrow \mathbb {C}^d\), \(\tilde{v}(k_1, \dots , k_d)\). The map \(\iota :\widetilde{V} \rightarrow V\) denotes the natural inclusion of \(\widetilde{V}\) in V. The map \(\pi : V \rightarrow \widetilde{V}\) denotes the projection of V onto \(\widetilde{V}\) [35, 36].
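A one-dimensional sketch of the projection and inclusion maps may help fix ideas. It assumes that the projection keeps the lowest-frequency FFT coefficients and that the inclusion zero-pads them back to the full grid; the function names `project` and `include` are hypothetical, and an odd band size is used here only to keep the retained frequencies symmetric (the experiments in this work use even band sizes in 3D). NumPy is assumed to be available.

```python
import numpy as np

def project(v, band):
    """Sketch of pi: keep the `band` lowest-frequency Fourier coefficients (band odd)."""
    n = v.size
    spectrum = np.fft.fftshift(np.fft.fft(v))   # zero frequency moved to index n // 2
    c, h = n // 2, band // 2
    return spectrum[c - h : c + h + 1]

def include(v_tilde, n):
    """Sketch of iota: zero-pad the band-limited coefficients to an n-point signal."""
    band = v_tilde.size
    spectrum = np.zeros(n, dtype=complex)
    c, h = n // 2, band // 2
    spectrum[c - h : c + h + 1] = v_tilde
    return np.fft.ifft(np.fft.ifftshift(spectrum)).real

n = 64
x = np.arange(n) / n
low = np.cos(2 * np.pi * 2 * x) + 0.5 * np.sin(2 * np.pi * 3 * x)
v = low + 0.1 * np.cos(2 * np.pi * 20 * x)      # low-frequency part plus one high mode

v_tilde = project(v, 9)                          # 9 coefficients instead of 64 samples
v_back = include(v_tilde, n)                     # the high-frequency mode is removed
print(v_tilde.size, float(np.max(np.abs(v_back - low))))
```

In this toy case the band-limited representation stores 9 complex coefficients instead of 64 samples, and including it back recovers exactly the low-frequency part of the signal.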
The space \(\widetilde{V}\) of band-limited vector fields has a finite-dimensional Lie algebra structure using the truncated convolution \(\star \) in the definition of the Lie bracket [36]. We denote by \(\hbox {Diff}(\widetilde{\varOmega })\) the finite-dimensional Riemannian manifold of diffeomorphisms on \(\widetilde{\varOmega }\) with corresponding Lie algebra \(\widetilde{V}\). The Riemannian metric in \(\hbox {Diff}(\widetilde{\varOmega })\) is defined from the scalar product
where \(\tilde{L}\) is the projection of operator \(L\) in the truncated Fourier domain. Similarly, we denote by \(\tilde{*}\) the projection in the truncated Fourier domain of the differential operators \(*\) involved in the differential equations.
The band-limited PDE-constrained variational problem is given by the minimization of
The band-limited version of Variant I is formulated from the minimization of Eq. 11 subject to
with initial condition \(m(0) = I_0\). For Variant II, the diffeomorphism is computed from \(\phi (t) = id - \iota (\tilde{u})(t)\), where \(\tilde{u}(t)\) is computed from the deformation state equation formulated in displacement field form
Variant III is formulated analogously to the spatial case from the minimization of Eq. 11 subject to the deformation state equation, Eq. 13 [8, 9].
The optimization is approached using Gauss–Newton–Krylov methods in \(\widetilde{V}\) with preconditioner \(\tilde{L}^{-1}\). The update equation has the form
where \(\delta \tilde{v}^n\) is computed from
In the next section, we provide the expressions of the gradient and the Hessian for each variant.
BL PDE-Constrained LDDMM Equations
Original BL PDE-Constrained LDDMM (Variant I)
The originally proposed BL PDE-LDDMM uses the state equation in the augmented Lagrangian for the derivation of the state and adjoint equations and their incremental counterparts [8]
where \(\lambda \) and \(\eta \) are the Lagrangian multipliers associated with the state equation (Eq. 12) and its initial condition.
The gradient and the Gauss–Newton approximation of the Hessian vector product are computed from the first- and second-order optimality conditions, obtained by setting the formal variations \(\delta E_\text {aug}\) and \(\delta ^2 E_\text {aug}\) to zero
where the projected state variable \(\tilde{m}\) and the projected adjoint variable \(\tilde{\lambda }\) are computed from \(\pi (m)\) and \(\pi (\lambda )\), and m and \(\lambda \) are computed from
The incremental counterparts \(\delta \tilde{m}\) and \(\delta \tilde{\lambda }\) are the solutions of
in the BL domain. The initial conditions are, respectively, \(m(0) = I_0\), \(\lambda (1) = \frac{2}{\sigma ^2}(m(1) - I_1)\), \(\delta \tilde{m}(0) = 0\), \(\delta \tilde{\lambda }(1) = \frac{2}{\sigma ^2} \delta \tilde{m}(1)\).
Algorithm 1 shows the pseudocode for Variant I.
BL PDE-Constrained LDDMM Based on the State Equation (Variant II)
This method departs from the original BL PDE-LDDMM by using \(m(t) = I_0 \circ \phi (t)\) and \(\lambda (t) = J(t) \lambda (1) \circ \psi (t)\), where \(\phi \) is the direct map, \(\psi \) is the inverse map, and J is the Jacobian determinant of \(\psi \) [7, 9]. The transformations \(\phi \) and \(\psi \) and the scalar field J are computed from the inclusion of the truncated displacement fields (\(\tilde{u}(t)\) and \(\tilde{\tau }(t)\)) and the corresponding Jacobian
where
with initial conditions \(\tilde{u}(0)= 0\), \(\tilde{\tau }(1) = 0\), and \(\tilde{U}(1) = 0\).
The incremental state and adjoint variables are computed from the differential of the expressions given for m(t) and \(\lambda (t)\)
where the precedence of \(\nabla \), \(\delta \), and \(\circ \) operators reads \(\nabla I_0 \circ \phi (t) = (\nabla I_0) \circ \phi (t)\) and \(\delta \tilde{\lambda }(1) \circ \psi (t) = (\delta \tilde{\lambda }(1)) \circ \psi (t)\). The incremental expressions \(\delta \tilde{u}\) and \(\delta \tilde{\tau }\) are computed from the differentiation of Eqs. 26 and 27, yielding
with initial conditions \(\delta \tilde{u}(0) = 0\) and \(\delta \tilde{\tau }(1) = 0\).
Algorithm 1 also shows the pseudocode for Variant II, which shares the steps with Variant I.
BL PDE-Constrained LDDMM Based on the Deformation State Equation (Variant III)
The third method is formulated from the minimization of Eq. 11 subject to the truncated displacement state equation (Eq. 13) [9]
In this variant, the augmented Lagrangian is given by
where \(\tilde{\rho }\) and \(\tilde{\mu }\) are the Lagrangian multipliers associated with the BL deformation state equation (Eq. 33) and its initial condition.
The gradient and the Hessian vector product, computed from the first- and second-order optimality conditions, are given by the equations
where the displacement state variable \(\tilde{u}\), the adjoint variable \(\tilde{\rho }\), and their incremental counterparts \(\delta \tilde{u}\) and \(\delta \tilde{\rho }\) are computed from
with initial conditions \(\tilde{u}(0) = 0\), \(\tilde{\rho }(1)= \pi ( \frac{2}{\sigma ^2} (m(1) - I_1) \nabla m(1))\), \(\delta \tilde{u}(0) = 0\), and \(\delta \tilde{\rho }(1) = \pi ( \frac{2}{\sigma ^2} \delta m(1) \nabla m(1))\).
Algorithm 2 shows the pseudocode for Variant III.
Semi-Lagrangian Runge–Kutta Integration
Semi-Lagrangian Integration in a Spatial Domain
Semi-Lagrangian (SL) integration methods [30] allow solving transport equations of the general form
where \(u: \varOmega ^d \times [0,1] \rightarrow \mathbb {R}\) is a scalar or a vector function varying in time, and
SL methods combine the most interesting properties of Eulerian and Lagrangian schemes. On the one hand, SL methods involve following the characteristic lines of the differential equation, similarly to Lagrangian approaches. On the other hand, the equation is solved on the regular grid, similarly to Eulerian approaches. As a result, SL methods are unconditionally stable, as Lagrangian schemes are. This means that the time sampling can be selected according to accuracy considerations rather than stability considerations. SL methods allow selecting a number of time steps that is usually much smaller than for Eulerian methods, yielding a considerable reduction in the computational complexity.
SL schemes involve two steps. First, the departure points are computed solving the characteristic equation
with initial condition \(X(0) = x\). The direction of the time integration can be forward or backward, depending on the direction of the time integration of the transport equation. Among the several methods proposed in the literature for solving the characteristic equation, we use the approach given by Mang et al. in [15]
Second, the transport equation (Eq. 41) is solved in the Eulerian grid
along the characteristic line X. The use of Runge–Kutta (RK) integration has been recently proposed in this step, yielding a higher-order accurate SLRK method [6]. The velocity field needs to be estimated at points that do not belong to the Eulerian grid. Therefore, an interpolator is needed. Cubic interpolation is the method of choice for SL schemes [24].
Semi-Lagrangian Integration in a Band-Limited Domain
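The two steps described above can be sketched on a periodic one-dimensional grid. This is a deliberately simplified illustration and not the scheme used in this work: the velocity is constant, so the departure point is known exactly instead of being computed with the approach of [15]; linear interpolation replaces cubic interpolation; and the right-hand side is integrated along the characteristic with Heun's second-order RK method. The names `slrk_step` and `rhs` are hypothetical.

```python
import math

def slrk_step(u, velocity, dt, dx, rhs):
    """One SL step with a Heun (RK2) update for u_t + velocity * u_x = rhs(u)."""
    n = len(u)
    new_u = []
    for i in range(n):
        # Step 1: trace the characteristic backward from grid node i to its
        # departure point, expressed in grid-index units.
        s = i - velocity * dt / dx
        j = math.floor(s)
        w = s - j                                   # fractional offset in [0, 1)
        # Interpolate u at the departure point (linear, periodic wrap-around).
        u_dep = (1.0 - w) * u[j % n] + w * u[(j + 1) % n]
        # Step 2: integrate du/dt = rhs(u) along the characteristic with Heun's method.
        k1 = rhs(u_dep)
        k2 = rhs(u_dep + dt * k1)
        new_u.append(u_dep + 0.5 * dt * (k1 + k2))
    return new_u

n = 64
dx = 1.0 / n
u0 = [math.exp(-((i * dx - 0.5) ** 2) / 0.005) for i in range(n)]

dt = 3.0 * dx                                       # Courant number 3: beyond explicit limits
u_shift = slrk_step(u0, 1.0, dt, dx, lambda u: 0.0)          # pure transport
u_decay = slrk_step(u0, 1.0, dt, dx, lambda u: -2.0 * u)     # transport with a source term
u_frac = slrk_step(u0, 1.0, 3.5 * dx, dx, lambda u: 0.0)     # non-integer Courant number
print(max(u0), max(u_shift), max(u_frac))
```

With a Courant number of 3 the profile is simply shifted by three grid nodes, and with a non-integer Courant number the interpolated solution stays bounded by the initial maximum, which illustrates why the time step can be chosen from accuracy rather than stability considerations.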
In \(\tilde{\varOmega }\), the transport equations are of the general form
where
The characteristics are computed from
and the transport equation is solved from
Semi-Lagrangian Runge–Kutta Integration in PDE-LDDMM
In this work, SLRK integration has been implemented in \(\varOmega \) and \(\widetilde{\varOmega }\) for the spatial and band-limited versions of the three PDE-LDDMM variants. To be able to apply SL integration, the differential equations need to be written in the form of Eqs. 46 or 50, respectively. We focus on the derivation for the BL domain \(\widetilde{\varOmega }\). The derivation for the spatial domain can be performed analogously and is provided in the appendix.
The state equations, the deformation state equations, and their incremental counterparts (Eqs. 19, 26, 21, 31) are already in the form of Eq. 50 once a remaining term is moved to the right-hand side of the equation. For the adjoint and the incremental adjoint equations (Eqs. 20, 38, 22, 40), we use the identity
and move the divergence term to the right-hand side of the transformed equation. Table 1 gathers the expressions of the resulting differential equations, needed for the implementation of BL PDE-LDDMM methods in SL form. For SLRK, the right-hand side expressions can be directly plugged into an RK differential solver. Algorithm 3 shows the pseudocode for SLRK integration.
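The manipulation described above for the adjoint equations can be sketched in the spatial domain as follows. The sketch assumes an adjoint equation in the conservative form commonly associated with a transport-equation constraint; the exact signs and the band-limited operators should be taken from the equations referenced above. The first line is the assumed adjoint equation, the second is the product rule for the divergence, and the third is the resulting transport form with the divergence term on the right-hand side:

\[
\begin{aligned}
-\partial _t \lambda - \nabla \cdot (\lambda v) &= 0, \\
\nabla \cdot (\lambda v) &= \nabla \lambda \cdot v + \lambda \, \nabla \cdot v, \\
\partial _t \lambda + \nabla \lambda \cdot v &= -\lambda \, \nabla \cdot v.
\end{aligned}
\]

In this form, the left-hand side is a pure transport operator that can be handled along the characteristics, while the right-hand side is the term passed to the RK solver.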
Experimental Setup
In this work, we evaluate the performance of SLRK integration in all the variants of PDE-LDDMM (see Table 2). The evaluation has been performed consistently with our previous work [8, 9], in order to show the improvement of the proposed integration method over RK integration. In addition, we have performed an extensive evaluation of the most memory-efficient stationary methods in the frameworks of Klein et al. [12] and Rohlfing et al. [25] in order to establish the position achieved by PDE-LDDMM methods in these evaluation rankings. Finally, we show some complementary experiments justifying the selection of SLRK as the integration scheme for PDE-LDDMM.
Datasets
We have used five different databases in our evaluation:
NIREP16 contains 16 skull-stripped brain images with the segmentation of 32 gray matter structures. The dimension of the images is 256 \(\times \) 300 \(\times \) 256 with a voxel size of \( 0.7 \times 0.7 \times 0.7\) mm. The acquisition and postprocessing details can be found at the web page (http://www.nirep.org). The most remarkable features of this dataset are the excellent image quality and the usually small ventricle sizes. The geometry of the segmentations provides an especially challenging framework for deformable registration evaluation.
LPBA40 contains 40 skull-stripped brain images without the cerebellum and the brain stem. LPBA40 is provided with the segmentation of 50 gray matter structures together with the caudate, putamen, and hippocampus. LPBA40 protocols can be found at: http://www.loni.ucla.edu/Protocols/LPBA40. The image quality in LPBA40 is, overall, acceptable. The variability of the ventricle sizes is high.
IBSR18 contains 18 brain images with the segmentation of 96 cerebral structures. The masks for skull-stripping are available with the dataset. In addition, the release IBSR_V2.0 skull-stripped NIFTI [25] contains 18 skull-stripped brain images with the segmentation of 62 cerebral structures. This dataset provides the segmentation of brain structures of interest for the evaluation of image registration methods. The image quality is low. For example, most of the images show motion artifacts. The variability of the ventricle sizes is high.
CUMC12 contains 12 full brain images with the segmentation of 130 cerebral structures. The masks for skull-stripping are available with the dataset. Overall, the image quality is acceptable, although some of the images are noisy. The contrast of the images is low. The variability of the ventricle sizes is high.
MGH10 contains 10 full brain images with the segmentation of 106 cerebral structures. The masks for skull-stripping are available with the dataset. Overall, the image quality is acceptable, although some of the images are noisy. The contrast of the images is low. Ventricle sizes are usually large.
Image Registration Pipeline
The evaluation consistent with our previous work was performed on a subsampled NIREP16 database. The registrations were carried out from the first subject to every other subject in the database, yielding a total of 15 registrations per method. The subsampled NIREP16 database was obtained by resampling the images into volumes of size \(180 \times 210 \times 180\) with a voxel size of \(1.0 \times 1.0 \times 1.0\) mm after the alignment to a common coordinate system using affine transformations. The images were scaled between 0 and 1. The affine alignment and subsampling were performed using the Insight Toolkit (ITK). The PDE-constrained registration methods were executed directly on this dataset. For benchmarking, we ran single- and multi-resolution versions of the SyN version of ANTS diffeomorphic registration [2] with \(L^2\) image similarity (ANTS-SSD).
The evaluation in the framework of Klein et al. was performed on the NIREP16, LPBA40, IBSR18, CUMC12, and MGH10 databases. The IBSR18, CUMC12, and MGH10 images normalized with respect to the MNI152 space were used as input data. The registrations were carried out from every subject to every other subject in each database, yielding a total of 2328 registrations per method. The evaluation in the framework of Rohlfing et al. was performed on the IBSR18 database, with a total of 306 registrations per method. The NIREP16, LPBA40, IBSR18, CUMC12, and MGH10 images were preprocessed similarly to [12]. First, N4 bias field correction and histogram matching were applied to all the images. To perform these preprocessing steps, we used the algorithms available in ITK. The images were scaled between 0 and 1. Next, we performed an affine registration between all the image pairs. Instead of using the affine registered images as input of our nonrigid registration methods, we used the affine transformation as input, and it was included in the parameterization of the diffeomorphic transformations.
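The registration counts quoted above follow from counting ordered subject pairs within each database, which can be checked with a short calculation. The database sizes are those given in the dataset descriptions; the variable names are only illustrative.

```python
# Number of subjects in each database, as listed in the dataset descriptions.
sizes = {"NIREP16": 16, "LPBA40": 40, "IBSR18": 18, "CUMC12": 12, "MGH10": 10}

# Every subject is registered to every other subject: n * (n - 1) ordered pairs.
pairs = {name: n * (n - 1) for name, n in sizes.items()}
total = sum(pairs.values())

print(pairs)
print(total)
```

The per-database counts are 240, 1560, 306, 132, and 90, which add up to the 2328 registrations of the Klein et al. framework, and the IBSR18 count alone gives the 306 registrations of the Rohlfing et al. framework.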
Subsampled NIREP16 experiments were run on a cluster equipped with one NVIDIA Titan RTX with 24 GB of video memory and an Intel Core i7 with 64 GB of DDR3 RAM. NIREP16, LPBA40, IBSR18, CUMC12, and MGH10 experiments were run on a cluster equipped with four NVIDIA GeForce GTX 1080 Ti with 11 GB of video memory and an Intel Core i7 with 64 GB of DDR3 RAM. The code was developed for the GPU with MATLAB 2017a and CUDA 8.0. Since MATLAB lacks a 3D GPU cubic interpolator, we implemented the GPU cubic interpolator with prefiltering proposed in [26] in a CUDA MEX file.
Parameter Configuration
Regularization parameters were selected from a search for the optimal parameters in the registration experiments performed in our previous work [9]. We selected the parameters \(\sigma ^2 = 1.0\), \(\alpha = 0.0025\), and \(s = 2\) and a unit-domain discretization of the image domain \(\varOmega \) [3].
The optimization was run for a maximum of 10 iterations with the stopping conditions used in [13]. The maximum number of PCG iterations was set to 5. These parameters were selected as optimal in our previous work since the methods achieved state-of-the-art accuracy in a reasonable amount of time [8].
The experiments were performed with band sizes of \(32 \times 32 \times 32\) for BL PDE-LDDMM based on the state and on the deformation state equations (Variants II and III), and band sizes of \(40 \times 40 \times 40\) for the original BL PDE-LDDMM (Variant I). This selection was found to be optimal for each method in our previous work [8, 9].
For SLRK integration, the number of time steps \(n_t\) was set to 5 for all the methods. For RK integration, \(n_t\) was set to 25 for the BL PDE-LDDMM based on the state and on the deformation state equations, and to 50 for the spatial methods due to stability issues. In the evaluation with the LPBA40, IBSR18, CUMC12, and MGH10 datasets, \(n_t=25\) showed stability issues in a considerable number of experiments and it was raised to 50.
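To give a rough idea of what these values of \(n_t\) imply for memory, the sketch below estimates the storage needed to keep one scalar field at every time sample on the subsampled NIREP16 grid. It assumes single-precision storage and a single scalar field per time sample; the actual methods store several time-dependent variables, some of them in the band-limited domain, so the figures are only indicative. The function name `trajectory_bytes` is hypothetical.

```python
def trajectory_bytes(shape, n_t, bytes_per_value=4, components=1):
    """Bytes needed to store `components` fields of size `shape` at n_t time samples.

    bytes_per_value=4 assumes single precision (an assumption of this sketch).
    """
    voxels = 1
    for n in shape:
        voxels *= n
    return voxels * components * bytes_per_value * n_t

shape = (180, 210, 180)                       # subsampled NIREP16 volume size
for n_t in (5, 25, 50):
    print(n_t, trajectory_bytes(shape, n_t) / 1e6, "MB")
```

Under these assumptions, a single scalar trajectory takes about 136 MB for \(n_t = 5\) and about 1361 MB for \(n_t = 50\), so the storage scales linearly with the number of time samples.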
ANTS-SSD was run with the following parameters for the single-resolution experiments
$synconvergence="[50,1e-6,10]",
$synshrinkfactors="1",
and $synsmoothingsigmas="3vox".
For the multi-resolution experiments the parameters were set to
$synconvergence="[50x50x50,1e-6,10]",
$synshrinkfactors="4x2x1",
and $synsmoothingsigmas="3x2x1vox".
The selection of the number of iterations was in agreement with the number of outer \(\times \) inner iterations used in Gauss–Newton–Krylov optimization.
Results
Subsampled NIREP16 Evaluation Results
Convergence Analysis
Table 3 shows, averaged by the number of experiments, the mean and standard deviation of the total, regularization, and image similarity energies after registration (\(E_\text {total}\), \(E_\text {reg}\), and \(E_\text {img}\)), the relative image similarity error,
and the relative gradient magnitude,
obtained with PDE-LDDMM in the subsampled NIREP16 dataset. In addition, Table 4 shows the mean and the standard deviation of the extrema of the Jacobian determinant.
Overall, the worst-performing methods show high values for the relative gradient in Table 3, which indicates stagnation of the convergence. For most of the BL methods, the relative gradient was reduced to average values ranging from 0.02 to 0.04, which means that the optimization was stopped at acceptable energy values. All the Jacobians remained above zero.
Next, we group the analysis of Table 3 by integration scheme, variant, image domain, and diffeomorphism parameterization:

RK versus SLRK. In absolute terms, the \(E_\text {total}\) values obtained with SLRK methods tend to be greater than those achieved by RK integration. Both \(E_\text {reg}\) and \(E_\text {img}\) values contribute to the greater \(E_\text {total}\) values for the SLRK methods. However, in relative terms, the \(\hbox {MSE}_{\mathrm{rel}}\) values achieved by SLRK methods at convergence are close to, or even improve on, those of RK methods. It is worth noting the poor performance of Variant II for the non-stationary parameterization and RK integration, which is indeed improved by SLRK integration.

Variants. The best-performing variant, with the best \(E_\text {total}\), \(E_\text {img}\), and \(\hbox {MSE}_{\mathrm{rel}}\) values, is Variant III. This result holds across the integration schemes, the spatial or BL domain, and the diffeomorphism parameterization.

SP versus BL. Due to the high-frequency suppression property of the BL parameterization, the \(E_\text {reg}\) values are all smaller for the BL methods. As expected, the BL methods slightly degrade the \(\hbox {MSE}_{\mathrm{rel}}\) values obtained with the spatial methods, although the degradation appears only in some cases.

St. versus NSt. The \(E_\text {reg}\) values for the stationary parameterization are greater than for the nonstationary parameterization. On the one hand, the stationary parameterization yields one-parameter subgroups that are not geodesics because the metric is not bi-invariant. On the other hand, geodesics minimize \(E_\text {reg}\), and this property is expected for the solutions with the nonstationary parameterization. Therefore, the obtained \(E_\text {reg}\) results are consistent with these two facts. The nonstationary methods do not consistently outperform the stationary methods. The (out)performance depends on the variant.
Evaluation
The evaluation is based on the accuracy of the registration results for template-based segmentation. The Dice Similarity Coefficient (DSC) is selected as the evaluation metric. Given two segmentations S and T, the DSC is defined as
\[ \mathrm{DSC}(S, T) = \frac{2\, |S \cap T|}{|S| + |T|}, \]
where \(|\cdot |\) denotes the volume of a segmentation.
This metric takes the value 1 if S and T overlap exactly and gradually decreases toward 0 as the overlap of the two volumes decreases.
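As a minimal sketch of how this overlap measure can be computed, the snippet below assumes the usual definition of the DSC, twice the size of the intersection divided by the sum of the sizes, and represents each segmentation as a flat sequence of 0/1 voxel labels. The function name and the toy labels are hypothetical.

```python
def dice_similarity(seg_s, seg_t):
    """DSC between two binary segmentations given as equal-length 0/1 sequences."""
    if len(seg_s) != len(seg_t):
        raise ValueError("segmentations must have the same number of voxels")
    size_s = sum(1 for a in seg_s if a)
    size_t = sum(1 for b in seg_t if b)
    overlap = sum(1 for a, b in zip(seg_s, seg_t) if a and b)
    if size_s + size_t == 0:
        return 1.0  # convention chosen here: two empty segmentations agree
    return 2.0 * overlap / (size_s + size_t)


# Toy example: two structures of 3 voxels each that share 2 voxels.
s = [0, 1, 1, 1, 0, 0]
t = [0, 0, 1, 1, 1, 0]
dsc = dice_similarity(s, t)  # 2 * 2 / (3 + 3)
```

For a labeled atlas, the same function would be applied once per structure, comparing the warped source label with the target label.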
Figure 1 shows, as box-and-whisker plots, the statistical distribution of the DSC values obtained after the registration across the 32 segmented structures. For the single-resolution experiments, the performance of the benchmark method ANTSSSD was under 50%. For the multiresolution experiments, the average DSC value achieved by ANTSSSD equals 55.59%. We have selected this value as a baseline of good registration accuracy for methods with \(L^2\)-based image similarity.
Many PDELDDMM methods showed similar values or even improved on ANTSSSD performance. The best-performing variant was our PDELDDMM based on the deformation state equation, Variant III (boxes in pink tones). This variant showed similar results for RK and SLRK integration regardless of the image domain and diffeomorphism parameterization.
For the variant associated with the PDE-constrained benchmark methods [13, 15], Variant I (boxes in blue tones), RK integration slightly outperformed SLRK integration for the stationary parameterization. For the nonstationary parameterization, the median DSC value for RK integration was below the value for SLRK integration. Similarly, RK slightly outperformed SLRK integration for Variants I and II of stationary BL PDELDDMM. In contrast, SLRK integration greatly outperformed RK integration for the nonstationary parameterization.
Variant II (boxes in green tones) performed similarly to benchmark Variant I for the stationary parameterization with the same integration scheme. However, this variant performed remarkably poorly for the nonstationary parameterization and RK integration in both image domains.
Table 5 shows the results of the analysis of variance (ANOVA) for the effects of variant (I, II, III), integration scheme (RK, SLRK), domain (SP, BL), and parameterization (St, NSt) selection on the distribution of the DSC values obtained in the subsampled NIREP16 evaluation experiments. The tests showed statistical significance in all the considered factors except for the domain factor. This means that the accuracy of the methods using the spatial domain is statistically indistinguishable from the accuracy of the corresponding methods using the bandlimited domain. From the analysis for each separated variant of the effects of integration scheme, domain, and parameterization, the tests showed statistical significance for Variants I and II. For the best performing variant (Variant III), no factor showed any statistical significance.
Figure 2 shows the p-values of pairwise right-tailed Wilcoxon rank-sum tests for the assessment of the statistical significance of the difference of medians for the distribution of the DSC values obtained in the registration experiments. The alternative hypothesis is that the median of the first distribution is higher than the median of the second one. To make the tests easier to interpret, we have grouped the comparisons into the spatial methods, the bandlimited methods, and Variant III methods. The figure shows statistical significance for the better performance of Variant III methods over ANTSSSD and for the combinations of Variants I and II underperforming ANTSSSD. For Variant III, the differences in the distribution of the DSC for the different combinations of integration and parameterization are not statistically significant.
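The right-tailed rank-sum test can be sketched as follows. This is a simplified illustration, assuming the normal approximation of the rank-sum statistic with midranks for ties and no tie correction in the variance. The function name and the DSC values are made up for the example and are not taken from the experiments.

```python
import math


def ranksum_right_tailed(x, y):
    """Right-tailed Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). A small p supports the alternative hypothesis that
    values in x tend to be larger than values in y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n_x, n_y = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = 0.5 * (i + 1 + j)  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n_x * (n_x + n_y + 1) / 2.0
    std = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12.0)
    z = (rank_sum_x - mean) / std
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for a standard normal Z
    return z, p


# Synthetic DSC samples for two hypothetical methods (illustration only).
method_a = [0.60, 0.62, 0.65, 0.66, 0.70, 0.71]
method_b = [0.50, 0.52, 0.55, 0.57, 0.58, 0.61]
z_ab, p_ab = ranksum_right_tailed(method_a, method_b)
z_ba, p_ba = ranksum_right_tailed(method_b, method_a)
```

Because the test is one-sided, swapping the two samples changes the p-value from p to 1 - p, which is why a matrix of pairwise tests such as the one in Figure 2 is not symmetric.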
Computational Complexity
The analysis of the memory complexity performed in [9] for RK integration reported \(O(TN)\) for Variant I and \(O(TN^d)\) for Variants II and III, where T represents the time sampling selected for PDE integration, N is the size of the discretized domain, and d is the dimensionality of the image domain (3D). The memory complexity analysis holds for SLRK integration. Nevertheless, a reduction in the VRAM usage is still expected, since the \(n_t\) for SLRK is considerably smaller than the \(n_t\) for RK.
The time complexity for RK integration is \(O(T N \log N)\) for Variant I, and \(O(T N^d \log N)\) for Variants II and III. The time complexity analysis also holds for SLRK integration. Despite the extra computations of the departure points, the cubic interpolation of the right-hand-side values of the PDEs, and the extra projection-inclusion from \(\widetilde{V}\) to V needed in the BL version of the variants, a reduction in the computation time is expected for SLRK with respect to RK integration because the complexity depends on T.
Table 6 shows the peak VRAM usage reached during the computations, and the average and standard deviation of the total computation time in the subsampled NIREP16 experiments. For the spatial methods, SLRK integration achieved a substantial time and memory reduction over RK integration, as expected. The time and memory reduction achieved by SLRK over RK integration was also considerable for the BL parameterized methods. For the stationary parameterization, the BL parameterization further decreased the complexity of spatial SLRK integration methods, as expected. However, SLRK integration did not reduce the VRAM usage for the nonstationary parameterization. The total computation time was effectively reduced.
Qualitative Assessment
For a qualitative assessment of the proposed registration methods, we show the registration results obtained by the benchmark methods of Mang et al. [13, 18] (Variant I) and by PDELDDMM based on the deformation state equation (Variant III) in a selected experiment representative of a difficult deformable registration problem. The images obtained with the nonstationary parameterization were similar to those obtained with the stationary parameterization. Figure 3 shows the warped images, the difference between the warped and the target images after registration, and the velocity field. All the methods provide visually acceptable results.
NIREP16, LPBA40, IBSR18, CUMC12, and MGH10 Evaluation Results
From the evaluation measurements used in the framework of Klein et al., we focus on the accuracy of the registration results for template-based segmentation. Since [12], this has been widely adopted as a criterion for nonrigid registration evaluation. From the metrics proposed in [12], we select the Dice Similarity Coefficient (DSC) as the evaluation metric. Figures 4, 5 and 6 show the statistical distribution of the DSC values obtained after the registration across the manually segmented structures for the five databases. For the NIREP16 dataset, we show the results obtained with ANTSSSD. For the remaining databases, we include the results reported in [12] for affine registration (FLIRT), and three diffeomorphic registration methods: Diffeomorphic Demons, SyN, and Dartel.
The results from NIREP16 show how PDELDDMM based on the deformation state equation (Variant III) outperformed the other variants of PDELDDMM. The distribution of the method parameterized in the BL domain was almost identical to the distribution of the method parameterized in the spatial domain. As in the subsampled NIREP16 evaluation, ANTSSSD was among the worst-performing methods. The bandlimited versions of Variants I and II with RK integration exceeded the maximum VRAM capacity of our GPUs for this dataset.
The results obtained from LPBA40, IBSR18, CUMC12, and MGH10 show that, among the PDELDDMM methods, the best-performing method was BL PDELDDMM based on the deformation state equation (Variant III) and SLRK integration. The performance of the method with RK integration was slightly lower. The performance of the spatial versions of Variants I and II and their bandlimited versions was significantly lower in the IBSR18, CUMC12, and MGH10 databases. For these methods, RK integration performed slightly better than SLRK integration. These results are consistent with the NIREP16 evaluation results.
In the IBSR18, CUMC12, and MGH10 databases, our PDELDDMM methods were not able to reach SyN or Dartel performance. This is probably because the image similarity metrics used in these methods (cross-correlation and a multinomial model, respectively) favor the accuracy in template-based segmentation. In contrast, PDELDDMM uses SSD, which is known to restrict the performance in template-based segmentation. However, in the LPBA40 database, our best-performing PDELDDMM methods surpassed Dartel and almost reached SyN performance, while showing a significantly reduced number of outliers.
Our methods significantly outperformed FLIRT and Diffeomorphic Demons, where the third quartile in the distribution of our best-performing method was close to the median of Demons for the four databases. It should be noted that Diffeomorphic Demons also uses SSD as the image similarity metric.
IBSR18 V2.0 Evaluation Results
Figure 7 shows the statistical distribution of the DSC values obtained by our proposed registration methods in the regions of interest of the evaluation framework of Rohlfing et al. [25]. Consistent with the rest of the evaluation results, the best-performing method was BL PDELDDMM based on the deformation state equation (Variant III), which significantly outperformed the others in the great majority of regions.
Some Insights of Runge–Kutta and SemiLagrangian Integration
Finally, we show some interesting experiments justifying the selection of SLRK as integration scheme for PDELDDMM beyond the evaluation based on the accuracy for templatebased segmentation shown in this experimental section. The experiments have been performed with the spatial version of Variant III.
Euler Versus RK Versus SLRK Integration
Euler integration is one of the simplest ODE integration methods. Compared with more sophisticated RK schemes, it is noticeably less accurate. The local truncation error of the Euler method is \(O(h^2)\), while for the RK method it is \(O(h^5)\). This does not seem to be a problem for LDDMM methods based on gradient descent. For example, Euler integration has been extensively used even in geodesic methods based on the solution of the EPDiff equation [36]. However, PDELDDMM with Gauss–Newton–Krylov optimization shows convergence problems when combined with the Euler method. Table 7 compares the registration results obtained with Euler, RK, and SLRK integration in a selected subsampled NIREP16 experiment. With Euler integration and \(n_t = 10\), the optimization stagnates in the initial iterations. For \(n_t = 25\), \(n_t = 30\), and even \(n_t = 50\), PCG detects a negative definite Hessian. With RK integration and \(n_t = 10\), PCG also detects a negative definite Hessian. With RK integration and \(n_t = 25\) or \(n_t = 30\), the method shows an appropriate convergence behavior. With SLRK integration and \(n_t = 5\), the method shows an appropriate convergence behavior. The best-performing method is SLRK.
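The gap in local accuracy between the two integrators can be checked numerically. The sketch below assumes the classical fourth-order RK scheme, which is consistent with the fifth-order local error quoted above, and uses the scalar test problem \(y' = y\), \(y(0) = 1\). It estimates the local order by halving the step size. The function names are illustrative.

```python
import math


def euler_step(f, t, y, h):
    """One explicit Euler step."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def observed_local_order(step, h=0.1):
    """Estimate p in 'one-step error ~ h**p' for y' = y, y(0) = 1."""
    f = lambda t, y: y
    err_h = abs(step(f, 0.0, 1.0, h) - math.exp(h))
    err_half = abs(step(f, 0.0, 1.0, h / 2) - math.exp(h / 2))
    return math.log2(err_h / err_half)


euler_order = observed_local_order(euler_step)  # close to 2
rk4_order = observed_local_order(rk4_step)      # close to 5
```

This only illustrates the accuracy of a single step on a smooth scalar problem. It does not reproduce the convergence behavior of the Gauss–Newton–Krylov optimization reported in Table 7.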
Small into Big Parallelepiped Experiment
Figure 8 shows the results of a simulated experiment consisting of the registration of a small parallelepiped into a big parallelepiped in noisy images. The test images are provided with the Mermaid software (http://mermaid.readthedocs.io).
The \(\hbox {MSE}_{\mathrm{rel}}\) reached by RK integration after the registration was equal to \(18.37\%\) while, for SLRK, it was \(2.77\%\). Neither integration scheme seems to have problems with noise. The figure shows that RK integration has difficulty adjusting the diffeomorphic warp at the corners of the structure, while SLRK integration is much more accurate in these difficult locations. Therefore, in this particular experiment, SLRK integration surpasses RK in accuracy.
Stability Beyond t = 1
In the LDDMM and PDELDDMM literature, the time domain is typically selected to be [0, 1]. This domain is discretized with a number of time samples sufficient to achieve stability in the numerical solvers involved in the computations. However, there are applications where it may be of interest to extend the time domain beyond \(t=1\). For example, time extrapolation may be interesting to predict the anatomical evolution of subjects across time beyond the time limits of temporal deformation models [27]. Integrating beyond \(t = 1\) will eventually lead to instabilities of the advected quantities. For example, in the deformation state equation, it would mean reaching nondiffeomorphic solutions. This experiment is intended to show the feasibility of using RK and SLRK integration for time extrapolation.
Figures 9 and 10 show the results of composing the source image with the transformations resulting from the integration of the deformation state equation beyond \(t = 1\). For RK integration, we observe that even for small extensions of the temporal domain, such as \(t = 1.25\), the solution has developed instabilities, which lead to unacceptable results at \(t = 1.5\). In contrast, SLRK is able to integrate beyond \(t=1\), reaching \(t=3\) with recognizable warped images. This may be due to the unconditional stability of SL schemes. Therefore, SLRK may be appropriate for extrapolation in temporal deformation models.
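The unconditional stability of SL schemes with respect to the time step can be illustrated on a toy problem. The sketch below is not the experiment of Figures 9 and 10. It assumes 1D linear advection with constant velocity on a periodic grid, linear interpolation at the departure points (instead of cubic interpolation), and a Courant number of 2.5, and it compares an explicit upwind scheme with a semi-Lagrangian update. All names and parameter values are illustrative.

```python
import math


def upwind_step(u, courant):
    """Explicit first-order upwind step (stable only for courant <= 1)."""
    return [u[i] - courant * (u[i] - u[i - 1]) for i in range(len(u))]


def semi_lagrangian_step(u, courant):
    """Semi-Lagrangian step: evaluate the old field at the departure points."""
    n = len(u)
    out = []
    for i in range(n):
        departure = i - courant        # departure point in grid units
        left = math.floor(departure)
        w = departure - left           # linear interpolation weight in [0, 1)
        out.append((1 - w) * u[left % n] + w * u[(left + 1) % n])
    return out


n, courant, steps = 64, 2.5, 100
initial = [math.exp(-((i - 32) / 4.0) ** 2) for i in range(n)]  # bump with maximum 1

u_upwind, u_sl = list(initial), list(initial)
for _ in range(steps):
    u_upwind = upwind_step(u_upwind, courant)
    u_sl = semi_lagrangian_step(u_sl, courant)

max_upwind = max(abs(v) for v in u_upwind)  # grows without bound
max_sl = max(abs(v) for v in u_sl)          # never exceeds the initial maximum
```

With linear interpolation, each new value is a convex combination of two old values, so the maximum cannot grow regardless of the step size. The price is numerical diffusion, which higher-order interpolation reduces.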
Discussion
The increase in the computational efficiency achieved by the combined over the split BL and SLRK approaches was significant in terms of computation time and memory. The reduction in the memory requirements allowed us to perform the evaluation of the SLRK PDELDDMM methods extensively, even in the highestresolution level of NIREP16.
In all the evaluation frameworks, BL PDELDDMM based on the deformation state equation with SLRK integration was our best-performing method. This method achieved an identical DSC distribution compared with RK integration in the NIREP16 database. The method greatly outperformed ANTSSSD in this database. In the LPBA40, IBSR18, CUMC12, and MGH10 databases, the method outperformed Diffeomorphic Demons. In addition, the evaluation results in the regions of interest of Rohlfing et al. corroborated its excellent performance.
For the benchmark PDELDDMM methods of Mang et al. [13, 15], our evaluation results reported a significant loss of accuracy between RK and SLRK integration. Interestingly, this loss of accuracy was not observed for our best-performing method.
In the IBSR18, CUMC12, and MGH10 databases, our PDELDDMM methods were not able to reach SyN or Dartel performance. This is because the SSD image similarity metric restricts the performance of the methods in template-based segmentation. This problem will be tackled in future work by formulating the PDE-constrained problem with other image similarity metrics such as normalized cross-correlation, local normalized cross-correlation, or mutual information [21]. We expect that this change in the formulation of the problem will improve the performance of PDELDDMM. In addition, it will allow us to apply these methods to other clinical applications involving multimodal registration.
In parallel with the development of this work, Mang et al. released the Claire software [17]. The software is intended to exploit massively parallel CPU-based computing architectures to reduce the computation time of PDELDDMM. The code implements the original PDELDDMM (Variant I) with a variational extension to nearly incompressible fluids and includes \(H^1\) and \(H^2\) regularization terms. The software is restricted to the stationary parameterization of diffeomorphisms. The PDE integration scheme is SLRK. The software includes a sophisticated multilevel preconditioner that improves the convergence of PCG with respect to the original proposal in [13]. The massive computation allows increasing the number of inner and outer iterations and using the norm of the gradient as the stopping condition, achieving an extraordinary accuracy at convergence in a simulated experiment.
In contrast with Claire, our BL methods are intended to run completely in the VRAM of commodity GPUs (< 4 GB). We have limited the variational formulation to the one proposed in [13], although it can be extended straightforwardly to the nearly incompressible fluid problem. We have limited our study to the traditional LDDMM regularizer. Our software works for the stationary and the nonstationary parameterization of diffeomorphisms. We have limited the preconditioner to the one proposed in [13] since we are interested in the comparison of the three different variational variants. We used the stopping conditions suggested in [21] and used for PDELDDMM in [8, 9, 10, 13, 18]. The variety of methods, the extensive evaluation conducted in this work, and our modest hardware capacity prevented us from using the inner and outer iteration values needed to reach the stopping conditions based on the norm of the gradient suggested in [17]. In fact, we observed in a selected NIREP experiment that increasing the number of inner iterations in PCG resulted in a faster initial convergence that finally stagnated at greater \(\hbox {MSE}_{\mathrm{rel}}\) values and lower DSC scores than with our considered stopping conditions. This stagnation was also reported in [17] for the simulated experiment. Instead, our selected inner and outer values consumed a reasonable amount of time while obtaining state-of-the-art results for the evaluation metrics.
Comparing Claire and our results, we believe that it would be of interest to implement our best performing variant as a part of Claire’s software. In the other direction, it would be very interesting to adopt the multilevel preconditioners in our methods.
With the rise of the FlowNet architecture for learning the optical flow in [5], there has been an explosion of methods for nonrigid registration based on deep learning. These data-driven approaches learn how to build a generative model of deformations from the source and target images. Mermaid (mermaid.readthedocs.io) provides a library of methods where data-driven solutions are based on optimal-transport loss functions, highly related to Variants I and II of the PDELDDMM formulation. We believe that Mermaid methods may benefit from the parameterization of the problem in the space of bandlimited vector fields and the semi-Lagrangian Runge–Kutta schemes proposed in this work. In the other direction, PDELDDMM may also benefit from the ingredients of the loss functions defined within these data-driven approaches [28].
Conclusions
In this work, we have proposed to combine the two different methodological approaches used to circumvent the huge computational complexity of Gauss–Newton–Krylov PDELDDMM. In particular, we have included semi-Lagrangian Runge–Kutta integration [15] in the variants of bandlimited PDELDDMM proposed in [8, 9] to further increase the computational efficiency of these methods. The resulting methods have been extensively evaluated in five different datasets following three different evaluation frameworks. To our knowledge, this is the first time that SLRK integration is implemented in the framework of PDELDDMM for the nonstationary parameterization and in the space of bandlimited vector fields. Moreover, our work is the first to provide the position achieved by PDELDDMM methods in the ranking of the evaluation of Klein et al.
This study positions the formulation of BL PDELDDMM based on the deformation state equation and SLRK integration as the best performing among all PDELDDMM methods in terms of accuracy and efficiency. The proposed method has reached the highest sensitivity in the classification of stable versus progressive mild cognitive impairment converters in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using convolutional neural networks. This result has been recently published in [23].
In future work, we will extend this formulation to other relevant physically meaningful LDDMM approaches such as the nearly incompressible method in [14], and the geodesic shooting approach in [10]. We will explore the advantages of using the multilevel preconditioner in [17]. We will adapt our methods for the use of alternative image similarity metrics that usually outperform SSD in registration evaluation rankings. We will try to bridge the gap between constrained variational approaches and data-driven solutions based on optimal-transport loss functions. Finally, we will work to understand which features of PDELDDMM allow the exceptional classification rates related to Alzheimer’s disease conversion shown in [23].
Abbreviations
PDE: Partial differential equation
LDDMM: Large deformation diffeomorphic metric mapping
SP: Spatial
BL: Band limited
RK: Runge–Kutta
SL: Semi-Lagrangian
PCG: Preconditioned conjugate gradient
DSC: Dice similarity coefficient
SSD: Sum of squared differences
CPU: Central processing unit
GPU: Graphics processing unit
VRAM: Video random access memory
References
1. Ashburner, J., Friston, K.J.: Diffeomorphic registration using geodesic shooting and Gauss–Newton optimisation. Neuroimage 55(3), 954–967 (2011)
2. Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008)
3. Beg, M.F., Miller, M.I., Trouve, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vis. 61(2), 139–157 (2005)
4. Brunn, M., Himthani, N., Biros, G., Mehl, M.: Fast gpu 3d diffeomorphic image registration. ArXiv (2020)
5. Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., v.d. Smagt, P., Cremers, D., Brox, T.: FlowNet: learning optical flow with convolutional networks (2015)
6. Guo, D.X.: A Semi-Lagrangian Runge–Kutta method for time-dependent partial differential equations. J. Appl. Anal. Comput. 3(3), 251–263 (2013)
7. Hart, G.L., Zach, C., Niethammer, M.: An optimal control approach for deformable registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’09) (2009)
8. Hernandez, M.: Bandlimited stokes large deformation diffeomorphic metric mapping. IEEE J. Biomed. Health Inform. 23(1), 362–373 (2019)
9. Hernandez, M.: A comparative study of different variants of Newton–Krylov PDE-constrained Stokes-LDDMM parameterized in the space of bandlimited vector fields. SIAM J. Imaging Sci. 12, 1038–1070 (2019)
10. Hernandez, M.: PDE-constrained LDDMM via geodesic shooting and inexact Gauss–Newton–Krylov optimization using the incremental adjoint Jacobi equations. Phys. Med. Biol. 64(2), 025002 (2019)
11. Hernandez, M., Bossa, M.N., Olmos, S.: Registration of anatomical images using paths of diffeomorphisms parameterized with stationary vector field flows. Int. J. Comput. Vis. 85(3), 291–306 (2009)
12. Klein, A., et al.: Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. Neuroimage 46(3), 786–802 (2009)
13. Mang, A., Biros, G.: An inexact Newton–Krylov algorithm for constrained diffeomorphic image registration. SIAM J. Imaging Sci. 8(2), 1030–1069 (2015)
14. Mang, A., Biros, G.: Constrained H1 regularization schemes for diffeomorphic image registration. SIAM J. Imaging Sci. 9(3), 1154–1194 (2016)
15. Mang, A., Biros, G.: A semi-Lagrangian two-level preconditioned Newton–Krylov solver for constrained diffeomorphic image registration. SIAM J. Sci. Comput. 39(6), B1064–B1101 (2017)
16. Mang, A., Gholami, A., Biros, G.: Distributed-memory large-deformation diffeomorphic 3D image registration. In: Proceedings of ACM/IEEE Super Computing conference (SC16) (2016)
17. Mang, A., Gholami, A., Davatzikos, C., Biros, G.: Claire: a distributed-memory solver for constrained large deformation diffeomorphic image registration. SIAM J. Sci. Comput. 41(5), C548–C584 (2019)
18. Mang, A., Ruthotto, L.: A Lagrangian Gauss–Newton–Krylov solver for mass and intensity-preserving diffeomorphic image registration. SIAM J. Sci. Comput. 39(5), B860–B885 (2017)
19. Miller, M.I.: Computational anatomy: shape, growth, and atrophy comparison via diffeomorphisms. Neuroimage 23, 19–33 (2004)
20. Miller, M.I., Qiu, A.: The emerging discipline of computational functional anatomy. Neuroimage 45(1), 16–39 (2009)
21. Modersitzki, J.: FAIR: Flexible Algorithms for Image Registration. SIAM, Philadelphia (2009)
22. Polzin, T., Niethammer, M., Heinrich, M.P., Handels, H., Modersitzki, J.: Memory efficient LDDMM for lung CT. In: Proc. of the 19th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI’16), Lecture Notes in Computer Science, pp. 28–36 (2014)
23. Ramon-Julvez, U., Hernandez, M., Mayordomo, E., ADNI: Analysis of the influence of diffeomorphic normalization in the prediction of stable vs progressive MCI conversion with convolutional neural networks. In: Proceedings of the 17th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI’20) (2020)
24. Riishojgaard, L.P., Cohn, S.E., Li, Y., Menard, R.: The use of spline interpolation in semi-Lagrangian transport models. Mon. Weather Rev. 126(7), 2008–2016 (1998)
25. Rohlfing, T.: Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE Trans. Med. Imaging 31(2), 153–163 (2012)
26. Ruijters, D., Thevenaz, P.: GPU prefilter for accurate cubic B-spline interpolation. Comput. J. 55(1), 15–20 (2012)
27. Schiratti, J.B., Allassonniere, S., Colliot, O., Durrleman, S.: Learning spatiotemporal trajectories from manifold-valued longitudinal data. Adv. Neural Inf. Process. Syst. 28, 2404–2412 (2015)
28. Shen, Z., Vialard, F.X., Niethammer, M.: Region-specific diffeomorphic metric mapping. In: Advances in Neural Information Processing Systems (NIPS 2019) (2019)
29. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)
30. Staniforth, A., Cote, J.: Semi-Lagrangian integration schemes for atmospheric models: a review. Mon. Weather Rev. 119, 2206–2223 (1991)
31. Thompson, D.W.: On Growth and Form. Cambridge University Press, Cambridge (1917)
32. Vialard, F.X., Risser, L., Rueckert, D., Cotter, C.J.: Diffeomorphic 3D image registration via geodesic shooting using an efficient adjoint calculation. Int. J. Comput. Vis. 97(2), 229–241 (2011)
33. Younes, L.: Jacobi fields in groups of diffeomorphisms and applications. Q. Appl. Math. 65, 113–134 (2007)
34. Younes, L.: Shapes and Diffeomorphisms. Springer, Berlin (2010)
35. Zhang, M., Fletcher, P.T.: Finite-dimensional Lie algebras for fast diffeomorphic image registration. In: Proceedings of International Conference on Information Processing and Medical Imaging (IPMI’15), Lecture Notes in Computer Science (2015)
36. Zhang, M., Fletcher, T.: Fast diffeomorphic image registration via Fourier-approximated Lie algebras. Int. J. Comput. Vis. 127, 61–73 (2018)
Acknowledgements
The author would like to acknowledge the anonymous reviewers for their valuable revision of the manuscript. The author would like to give special thanks to Wen Mei Hwu from the University of Illinois for interesting ideas in the GPU implementation of the methods, and Nacho Navarro and Rosa Badia from the Barcelona Supercomputing Center (BSC) for their help. This work was partially supported by the National Research Grant TIN201680347R (DIAMOND Project), PID2019104358RBI00 (DLAging Project), and Government of Aragon Group Reference \(T64\_20R\) (COS2MOS research group). In addition, this work was supported by NVIDIA through the Polytechnical University of Catalonia/Barcelona Supercomputing Center (UPC/BSC) GPU Center of Excellence.
Appendix
This appendix gathers the expressions of the gradient and the Hessian for the PDELDDMM variants defined in the spatial domain and the method for SLRK integration.
A.1 Original PDEConstrained LDDMM (Variant I)
Let E(v) be the PDEconstrained variational problem given in Eq. 4. Let us define the Lagrange multipliers \(\lambda :\varOmega \times [0,1] \rightarrow \mathbb {R}\) and \(\eta :\varOmega \rightarrow \mathbb {R}\) associated with the state equation (Eq. 5) and its initial condition. The augmented Lagrangian corresponds with the expression
The first- and second-order optimality conditions are derived from the formal computations of
and
The details of the formal derivations can be found in [9].
Since \(\delta E_\text {aug}\) needs to vanish for any dv, dm, and \(d\eta \), we get the necessary first-order optimality conditions for Variant I. In particular, the expression of the gradient is given by
where m and \(\lambda \) are computed from the state and the adjoint equations
with their corresponding initial conditions \(m(0) = I_0\) and \(\lambda (1) = \frac{2}{\sigma ^2}(m(1) - I_1)\).
The necessary second-order optimality conditions are obtained by requiring \(\delta ^2 E_\text {aug}\) to vanish for any dv, dm, and \(d\eta \). Thus, the Gauss–Newton approximation of the Hessian vector product is given by
where \(\delta \lambda \) is computed from the incremental adjoint equation
with initial condition \(\delta \lambda (1) = \frac{2}{\sigma ^2} \delta m(1)\), where \(\delta m(1)\) is computed from the incremental state equation
with initial condition \(\delta m(0) = 0\).
A.2 PDEConstrained LDDMM Based on the State Equation (Variant II)
Variant II replaces the computation of the state and the adjoint variables, m and \(\lambda \), from the solution of the state and adjoint PDEs by the identities \(m(t) = I_0 \circ \phi (t)\) and \(\lambda (t) = J(t) \lambda (1) \circ \psi (t)\), where \(\phi (t)\) is the direct map, \(\psi (t)\) is the inverse map, and J is the Jacobian determinant of \(\psi \). As a result, Variants I and II are two formulations of the original PDELDDMM problem that are theoretically, but not numerically, equivalent.
For Variant II, the derivation of the gradient and the Hessian vector product proceeds as for Variant I. However, the state and adjoint variables are computed using their identities, so that the PDEs to be solved become the deformation state equations for \(\phi \) and \(\psi \)
with initial condition \(\phi (0) = id\) and \(\psi (1) = id\), and the Jacobian equation for J
with initial condition \(J(1)=1\). The incremental state and adjoint variables are computed from the incremental expression of the identities
and, again, the PDE solution is transferred to the incremental deformation state equations for \(\delta \phi \) and \(\delta \psi \)
A.3 PDE-Constrained LDDMM Based on the Deformation State Equation (Variant III)
For Variant III, the Lagrange multipliers are \(\rho :\varOmega \times [0,1] \rightarrow \mathbb {R}^d\), associated with the deformation state equation (Eq. 7), and \(\mu :\varOmega \rightarrow \mathbb {R}^d\), associated with its initial condition. The augmented Lagrangian is given by
The first- and second-order optimality conditions are derived from the formal computations of
and
The necessary first- and second-order optimality conditions are obtained by requiring \(\delta E_\text {aug}\) and \(\delta ^2 E_\text {aug}\) to vanish for any dv, \(d\phi \), \(d\rho \), and \(d\mu \), yielding
where \(\phi \) is computed from the deformation state equation, \(\rho \) from the deformation adjoint equation, and \(\delta \rho \) from the incremental deformation adjoint equation
with initial conditions \(\phi (0) = id\), \(\rho (1) = \lambda (1) \nabla m(1)\), and \(\delta \rho (1) = \delta \lambda (1) \nabla m(1)\). Note that the divergence operator acting on tensors operates row-wise.
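The row-wise convention for the divergence of a tensor field can be made concrete with a short sketch. The layout and helper names below are assumptions made for illustration: the tensor is stored as an array of shape `(d, d, n_1, ..., n_d)`, component `i` of the result is \(\sum _j \partial _j T_{ij}\), and derivatives are taken spectrally on a periodic domain, in line with the Fourier setting of band-limited methods. The Nyquist mode is not treated specially here, which is adequate only for fields without energy at that frequency.

```python
import numpy as np

def spectral_derivative(f, axis, length=1.0):
    """Derivative of a periodic field along one axis via the FFT."""
    n = f.shape[axis]
    ik = 2j * np.pi * np.fft.fftfreq(n, d=length / n)   # i * wavenumber
    shape = [1] * f.ndim
    shape[axis] = n
    f_hat = np.fft.fft(f, axis=axis)
    return np.real(np.fft.ifft(ik.reshape(shape) * f_hat, axis=axis))

def div_rowwise(tensor, length=1.0):
    """Row-wise divergence: out[i] = sum_j d(tensor[i, j]) / dx_j."""
    d = tensor.shape[0]
    out = np.zeros(tensor.shape[:1] + tensor.shape[2:])
    for i in range(d):
        for j in range(d):
            # tensor[i, j] has only spatial axes, so x_j is axis j
            out[i] += spectral_derivative(tensor[i, j], axis=j, length=length)
    return out

# Toy 2D tensor field (illustrative assumption).
n = 32
grid = np.arange(n) / n
X, Y = np.meshgrid(grid, grid, indexing="ij")
T = np.zeros((2, 2, n, n))
T[0, 0] = np.sin(2.0 * np.pi * X)
T[0, 1] = np.sin(2.0 * np.pi * Y)
T[1, 1] = np.cos(2.0 * np.pi * Y)
div_T = div_rowwise(T)           # vector field of shape (2, n, n)
```

Each row of the tensor is thus treated as a vector field whose ordinary divergence gives one component of the output.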
A.4 Semi-Lagrangian Runge–Kutta Integration
As mentioned in Sect. 3, to apply SL integration, the differential equations of the different variants need to be written in the form of Eq. 46. The state equations, the deformation state equations, and their incremental counterparts (Eqs. 57, 62, 61, 67) are already in the form of Eq. 46 once the remaining term is moved to the right-hand side. For the adjoint and the incremental adjoint equations (Eqs. 58, 74, 60, 75), we use the identity
and move the divergence term to the right-hand side of the transformed equation. Table 8 gathers the expressions of the resulting differential equations, needed for the implementation of PDE-LDDMM methods in SL form. For SLRK, the right-hand side expressions can be directly plugged into an RK differential solver. Algorithm A1 shows the pseudocode for SLRK integration.
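As a rough illustration of the SL idea, and not a transcription of Algorithm A1, the sketch below advances one step of a transport equation written as \(\partial _t u + v \, \partial _x u = f(u, x)\), which is assumed here to be the kind of form that Eq. 46 denotes. For a scalar adjoint variable, the product rule \(\nabla \cdot (\lambda v) = \nabla \lambda \cdot v + \lambda \nabla \cdot v\) is presumably what brings the equation into this form, with the \(\lambda \nabla \cdot v\) term acting as f. The simplifications are assumptions of this sketch: one spatial dimension, a periodic domain, a stationary velocity, linear interpolation, and second-order RK both for the departure points and along the characteristic. The function names are hypothetical.

```python
import numpy as np

def interp_periodic(u, x_query, length):
    """Linear interpolation of grid samples u at x_query on [0, length)."""
    n = u.size
    h = length / n
    s = np.mod(x_query, length) / h          # fractional grid index
    base = np.floor(s)
    i0 = base.astype(int) % n
    i1 = (i0 + 1) % n
    w = s - base
    return (1.0 - w) * u[i0] + w * u[i1]

def sl_rk2_step(u, v, rhs, dt, length=1.0):
    """One semi-Lagrangian step for du/dt + v du/dx = rhs(u, x)."""
    n = u.size
    x = np.arange(n) * length / n
    # 1) Departure points: midpoint (RK2) trace of dx/dt = v backward in time.
    x_mid = x - 0.5 * dt * v
    v_mid = interp_periodic(v, x_mid, length)
    x_dep = x - dt * v_mid
    # 2) Value carried along the characteristic from the departure point.
    u_dep = interp_periodic(u, x_dep, length)
    # 3) Heun (RK2) update of the ODE du/dt = rhs along the characteristic.
    k1 = rhs(u_dep, x_dep)
    k2 = rhs(u_dep + dt * k1, x)
    return u_dep + 0.5 * dt * (k1 + k2)

# Toy usage: pure advection with constant velocity (illustrative assumption).
n = 64
x = np.arange(n) / n
u0 = np.sin(2.0 * np.pi * x)
v = np.ones(n)
no_source = lambda u, xq: np.zeros_like(u)
u1 = sl_rk2_step(u0, v, no_source, dt=4.0 / n)
```

Because the update follows characteristics instead of differencing the advection term on the grid, the time step is not bound by a CFL condition of the usual explicit kind, which is the main reason SL schemes reduce the number of time steps.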
Hernandez, M. Combining the Band-Limited Parameterization and Semi-Lagrangian Runge–Kutta Integration for Efficient PDE-Constrained LDDMM. J Math Imaging Vis (2021). https://doi.org/10.1007/s10851021010164
Keywords
 Physically meaningful diffeomorphic registration
 PDE-constrained LDDMM
 Gauss–Newton–Krylov
 Optimal control optimization
 Band-limited vector fields
 Semi-Lagrangian Runge–Kutta integration