1 Introduction

Point distribution models (PDMs) of target organs play important roles in the segmentation of 3D medical images (e.g. [1, 2]). A PDM that represents the statistical variation of the shape of a target organ is constructed from a set of training data, and the model is then registered to new images to segment the target organ region in each of them. In this study, employing a PDM to represent this statistical variation, the authors focus on improving the generalization performance of the model.

A PDM represents a target organ surface in an image with a set of N points on the surface. Let the three-dimensional coordinates of the j-th point in an image be denoted by \({{\varvec{x}}}_j = [x_j, y_j, z_j]^T\) (\(j=1, 2, \dots , N\)). Then a 3N-vector, \({\varvec{X}} = [{\varvec{x}}_1^T, {\varvec{x}}_2^T, \dots , {\varvec{x}}_N^T]^T\), describes the location and the whole shape of the target organ. A statistical shape model of the target organ represents the prior probability distribution of these 3N variables, \(p({\varvec{X}}) = p({\varvec{x}}_1, {\varvec{x}}_2, \dots , {\varvec{x}}_N)\), and a Gaussian distribution is widely employed, implicitly or explicitly, for this representation. A Gaussian distribution has two kinds of parameters, the mean \({\varvec{\mu }}\) and the covariance \({\mathbf{\Sigma }}\), and the prior probability distribution can be represented by a Gaussian by estimating the values of these parameters from a given training data set. These parameter values can be estimated straightforwardly from a set of training data, but the estimates are often degraded by outliers in the data, which are not well described by a Gaussian distribution, and often overfit to the training data when the number of training samples is not large enough. Both effects cause poor generalization performance of the resultant model. The objective of this study is not to estimate the true prior probability distribution of \({\varvec{X}}\) but to improve the generalization performance.

For example, an active shape model (ASM) [3, 4], one of the most popular models used for target organ segmentation in medical images, implicitly employs a Gaussian distribution to represent the statistical variation of the shape of a target organ. Let the covariance matrix empirically estimated from a given training data set be denoted by \(\hat{{\mathbf{\Sigma }}}\) and let its eigenvectors and the corresponding eigenvalues be denoted by \({\varvec{u}}_1, {\varvec{u}}_2, \dots \) and \(\lambda _1, \lambda _2, \dots \), respectively, where the eigenvalues are in decreasing order. The statistical variation can then be represented as follows:

$$\begin{aligned} {\varvec{X}}({\varvec{\theta }}) = \hat{{\varvec{\mu }}} + {\mathbf{U}}{\varvec{\theta }}, \end{aligned}$$
(1)

where \(\hat{{\varvec{\mu }}}\) denotes the estimated mean vector, \({\mathbf{U}} = [{\varvec{u}}_1|{\varvec{u}}_2|\dots |{\varvec{u}}_r]\) is a \(3N\times r\) matrix with \(r<3N\), and \({\varvec{\theta }}\) is an r-vector that controls the shape described by the model. \({\varvec{X}}\) in (1) obeys a Gaussian distribution whose mean is \(\hat{\varvec{\mu }}\) and whose covariance matrix is \({\mathbf{\Sigma }} = {\mathbf{U}}{\mathbf{\Lambda }} {\mathbf{U}}^T\), where \({\mathbf{\Lambda }} = \mathrm{diag}(\lambda _1, \lambda _2, \dots , \lambda _r)\), when \({\varvec{\theta }}\) obeys an r-dimensional Gaussian distribution with zero mean and unit covariance. When the ASM is registered to a given image, one first detects N points, \(\tilde{{\varvec{x}}}_j\) (\(j=1, 2, \dots , N\)), in the image that correspond to the \({\varvec{x}}_j\) of the model and then estimates \(\hat{{\varvec{\theta }}}\) in (1) so that \({\varvec{X}}(\hat{{\varvec{\theta }}})\) fits the measured points, \(\tilde{{\varvec{X}}} = [\tilde{{\varvec{x}}}_1^T, \tilde{{\varvec{x}}}_2^T, \dots , \tilde{{\varvec{x}}}_N^T]^T\). \(\hat{{\varvec{\theta }}}\) can be estimated by minimizing a cost function, \(C({\varvec{\theta }})\), such that

$$\begin{aligned} C({\varvec{\theta }}) = \lambda \Vert {\varvec{X}}({\varvec{\theta }}) - \tilde{{\varvec{X}}} \Vert ^2 + \Vert {\varvec{\theta }}\Vert ^2. \end{aligned}$$
(2)

The cost function of \({\varvec{\theta }}\) in (2) comes from \(-\log \{p(\tilde{{\varvec{X}}} |{\varvec{\theta }})p({\varvec{\theta }})\}\), where the prior probability distribution, \(p({\varvec{\theta }})\), is the Gaussian with zero mean and unit covariance and the likelihood, \(p(\tilde{{\varvec{X}}} | {\varvec{\theta }})\), is the Gaussian whose mean is \({\varvec{X}}({\varvec{\theta }})\) and whose covariance is \(\lambda ^{-1}I\). Minimizing \(C({\varvec{\theta }})\) in (2) is therefore a MAP estimation. To construct the model in (1), the covariance matrix, \(\hat{{\mathbf{\Sigma }}}\), must be estimated, and the estimated covariance matrix often overfits to the training data, especially when the number of training samples is not large [5]. This overfitting worsens the generalization performance [6].
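As an illustration, the construction of the model in (1) and the minimization of (2) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the array names (`train`, `X_tilde`) and the use of NumPy are assumptions, and the closed-form solution for \(\hat{{\varvec{\theta }}}\) holds because the columns of \({\mathbf{U}}\) are orthonormal.

```python
import numpy as np

def build_asm(train, r):
    """Estimate the mean and the first r eigenmodes from an (M, 3N) array of
    aligned training shape vectors (Eq. (1))."""
    mu = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)              # empirical covariance (3N x 3N)
    eigval, eigvec = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:r]           # keep the r largest modes
    return mu, eigvec[:, order], eigval[order]

def fit_asm(mu, U, X_tilde, lam=1.0):
    """MAP fit of Eq. (2). Since U^T U = I_r, minimizing C(theta) gives
    theta = lam / (lam + 1) * U^T (X_tilde - mu) in closed form."""
    theta = (lam / (lam + 1.0)) * (U.T @ (X_tilde - mu))
    return mu + U @ theta, theta
```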

There are at least two approaches to improving the generalization performance. In one approach, regularizers or priors are introduced into the estimation of \(p({\varvec{X}})\) [6]. In the other approach, the model function whose generalization performance is expected to be the best is selected from a set of functions prepared in advance. In this study, the latter approach is employed: the proposed method selects an appropriate probability distribution model from the family of q-exponential distributions, which includes both the Gaussian distribution and Student's t-distribution, so that the generalization performance is improved. For the model selection, the authors employ Akaike's information criterion (AIC). The q-exponential distribution and the AIC are described in the next section.

2 Proposed Method

Let the surface of a target organ in the i-th training image be denoted by \(\mathcal{S}^i\) (\(i=1, 2, \dots , M\)) and let the three-dimensional coordinates of the j-th corresponding point generated on \(\mathcal{S}^i\) be denoted by \({\varvec{x}}^i_j\) (\(j=1, 2, \dots , N\)). Given a set of the training data, \(\{{\varvec{x}}^i_j | i = 1, 2, \dots , M; j = 1, 2, \dots , N\}\), we estimate the prior probability distribution, \(p({\varvec{x}}_1, {\varvec{x}}_2, \dots , {\varvec{x}}_N)\), as a model.

2.1 Graphical Model Representation

In this study, the prior probability distribution is represented as shown in (3). The probability distribution model (3) can be represented with a directed graphical model (DGM):

$$\begin{aligned} p({\varvec{x}}_1, {\varvec{x}}_2, \dots , {\varvec{x}}_N) = \prod _j p({\varvec{x}}_j)\prod _{(j,k)\in \mathcal{E}} p({\varvec{x}}_j | {\varvec{x}}_k), \end{aligned}$$
(3)

where each node in the corresponding graph represents \({\varvec{x}}_j\) (\(j=1, 2, \dots , N\)) and \(\mathcal E\) denotes the set of directed edges in the graph. Using the model in (3), one can register it to a given image by computing the posterior probability or by maximizing it. One could, of course, employ an undirected graphical model for the representation instead, such that

$$\begin{aligned} p({\varvec{x}}_1, {\varvec{x}}_2, \dots , {\varvec{x}}_N) = \prod _j p({\varvec{x}}_j)\prod _{(j,k)\in \mathcal{E}^\prime } p({\varvec{x}}_j, {\varvec{x}}_k), \end{aligned}$$
(4)

where \(\mathcal{E}^\prime \) denotes the set of undirected edges in the corresponding graph. A directed graphical model is usually employed when causal relationships exist between the variables, and it would appear more natural to employ an undirected one to represent a PDM [6, 7] because the points on an organ surface have no causal ordering. The authors nevertheless employ a DGM because it is not straightforward to compute a conditional probability distribution, \(p({\varvec{x}}_j | {\varvec{x}}_k)\), from a joint one, \(p({\varvec{x}}_j, {\varvec{x}}_k)\), when the joint distribution is represented by a q-exponential distribution. The conditional \(p({\varvec{x}}_j | {\varvec{x}}_k)\) is needed when the posterior probability distribution on the graphical model is inferred by, e.g., belief propagation or MCMC. If a q-exponential distribution is employed, the pairwise term, \(p({\varvec{x}}_j, {\varvec{x}}_k)\), in (4) is represented by a six-dimensional q-exponential distribution, but the conditional probability distribution, \(p({\varvec{x}}_j | {\varvec{x}}_k)\), computed from this pairwise term is no longer a q-exponential distribution. It is therefore not easy to represent the prior with an undirected graphical model as in (4) when q-exponential distributions are employed.

Each unary term, \(p({\varvec{x}}_j)\) in (3), is estimated from the set of training data, \(\{{\varvec{x}}_j^i | i = 1, 2, \dots , M\}\). Each pairwise term, \(p({\varvec{x}}_j | {\varvec{x}}_k)\), is assumed in this study to satisfy \(p({\varvec{x}}_j | {\varvec{x}}_k) = p({\varvec{x}}_j - {\varvec{x}}_k)\) and is estimated from the set of differences, \(\{{\varvec{d}}_{jk}^i| i = 1, 2, \dots , M ; (j,k)\in \mathcal{E}\}\), where \({\varvec{d}}_{jk}^i = {\varvec{x}}_j^i - {\varvec{x}}_k^i\). The q-exponential distribution, which has three kinds of parameters, is employed to represent these terms; the parameters other than q are estimated with an EM algorithm, while the last parameter, q, which controls the shape of the distribution, is determined based on the AIC.
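For illustration only, the log of the prior in (3) is a sum of unary and pairwise contributions; the sketch below assumes hypothetical fitted distribution objects, `unary[j]` and `pairwise[(j, k)]`, each exposing a `logpdf` method, and an `edges` set corresponding to \(\mathcal E\).

```python
def log_prior(X, unary, pairwise, edges):
    """Evaluate log p(x_1, ..., x_N) of Eq. (3) for an (N, 3) array X of points."""
    val = sum(unary[j].logpdf(X[j]) for j in range(len(X)))                # unary terms p(x_j)
    val += sum(pairwise[(j, k)].logpdf(X[j] - X[k]) for (j, k) in edges)   # p(x_j | x_k) = p(x_j - x_k)
    return val
```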

2.2 q-Exponential Distribution

A d-dimensional q-exponential distribution is represented as follows:

$$\begin{aligned} p_q({\varvec{x}}; {\varvec{\mu }}, {\mathbf{\Sigma }}) = Z \left[ 1 + \frac{1}{\nu }({\varvec{x}} - {\varvec{\mu }})^T{\mathbf{\Sigma }}^{-1}({\varvec{x}} - {\varvec{\mu }}) \right] ^{1/(1-q)}, \end{aligned}$$
(5)

where

$$\begin{aligned} Z = \frac{\mathrm{\Gamma }((\nu + d)/2)}{(\pi \nu )^{d/2}\mathrm{\Gamma }(\nu /2)|{\mathbf{\Sigma }}|^{1/2}}, \end{aligned}$$

\(\nu = -d - 2/(1-q)\), \({\varvec{\mu }}\) is a d-vector denoting the mean, and \({\mathbf{\Sigma }}\) is a \({d\times d}\) symmetric matrix that corresponds to the covariance matrix of a Gaussian distribution. The distribution is integrable if \(q < 1 + 2/d\) and converges to a Gaussian distribution as \(q\rightarrow 1\). Figure 1 shows graphs of one-dimensional q-exponential distributions that have identical values of \({\varvec{\mu }}\) and \({\mathbf{\Sigma }}\) but different values of q. As can be seen, the distributions with larger values of q have heavier tails.

Fig. 1. The shapes of the q-exponential distributions with different values of q. The distributions with larger values of q have heavier tails. The distribution is identical to a Gaussian one when \(q=1\).
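A minimal sketch of the log of the density in (5), written for \(1 < q < 1 + 2/d\) so that \(\nu > 0\); it uses the equivalent multivariate Student's t parameterization with \(\nu = -d - 2/(1-q)\). The use of SciPy's `gammaln` is an implementation choice, not part of the method.

```python
import numpy as np
from scipy.special import gammaln

def q_exponential_logpdf(x, mu, Sigma, q):
    """Log of Eq. (5) for a single d-vector x, assuming 1 < q < 1 + 2/d."""
    d = mu.shape[0]
    nu = -d - 2.0 / (1.0 - q)                      # positive for 1 < q < 1 + 2/d
    diff = x - mu
    s = diff @ np.linalg.solve(Sigma, diff)        # squared Mahalanobis distance
    logZ = (gammaln((nu + d) / 2.0) - gammaln(nu / 2.0)
            - 0.5 * d * np.log(np.pi * nu) - 0.5 * np.log(np.linalg.det(Sigma)))
    return logZ + (1.0 / (1.0 - q)) * np.log(1.0 + s / nu)
```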

2.3 EM Algorithm for Parameter Estimation

Fixing the value of q, the values of \({\varvec{\mu }}\) and \({\mathbf{\Sigma }}\) that maximize the likelihood for given training data can be estimated with an EM algorithm. The log-likelihood of \({\varvec{\mu }}\) and \({\mathbf{\Sigma }}\) for a given set of training data, \(\{{\varvec{x}}^i | i = 1, 2, \dots , M\}\), is shown in (6) [8]:

$$\begin{aligned} L_q({\varvec{\mu }}, {\mathbf{\Sigma }}) = M\log (Z) + \frac{1}{1-q}\sum _{i=1}^M \log \left( \frac{\nu + s^i}{\nu } \right) , \end{aligned}$$
(6)

where \(s^i = ({\varvec{x}}^i -{\varvec{\mu }})^T {\mathbf{\Sigma }}^{-1}({\varvec{x}}^i - {\varvec{\mu }})\). Maximizing \(L_q\) in (6) yields the following equations:

$$\begin{aligned} \hat{{\varvec{\mu }}}_\mathrm{ML} = \frac{\sum _i w_i {\varvec{x}}^i}{\sum _i w_i}, \end{aligned}$$
(7)
$$\begin{aligned} \hat{{\mathbf{\Sigma }}}_\mathrm{ML} = \frac{1}{M}\sum _i w_i ({\varvec{x}}^i - \hat{{\varvec{\mu }}}_\mathrm{ML})({\varvec{x}}^i - \hat{{\varvec{\mu }}}_\mathrm{ML})^T, \end{aligned}$$
(8)

where \(w_i = (\nu + d)/(\nu + s^i)\). As shown in the above equations, the weights, \(w_i\), which themselves depend on \({\varvec{\mu }}\) and \({\mathbf{\Sigma }}\), must be computed for the estimation. Interpreting the \(w_i\) as latent variables, \(\hat{{\varvec{\mu }}}_\mathrm{ML}\) and \(\hat{{\mathbf{\Sigma }}}_\mathrm{ML}\) can be estimated with an EM algorithm [8]. The outline of the algorithm is described below.

  1. Set initial values of \({\varvec{\mu }}\) and \({\mathbf{\Sigma }}\) as \({\varvec{\mu }}^{(0)}\) and \({\mathbf{\Sigma }}^{(0)}\), respectively, and set \(m = 0\).

  2. E-step: Calculate the weights, \(w_i^{(m)}\), as follows:

    $$\begin{aligned} w_i^{(m)} = \frac{\nu +d}{\nu + ({\varvec{x}}^i - {\varvec{\mu }}^{(m)})^T ({\mathbf{\Sigma }}^{(m)})^{-1} ({\varvec{x}}^i - {\varvec{\mu }}^{(m)})}. \end{aligned}$$
    (9)
  3. M-step: Update the values of the parameters as follows:

    $$\begin{aligned} {\varvec{\mu }}^{(m+1)} = \frac{\sum _i w_i^{(m)}{\varvec{x}}^i}{\sum _i w_i^{(m)}}, \end{aligned}$$
    (10)
    $$\begin{aligned} {\mathbf{\Sigma }}^{(m+1)} = \frac{1}{M}\sum _i w_i^{(m)}({\varvec{x}}^i - {\varvec{\mu }}^{(m+1)})({\varvec{x}}^i - {\varvec{\mu }}^{(m+1)})^T. \end{aligned}$$
    (11)
  4. Check whether the values of the parameters have converged. If not, increment m as \(m\leftarrow m+1\) and return to the E-step. Otherwise, output the final values as \(\hat{{\varvec{\mu }}}_\mathrm{ML} = {\varvec{\mu }}^{(m)}\) and \(\hat{{\mathbf{\Sigma }}}_\mathrm{ML} = {\mathbf{\Sigma }}^{(m)}\).

Replacing \({\varvec{x}}^i\) with \({\varvec{d}}^i_{jk}\), the pairwise term \(p({\varvec{x}}_j | {\varvec{x}}_k)\) can be estimated in the same manner.
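The steps above can be put together as follows. This is a minimal sketch for a fixed q; initializing with the sample mean and covariance and testing convergence with a simple norm check are assumptions made here for concreteness. The same function applies to the pairwise terms when the rows of `data` are the differences \({\varvec{d}}^i_{jk}\).

```python
import numpy as np

def fit_q_exponential(data, q, n_iter=100, tol=1e-8):
    """EM iteration of Eqs. (9)-(11) for an (M, d) data array and a fixed q."""
    M, d = data.shape
    nu = -d - 2.0 / (1.0 - q)                      # requires 1 < q < 1 + 2/d
    mu = data.mean(axis=0)                         # initial values (assumed here)
    Sigma = np.cov(data, rowvar=False)
    for _ in range(n_iter):
        diff = data - mu
        s = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)   # Mahalanobis distances s^i
        w = (nu + d) / (nu + s)                    # E-step, Eq. (9)
        mu_new = (w[:, None] * data).sum(axis=0) / w.sum()             # M-step, Eq. (10)
        diff = data - mu_new
        Sigma_new = (w[:, None, None] * np.einsum('ij,ik->ijk', diff, diff)).sum(axis=0) / M  # Eq. (11)
        converged = (np.linalg.norm(mu_new - mu) < tol
                     and np.linalg.norm(Sigma_new - Sigma) < tol)
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    return mu, Sigma
```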

2.4 Model Selection with AIC

The shape of the q-exponential distribution changes with q. Given an identical set of training data, different estimates, \(\hat{{\varvec{\mu }}}_\mathrm{ML}\) and \(\hat{{\mathbf{\Sigma }}}_\mathrm{ML}\), are obtained depending on the value of q, and the generalization performance of the resultant statistical model changes accordingly. In the proposed method, an appropriate value of q, which is expected to minimize the generalization error, is selected for each of the distributions, \(p({\varvec{x}}_j)\) and \(p({\varvec{x}}_j | {\varvec{x}}_k)\), based on the AIC.

An Akaike’s Information Criterion (AIC) is derived from a KL-divergence between an employed probability distribution model and the true distribution and is used for selecting a model that will minimize the generalization error from a set of models [9]. The AIC of the q-exponential distribution is a function of q:

$$\begin{aligned} \mathrm{AIC}(q) = -2L_q(\hat{{\varvec{\mu }}}_\mathrm{ML}, \hat{{\mathbf{\Sigma }}}_\mathrm{ML}) + 2l, \end{aligned}$$
(12)

where l denotes the number of parameters. Since q-exponential distributions with different values of q have the same number of parameters, the second term on the right-hand side of (12) is constant with respect to q. The model that minimizes the AIC can hence be selected by choosing the value of q at which the log-likelihood, \(L_q(\hat{{\varvec{\mu }}}_\mathrm{ML}, \hat{{\mathbf{\Sigma }}}_\mathrm{ML})\), is maximum. The algorithm for selecting q for each of the distributions, \(p({\varvec{x}}_j)\) and \(p({\varvec{x}}_j | {\varvec{x}}_k)\), is as follows (a code sketch is given after the list):

  1. Set \(q = 1\) and set a small positive step size \(\varDelta _q\).

  2. Estimate \(\hat{{\varvec{\mu }}}_\mathrm{ML}\) and \(\hat{{\mathbf{\Sigma }}}_\mathrm{ML}\) by using the EM algorithm and estimate the log-likelihood \(L_q(\hat{{\varvec{\mu }}}_\mathrm{ML}, \hat{{\mathbf{\Sigma }}}_\mathrm{ML})\) by means of cross-validation.

  3. Update q as \(q\leftarrow q + \varDelta _q\). If \(q < 1 + 2/d\), return to step 2.

  4. Select the value \(q^*\) that maximizes \(L_q\).
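A sketch of the selection loop above, under the simplifying assumption that the cross-validated log-likelihood is computed leave-one-out; the limit case \(q = 1\) (a Gaussian) would be scored with the ordinary Gaussian log-likelihood and is omitted here for brevity. `fit_q_exponential` and `q_exponential_logpdf` refer to the earlier sketches.

```python
import numpy as np

def select_q(data, dq=0.01):
    """Select q* maximizing the leave-one-out log-likelihood (equivalently,
    minimizing the AIC in Eq. (12), since the number of parameters is constant)."""
    M, d = data.shape
    best_q, best_L = 1.0, -np.inf
    q = 1.0 + dq                                   # start just above the Gaussian case q = 1
    while q < 1.0 + 2.0 / d:                       # integrability limit of Eq. (5)
        L = 0.0
        for i in range(M):                         # leave-one-out cross-validation
            train = np.delete(data, i, axis=0)
            mu, Sigma = fit_q_exponential(train, q)
            L += q_exponential_logpdf(data[i], mu, Sigma, q)
        if L > best_L:
            best_q, best_L = q, L
        q += dq
    return best_q
```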

3 Experimental Results

Sets of artificial data were used to evaluate whether the proposed method improved the generalization performance, and a set of CT images was used to construct a PDM of the liver and to examine which values of q were selected for the probability distributions in (3).

3.1 Experiments with Artificial Data

Two families of data were generated by drawing samples from an identical d-dimensional Gaussian probability distribution in order to examine whether the model selection improved the generalization performance. The former family consisted of sets of training data and the latter of a single set of data used to evaluate the constructed models. Multiple sets of training data were generated in order to investigate the relationship between the number of training samples and the selected value of q.

Let \(\mathcal{X}_{n_\mathrm{T}}^m\) denote a set of \(n_\mathrm{T}\) training data: \(\mathcal{X}_{n_\mathrm{T}}^m = \{{\varvec{x}}_1^m, {\varvec{x}}_2^m, \dots , {\varvec{x}}_{n_\mathrm{T}}^m\}\), where the \({\varvec{x}}_n^m\) (\(n=1, 2, \dots , n_\mathrm{T}\)) are d-vectors denoting the training data. To evaluate the constructed models statistically, we generated \(M_\mathrm{T}\) sets of \(n_\mathrm{T}\) training data, \(\mathcal{X}_{n_\mathrm{T}}^1, \mathcal{X}_{n_\mathrm{T}}^2, \dots , \mathcal{X}_{n_\mathrm{T}}^{M_\mathrm{T}}\), by drawing the data from an identical Gaussian distribution. By changing the value of \(n_\mathrm{T}\) as \(n_\mathrm{T} = n_\mathrm{T}^1, n_\mathrm{T}^2, \dots , n_\mathrm{T}^{N_\mathrm{T}}\), we generated \(N_\mathrm{T}\times M_\mathrm{T}\) sets of training data in total. The proposed method was then applied to each training data set: \(\hat{{\varvec{\mu }}}_\mathrm{ML}\) and \(\hat{{\mathbf{\Sigma }}}_\mathrm{ML}\) were estimated at each q, and the \(q^*\) at which the AIC was minimum was selected.

The generalization performance of the constructed models, \(p_q({\varvec{x}} | \hat{{\varvec{\mu }}}_\mathrm{ML}, \hat{{\mathbf{\Sigma }}}_\mathrm{ML})\), was evaluated by using the evaluation data set. Let this set be denoted by \(Y = \{{\varvec{y}}_1, {\varvec{y}}_2, \dots , {\varvec{y}}_{N_\mathrm{E}}\}\), where the \({\varvec{y}}_i\) (\(i = 1, 2, \dots , N_\mathrm{E}\)) are d-vectors drawn from the same Gaussian distribution. The generalization error is evaluated as follows:

$$\begin{aligned} E(q) = -\sum _{i=1}^{N_\mathrm{E}} \log p_q({\varvec{y}}_i |\hat{{\varvec{\mu }}}_\mathrm{ML}, \hat{{\mathbf{\Sigma }}}_\mathrm{ML}). \end{aligned}$$
(13)

The error in (13) comes from the KL-divergence between the true probability distribution, from which the data are drawn, and the model probability distribution. Let the true probability distribution, which is assumed to be a Gaussian distribution in the experiments, be denoted by \(\bar{p}_\mathrm{T}({\varvec{x}})\), and let the model probability distribution be denoted by \(p_{q}({\varvec{x}})\). The KL-divergence between \(\bar{p}_\mathrm{T}({\varvec{x}})\) and \(p_{q}({\varvec{x}})\) is given as

$$\begin{aligned} \mathrm{KL}[\bar{p}_\mathrm{T}({\varvec{x}})||p_{q}({\varvec{x}})] = \int \bar{p}_\mathrm{T}({\varvec{x}})\log \bar{p}_\mathrm{T}({\varvec{x}}) d{\varvec{x}}- \int \bar{p}_\mathrm{T}({\varvec{x}})\log p_{q}({\varvec{x}}) d{\varvec{x}}. \end{aligned}$$
(14)

The first term on the right-hand side of (14) is independent of the model and cannot be evaluated in general. The second term can be evaluated approximately by using a set of observed data, \(\{{\varvec{y}}_i | i = 1, 2, \dots ,N_\mathrm{E} \}\), drawn from \(\bar{p}_\mathrm{T}({\varvec{x}})\), as \(-\sum _i \log p_{q}({\varvec{y}}_i)\). The generalization error in (13) hence decreases as the KL-divergence between the true distribution, \(\bar{p}_\mathrm{T}({\varvec{x}})\), and the model, \(p_q({\varvec{x}})\), becomes smaller.
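A minimal sketch of the evaluation in (13), reusing the `q_exponential_logpdf` sketch from Sect. 2.2; `eval_data` is assumed to hold the held-out samples \({\varvec{y}}_i\) row-wise.

```python
def generalization_error(eval_data, mu_ml, Sigma_ml, q):
    """Negative log-likelihood of the evaluation data under the fitted model, Eq. (13)."""
    return -sum(q_exponential_logpdf(y, mu_ml, Sigma_ml, q) for y in eval_data)
```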

The experimental results presented in this subsection were obtained with \(d=3\), with the mean and covariance matrix of the Gaussian set to \(\bar{{\varvec{\mu }}} = [0,0,0]^T\) and \(\bar{{\mathbf{\Sigma }}} = \mathrm{diag}(10, 10, 5)\), respectively. The number of data used for the evaluation in (13) was \(N_\mathrm{E} = 1000\), and the step size of q was \(\varDelta _q = 1/100\). The graphs in Fig. 2 show the relationship between the value of q and the generalization error, \(E(q)\), obtained when \(n_\mathrm{T}= 10, 20, 40,\) and 50. The bars in the graphs show one standard deviation of the generalization errors evaluated using the \(N_\mathrm{E} = 1000\) evaluation data. \(E(q=1)\) indicates the generalization error of a (classical) Gaussian distribution model. The green dots indicate the value \(\bar{q}\) at which the generalization error, E, is minimum. The blue dots indicate the average of the values \(q^*\) selected from the \(M_\mathrm{T}=100\) training data sets based on the AIC. The average value was not identical to \(\bar{q}\), but the selected model (\(q=q^*\)) had better generalization performance than the Gaussian one (\(q=1\)) even though all of the data obeyed a Gaussian probability distribution. As shown in Fig. 2, \(\bar{q}\) and \(q^*\) decreased toward one as the number of training data, \(n_\mathrm{T}\), increased. The proposed method thus successfully selected models with better generalization performance than a Gaussian one.

Fig. 2. Results of the simulation experiments. In each graph, the horizontal axis shows the value of q and the vertical axis shows the evaluated generalization error, E in (13). The green dot indicates \(\bar{q}\), where the error is minimum, and the blue dot indicates the average value of \(q^*\).

3.2 Experiments with CT Images

A PDM of the liver was constructed from a set of 40 training CT images, in each of which the liver region was manually labeled. The statistical variation of the liver was represented by the directed graphical model in (3), and \(q^*\) was selected based on the AIC for each of the unary terms, \(p({\varvec{x}}_j)\), and the pairwise terms, \(p({\varvec{x}}_j|{\varvec{x}}_k)\). The step size of q was set to \(\varDelta _q = 1/1000\).

The locations and shapes of the patients' bodies in the training images were normalized in advance by first automatically detecting 198 landmarks in each training image with an MCMC-based method [10, 11] and then deforming each image so that the detected landmarks were registered to their average locations. Figure 3 shows an example of the locations of the landmarks. Then, a set of 1300 corresponding points was generated on the surface of each labeled liver region in the deformed images by using generalized multi-dimensional scaling (GMDS) [12] (see Fig. 4). Let \({\varvec{x}}_j^i\) denote the j-th corresponding point generated on the surface in the i-th training image. It should be noted that a different statistical model would be obtained from the identical set of training images if the images were normalized in a different way or if a different method were employed to generate the corresponding points.

Fig. 3. The locations of the landmarks used for the image normalization (red dots).

Fig. 4. Examples of the corresponding points generated by GMDS [12]. The colors indicate the correspondence between different surfaces.

In this study, the structure of the directed graphical model was determined after \(q^*\) had been selected for each unary term, \(p({\varvec{x}}_j)\). A pairwise term, \(p({\varvec{x}}_j | {\varvec{x}}_k)\), is represented by a directed edge from \({\varvec{x}}_k\) to \({\varvec{x}}_j\), and the set of edges, \(\mathcal E\) in (3), was determined from the Delaunay triangulation of the points on the average surface. The average surface is obtained by computing the average of the training data for each point, \(\hat{{\varvec{x}}}_j = (\sum _{i=1}^M{\varvec{x}}_j^i)/M\), and two nodes in the graphical model were linked by an edge if the corresponding two points in the PDM were linked in the Delaunay triangulation. Each edge was directed in increasing order of \(q^*\), which yields a directed acyclic graph. Once \(\mathcal E\) is determined, \(p({\varvec{x}}_j | {\varvec{x}}_k)\) (\((j,k)\in \mathcal E\)) can be estimated from the set of training data, \(\{{\varvec{d}}_{jk}^i | i = 1, 2, \dots , M \}\), where \({\varvec{d}}_{jk}^i = {\varvec{x}}_j^i - {\varvec{x}}_k^i\) (see the sketch below).
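The edge construction described above can be sketched as follows. Since the text does not specify whether the Delaunay structure is computed on the surface or in 3D, the sketch assumes SciPy's 3D Delaunay tessellation of the average points as a stand-in, and ties in \(q^*\) are broken by point index so that the resulting graph stays acyclic.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_edges(mean_points, q_star):
    """mean_points: (N, 3) average coordinates; q_star: (N,) values selected for the unary terms.
    Returns the directed edge set E, with (j, k) meaning the term p(x_j | x_k)."""
    tri = Delaunay(mean_points)                    # 3D tessellation of the average points (assumption)
    pairs = set()
    for simplex in tri.simplices:                  # each simplex is a tetrahedron
        for a in simplex:
            for b in simplex:
                if a < b:
                    pairs.add((int(a), int(b)))
    edges = set()
    for a, b in pairs:
        k, j = (a, b) if q_star[a] <= q_star[b] else (b, a)   # direct from smaller to larger q*
        edges.add((j, k))                          # edge x_k -> x_j
    return edges
```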

Fig. 5. \(q^*\) selected for the unary terms (A) and for the pairwise terms (B). The training data sets for the marked points and edges are shown in Figs. 6 and 7.

Fig. 6. Examples of the training data sets for the two points marked in Fig. 5. \(q^* = 1.000\) was selected from (A) and \(q^* = 1.351\) from (B).

Fig. 7. Examples of the training data sets for the two edges marked in Fig. 5. \(q^* = 1.000\) was selected from (A) and \(q^* = 1.478\) from (B).

Figure 5 shows the experimental results: panel (A) shows the \(q^*\) selected for each point on the liver surface, and panel (B) shows the \(q^*\) for each edge in the directed graph. As shown in panel (A), larger values of q were selected at points in the posterior portion of the liver: the probability distribution selected at each point in the posterior portion had heavier tails. Figure 6 shows two examples of the training data sets. Panel (A) shows the training data set of a point in the anterior portion of the liver, for which \(q^* = 1.000\), and panel (B) shows the training data set of a point in the posterior portion, for which \(q^* = 1.351\). As shown, some outliers (circled in the figure) were found in the training data set when a larger \(q^*\) was selected. It should be noted that the location of the anterior portion of the liver is constrained more strongly than that of the posterior portion because of the locations of the landmarks used for the image normalization: the anterior portion of the liver lies near the anterior portion of the abdominal cavity, whose location in a normalized image is more strongly constrained by the landmarks detected at the rib cage and the navel. As shown in Fig. 5(B), probability distributions with heavier tails were selected to represent the pairwise terms in the right portion of the liver. Two examples of the training data for the pairwise terms are shown in Fig. 7. Larger values of q were selected when the training data sets included outliers whose Mahalanobis distances from the mean are large.

4 Conclusion

The authors propose to represent the statistical variation of a target organ with a directed graphical model and to select an appropriate model from the family of q-exponential distributions for representing the unary and pairwise terms, in order to improve the generalization performance. The AIC is employed for the model selection. Simulation experiments demonstrated that the model selection improved the generalization performance, and the experiments with clinical images showed that distributions with heavier tails were selected for representing the unary terms in the posterior portion of the liver. Future work includes implementing a method for registering the PDM constructed in this study to given images and comparing it with a Gaussian model with respect to registration performance.