1 Introduction

The typical implementation of statistical inference in medical imaging requires that all the images are available in advance. That is the usual premise of existing medical image analysis tools such as ImageJ, SPM, FSL and AFNI. However, there are many situations where the entire imaging dataset is not available and parts of the imaging data are obtained in a delayed, sequential manner. This is a common situation in medical imaging, where not every subject is scanned and processed at the same time.

When the image size is large, it may not be possible to fit all of the imaging data in a computer’s memory, making it necessary to perform the analysis by adding one image at a time in a sequential manner. In another situation, the imaging dataset may be so large that it is not practical to use all of the images; instead, only a subset of the dataset is used. In this situation, we need to incrementally add stratified subsets one at a time to check whether we are achieving reasonable statistical results. In all of the above situations, we need a way to incrementally update the analysis result without rerunning the entire analysis whenever new images are added.

An online algorithm is one that processes its input data in a sequential manner [9]. Instead of processing the entire set of imaging data from the start, an online algorithm processes one image at a time. That way, we can bypass the memory requirement, reduce numerical instability and increase computational efficiency. Online algorithms and machine learning are both concerned with making decisions about the present based on knowledge of the past [3]. Thus, online algorithms are often encountered in the machine learning literature, but there are very few such studies in medical imaging, possibly due to the lack of problem awareness. With the ever-increasing size of large-scale medical imaging databases such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Human Connectome Project (HCP), the development of various online algorithms is warranted. In this study, we develop online statistical inference procedures. The methods are then used to characterize mandible growth using binary mandible segmentations obtained from CT.

2 Probabilistic Model of Binary Segmentation

Let p(x) be the probability of voxel x belonging to some region of interest (ROI) \(\mathcal {M}\). Let \(\mathbf{1}_{\mathcal {M}}\) be an indicator function defined as \(\mathbf{1}_{\mathcal {M}} (x)=1\) if \(x \in \mathcal {M}\) and 0 otherwise. We assume that the shape of \(\mathcal {M}\) is random (due to noise) and we associate it with probability p(x):

$$\begin{aligned} P(x \in \mathcal {M})= & {} p(x),\quad P(x \notin \mathcal {M}) = 1 - p(x). \end{aligned}$$

The volume of \(\mathcal {M}\), given by \( vol (\mathcal {M} ) = \int _{\mathbb {R}^3} \mathbf{1}_{\mathcal {M}}(x) \; dx,\) is also random. The mean volume of \(\mathcal {M}\) is then

$$\begin{aligned} \mathbb {E} \;vol(\mathcal {M}) = \int _{\mathbb {R}^3} \mathbb {E} \;\mathbf{1}_{\mathcal {M}}(x) \; dx = \int _{\mathbb {R}^3} p(x) \; dx. \end{aligned}$$

The integral of the probability map can thus be used as an estimate of the ROI volume. Unfortunately, medical images often have holes and cavities that have to be patched topologically for accurate volume estimation (Fig. 1a). Such topological defects can be easily patched by Gaussian kernel smoothing without resorting to advanced topology correction methods (Fig. 1b) [5].

Consider a 3-dimensional Gaussian kernel \(K( x )= \frac{1}{(2\pi )^{3/2}}\exp (-\Vert x\Vert ^2/2),\) where \(\Vert \cdot \Vert \) is the Euclidean norm of \(x \in \mathbb {R}^3\). The rescaled kernel \(K_{t}\) is defined as \(K_{t} (x) = K (x/t) /t^3.\) Gaussian kernel smoothing applied to the probability map p(x) is given by

$$\begin{aligned} K_{t}*p (x) =\int _{\mathbb {R}^3} K_{t}(x-y) p( y) \; d y. \end{aligned}$$

The volume estimate is invariant under smoothing:

$$\begin{aligned} \mathbb {E} \;vol(\mathcal {M})= & {} \int _{\mathbb {R}^3} p(y) \;dy\\= & {} \int _{\mathbb {R}^3} \int _{\mathbb {R}^3} K_{t}(x-y)p(y)\;dy \; dx =\int _{\mathbb {R}^3} K_{t}*p(x) \; dx .\end{aligned}$$

Here, we used the fact that the Gaussian kernel is a probability density, i.e.,

$$\int _{\mathbb {R}^3} K_{t}(x-y) \; dx = 1$$

for any \(y \in \mathbb {R}^3\). Thus, the smoothed probability map \(K_t*p(x)\) can be taken as a more robust probability map of whether a voxel belongs to the mandible and can be used as the response variable in modeling mandible growth.
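For concreteness, the following is a minimal sketch (in Python with NumPy and SciPy) of smoothing a binary segmentation into the probability map \(K_t*p\) and estimating the mean volume by integrating the map. It assumes a binary segmentation stored as a 3D array with isotropic voxels; the function and variable names are hypothetical and not part of any existing pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_probability_map(binary_seg, sigma_voxels):
    """Gaussian kernel smoothing K_t * p of a binary segmentation."""
    return gaussian_filter(binary_seg.astype(np.float64), sigma=sigma_voxels)

def mean_volume(prob_map, voxel_size_mm):
    """Estimate E[vol(M)] by integrating the probability map over the image domain."""
    return prob_map.sum() * voxel_size_mm ** 3

# Example with a synthetic segmentation: the volume estimate is essentially
# unchanged by smoothing as long as the object stays away from the image boundary.
# seg = np.zeros((64, 64, 64)); seg[20:40, 20:40, 20:40] = 1
# print(mean_volume(seg, 0.35), mean_volume(smooth_probability_map(seg, 2.0), 0.35))
```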

Fig. 1.

(a) A representative mandible binary segmentation that is affine registered to the template space. (b) Gaussian kernel smoothing of the segmentation with bandwidth \(\sigma =20\). Smoothing can easily patch topological artifacts such as cavities and handles. The sample mean (c) and variance (d) of the smoothed maps computed using the online algorithms.

3 Online Algorithm for t-Test

Given smoothed images \(x_1, \cdots , x_m\), an online algorithm for computing the sample mean image \(\mu _m\) is given by

$$\begin{aligned} \mu _m= & {} \frac{1}{m} \sum _{i=1}^m x_i = \mu _{m-1} + \frac{1}{m}( x_m - \mu _{m-1}) \end{aligned}$$
(1)

for any \(m \ge 1\). The algorithm updates the previous mean \(\mu _{m-1}\) with the new image \(x_m\). This algorithm avoids accumulating large sums and tends to be numerically more stable [7].

An online algorithm for computing the sample variance map \(\sigma _m^2\) is algebraically more involved [4, 11]. After a lengthy derivation, it can be shown that

$$\begin{aligned} \sigma _m^2 = \frac{1}{m-1} \sum _{i=1}^m (x_i - \mu _m)^2 = \frac{m-2}{m-1} \sigma _{m-1}^2 + \frac{1}{m} (x_m - \mu _{m-1})^2 \end{aligned}$$

for \(m \ge 2\). The algorithm starts with the initial value \(\sigma _1^2 =0\). Figure 1 displays the results of mean and variance computation using the online algorithms.
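The two recursions can be implemented voxelwise on whole image volumes. Below is a minimal sketch, assuming the smoothed images arrive one at a time as NumPy arrays of identical shape; the class and variable names are hypothetical. Only the running mean and variance maps are kept in memory, and each image can be discarded after its update.

```python
import numpy as np

class OnlineMeanVariance:
    """Voxelwise online sample mean (1) and unbiased sample variance."""
    def __init__(self):
        self.m = 0            # number of images seen so far
        self.mean = None      # mu_m
        self.var = None       # sigma_m^2

    def update(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.m += 1
        if self.m == 1:
            self.mean = x.copy()
            self.var = np.zeros_like(x)   # initial value sigma_1^2 = 0
            return
        delta = x - self.mean             # x_m - mu_{m-1}
        self.mean += delta / self.m       # mu_m = mu_{m-1} + (x_m - mu_{m-1}) / m
        # sigma_m^2 = (m-2)/(m-1) * sigma_{m-1}^2 + (x_m - mu_{m-1})^2 / m
        self.var = (self.m - 2) / (self.m - 1) * self.var + delta ** 2 / self.m
```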

For comparing a collection of images between groups, the two-sample t-statistic can be used. Given measurements \(x_1, \cdots , x_m \sim N(\mu ^1, (\sigma ^1)^2)\) in one group and \(y_1, \cdots , y_n \sim N(\mu ^2, (\sigma ^2)^2)\) in the other group, the two-sample t-statistic for testing \(H_0: \mu ^1= \mu ^2\) at each voxel x is given by

$$\begin{aligned} T_{m,n}(x)= \frac{ \mu ^1_m - \mu ^2_n - (\mu ^1 - \mu ^2)}{\sqrt{ (\sigma ^1)^2_m/m + (\sigma ^2)^2_n/n}}, \end{aligned}$$
(2)

where \( \mu ^1_m, \mu ^2_n\), \((\sigma ^1)^2_m, (\sigma ^2)^2_n\) are the sample means and variances in each group estimated using the online algorithm. \(T_{m,n}\) is then sequentially computed as

$$T_{1,0} \rightarrow T_{2,0} \rightarrow \cdots \rightarrow T_{m,0} \rightarrow T_{m,1} \rightarrow \cdots \rightarrow T_{m,n}$$

in \(m+n\) steps.
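A sketch of the sequential computation of (2), reusing the accumulator above for each group, is given below. Variable names are hypothetical, and at least two images per group are needed for a finite variance estimate.

```python
import numpy as np

def two_sample_t(g1, g2):
    """Voxelwise two-sample t-statistic (2) under H0: mu^1 = mu^2."""
    se = np.sqrt(g1.var / g1.m + g2.var / g2.m)
    return (g1.mean - g2.mean) / se

# Sequential computation T_{1,0} -> ... -> T_{m,0} -> T_{m,1} -> ... -> T_{m,n}:
# g1, g2 = OnlineMeanVariance(), OnlineMeanVariance()
# for x in group1_images: g1.update(x)
# for y in group2_images: g2.update(y)
# t_map = two_sample_t(g1, g2)
```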

4 Online Algorithm for Linear Regression

The online algorithm for linear regression is useful in its own right, and it also serves as a building block for the online F-test in the next section. Given the data vector \(\mathbf{y}_{m-1}=(y_1, \cdots , y_{m-1})'\) and design matrix \(Z_{m-1}\), consider the linear model

$$\begin{aligned} \mathbf{y}_{m-1} = Z_{m-1} \varvec{\lambda }_{m-1} \end{aligned}$$
(3)

with unknown parameter vector \(\varvec{\lambda }_{m-1} =(\lambda _1, \lambda _2, \cdots , \lambda _k)'\). Multiplying both sides by \(Z_{m-1}'\), we have

$$\begin{aligned} Z_{m-1}'{} \mathbf{y}_{m-1} = Z_{m-1}'Z_{m-1} \varvec{\lambda }_{m-1} \end{aligned}$$
(4)

Let \(W_{m-1} = Z_{m-1}'Z_{m-1},\) which is a \(k \times k\) matrix. In most applications, there are substantially more data than the number of parameters, i.e., \(m \gg k\), and \(W_{m-1}\) is invertible. The least squares estimation (LSE) of \(\varvec{\lambda }_{m-1}\) is given by

$$\varvec{\lambda }_{m-1} = W_{m-1}^{-1}Z_{m-1}'{} \mathbf{y}_{m-1}.$$

When new data \(y_m\) is introduced to the linear model (3), the model is updated to

$$\left( \begin{array}{c} \mathbf{y}_{m-1}\\ y_m \end{array}\right) = \left( \begin{array}{c} {Z}_{m-1}\\ z_m \end{array}\right) \varvec{\lambda }_{m},$$

where \(z_m\) is a \(1 \times k\) row vector. Subsequently, we have

$$\begin{aligned} W_{m-1}'\varvec{\lambda }_{m-1} + z_m'y_m= & {} (W_{m-1} + z_m'z_m ) \varvec{\lambda }_{m}. \end{aligned}$$

Using the Woodbury formula [6],

$$\begin{aligned} (W_{m-1} + z_m'z_m )^{-1} = W_{m-1}^{-1} - c_m W_{m-1}^{-1} z_m' z_m W_{m-1}^{-1},\end{aligned}$$

where \(c_m= 1/(1 + z_m W_{m-1}^{-1} z_m')\) is a scalar. Then we have the explicit online algorithm for updating the parameter vector:

$$\begin{aligned} \varvec{\lambda }_{m} = (I - c_m W_{m-1}^{-1}z_m'z_m )\varvec{\lambda }_{m-1} + c_m W_{m-1}^{-1}z_m'y_m, \end{aligned}$$
(5)

where I is the identity matrix of size \(k \times k\). Since the algorithm requires \(W_{m-1}\) to be invertible, it must be initialized with at least the first k images, and the iteration proceeds as

$$\varvec{\lambda }_{k} \rightarrow \varvec{\lambda }_{k+1} \rightarrow \cdots \rightarrow \varvec{\lambda }_{m}.$$

A similar online algorithm for fitting a general linear model (GLM) was introduced for real-time fMRI [2], where the Cholesky factorization was used to invert the covariance matrix in solving the GLM. Our approach, based on the Woodbury formula, does not require the factorization or repeated inversion of matrices and is thus more efficient.
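For illustration, (5) can equivalently be written as \(\varvec{\lambda }_{m} = \varvec{\lambda }_{m-1} + c_m W_{m-1}^{-1}z_m'(y_m - z_m\varvec{\lambda }_{m-1})\), the familiar recursive least squares update. The sketch below (hypothetical class and variable names, in Python/NumPy) initializes from an initial batch of at least k images so that W is invertible, updates \(W_m^{-1}\) by the same rank-one identity, and also tracks the sum of squared errors needed for the F-test in the next section; the recursive identity \(\mathrm{SSE}_m = \mathrm{SSE}_{m-1} + c_m(y_m - z_m\varvec{\lambda }_{m-1})^2\) is a standard least squares result rather than part of the derivation above.

```python
import numpy as np

class OnlineLinearRegression:
    """Online LSE: update lambda_m, W_m^{-1} and SSE_m one observation (z_m, y_m) at a time."""
    def __init__(self, Z_init, y_init):
        # Batch initialization from the first k (or more) observations so that W is invertible.
        Z_init = np.atleast_2d(np.asarray(Z_init, dtype=np.float64))
        y_init = np.asarray(y_init, dtype=np.float64)
        self.W_inv = np.linalg.inv(Z_init.T @ Z_init)
        self.lam = self.W_inv @ Z_init.T @ y_init
        self.sse = float(np.sum((y_init - Z_init @ self.lam) ** 2))
        self.m = len(y_init)

    def update(self, z_m, y_m):
        z_m = np.asarray(z_m, dtype=np.float64)             # the new 1 x k design row
        Wz = self.W_inv @ z_m                               # W_{m-1}^{-1} z_m'
        c_m = 1.0 / (1.0 + z_m @ Wz)                        # c_m = 1/(1 + z_m W_{m-1}^{-1} z_m')
        resid = y_m - z_m @ self.lam                        # prediction error using lambda_{m-1}
        self.lam = self.lam + c_m * resid * Wz              # parameter update (5)
        self.W_inv = self.W_inv - c_m * np.outer(Wz, Wz)    # rank-one update of W_m^{-1}
        self.sse += c_m * resid ** 2                        # recursive SSE update
        self.m += 1
```

In the imaging setting, one such tracker is maintained per voxel (or the updates are vectorized across voxels), since the design row \(z_m\) is shared by all voxels of the m-th image.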

5 Online Algorithm for F-Test

Let \(y_i\) be the i-th image, \(\mathbf{x}_i=(x_{i1},\cdots , x_{ip})'\) be the variables of interest and \(\mathbf{z}_i=(z_{i1}, \cdots , z_{ik})'\) be the nuisance covariates corresponding to the i-th image. We assume there are \(m-1\) images to start with. Consider the general linear model

$$ \mathbf{y}_{m-1} = Z_{m-1} {\varvec{\lambda }}_{m-1}+ X_{m-1}{\varvec{\beta }}_{m-1},$$

where \(Z_{m-1} = (z_{ij})\) is the \((m-1) \times k\) design matrix and \(X_{m-1} = (x_{ij})\) is the \((m-1) \times p\) design matrix. \({\varvec{\lambda }}_{m-1} = (\lambda _1,\cdots ,\lambda _k)'\) and \({\varvec{\beta }}_{m-1} = (\beta _1,\cdots ,\beta _p)'\) are unknown parameter vectors to be estimated at the \((m-1)\)-th iteration. Consider the hypotheses

$$H_0: {\varvec{\beta }} =0 \text{ vs. } H_1: {\varvec{\beta }} \ne 0.$$

The reduced null model when \({\varvec{\beta }} =0 \) is \( \mathbf{y}_{m-1} = Z_{m-1} {\varvec{\lambda }}^0_{m-1}.\) The goodness-of-fit of the null model is measured by the sum of the squared errors (SSE):

$$\begin{aligned} \text{ SSE }_{m-1}^0 = (\mathbf{y}_{m-1} - Z_{m-1} {\varvec{\lambda }}^0_{m-1})'(\mathbf{y}_{m-1} - Z_{m-1} {\varvec{\lambda }}^0_{m-1}), \end{aligned}$$

where \({\varvec{\lambda }}_{m-1}^0\) is estimated using the online algorithm (5). This provides the sequential update of SSE under \(H_0\):

$$\text{ SSE }_{k}^0 \rightarrow \text{ SSE }_{k+1}^0 \rightarrow \cdots \rightarrow \text{ SSE }_{m}^0.$$

Similarly, the goodness-of-fit of the full model under the alternative is measured by

$$\begin{aligned} \text{ SSE }^1_{m-1}= & {} (\mathbf{y}_{m-1} - \mathbb {Z}_{m-1} {\upgamma }_{m-1}^1)'(\mathbf{y}_{m-1} - \mathbb {Z}_{m-1} {\upgamma }_{m-1}^1), \end{aligned}$$

where \(\mathbb {Z}_{m-1} = [Z_{m-1} X_{m-1}]\) is the combined design matrix and

$${\upgamma }^1_{m-1} = \left( \begin{array}{c} \varvec{\lambda }_{m-1}^1 \\ \varvec{\beta }_{m-1}^1 \\ \end{array} \right) $$

is the combined parameter vector. Similarly, SSE under \(H_1\) is sequentially computed as

$$\text{ SSE }_{k+p}^1 \rightarrow \text{ SSE }_{k+p+1}^1 \rightarrow \cdots \rightarrow \text{ SSE }_{m}^1.$$

Under \(H_0\), the test statistic at the m-th iteration \(f_m\) is given by

$$\begin{aligned} f_m= \frac{(\text{ SSE }^0_m - \text{ SSE }^1_m)/p}{\text{ SSE }^1_m/(m-p-k)}\sim F_{p, m-p-k}, \end{aligned}$$
(6)

which is the F-statistic with p and \(m-p-k\) degrees of freedom.
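A minimal sketch of the sequential F-test is shown below, assuming the online regression tracker from the previous section is run twice, once for the null design \(Z\) and once for the combined design \([Z\;X]\). The names are hypothetical, and the full-model tracker must be initialized with at least \(k+p\) images.

```python
import numpy as np
from scipy.stats import f as f_dist

def online_f_statistic(sse0, sse1, m, p, k):
    """F-statistic (6) from the sequentially updated SSEs of the null and full models."""
    f_m = ((sse0 - sse1) / p) / (sse1 / (m - p - k))
    p_value = f_dist.sf(f_m, p, m - p - k)   # uncorrected voxelwise p-value
    return f_m, p_value

# null_model = OnlineLinearRegression(Z_init, y_init)
# full_model = OnlineLinearRegression(np.hstack([Z_init, X_init]), y_init)
# for z_i, x_i, y_i in new_data:                    # one new image at a time
#     null_model.update(z_i, y_i)
#     full_model.update(np.concatenate([z_i, x_i]), y_i)
# f_m, p_val = online_f_statistic(null_model.sse, full_model.sse, full_model.m, p, k)
```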

6 Random Field Theory

Since the statistic maps are correlated across voxels, it is necessary to correct for multiple comparisons using random field theory [12], which is based on the expected Euler characteristic (EC) approach. Given a statistic map S, such as a t- or F-statistic map, for a sufficiently high threshold h we have

$$\begin{aligned} P \Big (\sup _{x \in \mathcal {M}} S(x) > h \Big ) = \sum _{d=0}^N \mu _d(\mathcal {M})\rho _d(h), \end{aligned}$$
(7)

where \(\mu _d(\mathcal {M})\) is the d-th Minkowski functional or intrinsic volume of \(\mathcal {M}\). \(\rho _d(h)\) is the EC-density of S. The explicit formulas for \(\mu _d\) and \(\rho _d\) are given in [12].
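Equation (7) can be solved numerically for the corrected threshold once \(\mu _d(\mathcal {M})\) and \(\rho _d(h)\) are available. The sketch below treats the intrinsic volumes and EC densities as user-supplied values and callables (their explicit formulas are in [12]) and is only one possible implementation.

```python
from scipy.optimize import brentq

def rft_threshold(mu, rho, alpha=0.05, lo=1.0, hi=20.0):
    """Solve sum_d mu_d(M) * rho_d(h) = alpha for the corrected threshold h (Eq. 7).

    mu  : sequence of intrinsic volumes mu_0(M), ..., mu_N(M)
    rho : sequence of EC-density functions rho_0(h), ..., rho_N(h) from [12]
    The bracket [lo, hi] must contain the solution for brentq to converge.
    """
    expected_ec = lambda h: sum(m_d * r_d(h) for m_d, r_d in zip(mu, rho))
    return brentq(lambda h: expected_ec(h) - alpha, lo, hi)
```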

7 Application

Subjects. The dataset consisted of 290 typically developing individuals ranging in age from birth to 20 years. Only CT images showing the full mandible without motion or other artifacts were selected, though minimal dental artifacts were tolerated. The mean age of the subjects was \(9.66\,\pm \,6.34\) years; the minimum age was 0.17 years and the maximum age was 19.92 years. A total of 160 male and 130 female subjects were divided into 3 groups. Group I (age below 7) contained 130 subjects, Group II (between 7 and 13) contained 48 subjects, and Group III (between 13 and 20) contained 112 subjects. The main biological question of interest was whether there were localized regions of growth between these age groups. The same grouping was used in the previous study [5].

Image preprocessing. CT images were visually inspected and determined to capture the whole mandible geometry. The mandibles in CT were semi-automatically segmented using an in-house processing pipeline that involves image intensity thresholding using the Analyze software package (AnalyzeDirect, Inc., Overland Park, KS). Each of the processed mandibles was examined visually and edited manually by raters. The segmented binary images were then affine registered to the mandible labeled as F226-15-04-002-M (Fig. 1), which served as the template. Due to the lack of an existing prior map in the field, we simply used the normalized binary segmentation results as the probability map p(x).

CT images are inherently noisy due to errors associated with image acquisition. Compounding the image acquisition errors, there are errors caused by image registration and semiautomatic segmentation. It is therefore necessary to smooth the affine-registered segmented images. We smoothed the binary images with a Gaussian kernel with bandwidth \(\sigma =20\) voxels (Fig. 1). Since the CT image resolution is 0.35 mm, a 20-voxel bandwidth is equivalent to 7 mm. The bandwidth was chosen to reflect the size of missing teeth and cavities; any smaller filter size would not mask large missing teeth and cavities. The average of all 290 smoothed binary images was computed and used as the final template. For visualization, the statistical maps are projected onto the surface of the template.

Age effects. We performed the two-sample t-test to assess age effects between the groups. The resulting t-statistic maps are displayed in Fig. 2-top. Voxels with t-statistic values above 4.41 or below \(-4.41\) were considered significant between age groups I and II at the 0.05 level after the multiple comparisons correction. Similarly, for the other age group comparisons, voxels above or below \(\pm 4.43\) (between II and III) or \(\pm 4.37\) (between I and III) were considered significant at the 0.05 level. These regions are colored dark red or dark blue. The dark red regions show positive growth (bone deposition) and the dark blue regions show negative growth (bone resorption). The findings are consistent with previous studies based on 2D surface deformation [5] and landmarks [10].

Fig. 2.

Top: t-stat. maps showing mandible growth. The elongation of the mandible is shown between Groups II and III, and between Groups I and III. The condyle regions show prominent growth in the Group III vs. I comparison. At the same time, the elongation is shown as negative growth (dark blue). Bottom: t-stat. maps (male - female) showing sex differences in each age group. There were no significant sex differences in Groups I and II. However, pubertal and post-pubertal sex differences are evident in Group III, which starts at age 13.

Sex effects. Within each group, we tested the significance of sexual dimorphism by performing the two-sample t-test between males and females. The resulting t-statistic maps are displayed in Fig. 2-bottom. Any voxel above or below \(\pm 4.37\), \(\pm 4.89\) and \(\pm 4.50\) (for Groups I, II and III, respectively) was considered significant at the 0.05 level after the multiple comparisons correction. In Groups I and II, there are no sex differences. In Group III, the statistical significance is localized in the regions between the condyle and gonion on both sides. Such findings are consistent with general findings on sexual dimorphism becoming evident during puberty.

8 Discussion

The image processing and analysis somewhat resemble voxel-based morphometry (VBM), which is widely used in modeling gray and white matter tissue probability maps in structural brain magnetic resonance imaging studies [1]. VBM does not necessarily require very accurate nonlinear registration; the shape difference is implicitly encoded in the tissue density maps. If perfect registration were done, the tissue density maps would be identical across subjects and we would not detect any difference. Thus, in our study, only affine registration is used. Previously, we applied a diffeomorphic surface shape model to a similar dataset [5], where we obtained a similar pattern of widespread growth in almost identical regions.

In VBM, the posterior probability map is estimated using the prior probability map. However, there is no such prior map in mandible CT studies yet. Our 290-subject average probability map is distributed as a potential prior map for Bayesian shape modeling [8]: http://www.stat.wisc.edu/~mchung/VBM.