Generative Method to Discover Genetically Driven Image Biomarkers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9123)

Abstract

We present a generative probabilistic approach to the discovery of disease subtypes determined by genetic variants. In many diseases, multiple types of pathology may present simultaneously in a patient, making quantification of the disease challenging. Our method seeks common co-occurring image and genetic patterns in a population as a way to model these two different data types jointly. We assume that each patient is a mixture of multiple disease subtypes and use the joint generative model of image and genetic markers to identify disease subtypes guided by known genetic influences. Our model is based on a variant of the so-called topic models that uncover the latent structure in a collection of data. We derive an efficient variational inference algorithm to extract patterns of co-occurrence and to quantify the presence of heterogeneous disease processes in each patient. We evaluate the method on simulated data and illustrate its use in the context of Chronic Obstructive Pulmonary Disease (COPD) to characterize the relationship between image and genetic signatures of COPD subtypes in a large patient cohort.

N.K. Batmanghelich and A. Saeedi—equal contribution.

Acknowledgements

This work was supported by NIH NIBIB NAMIC U54-EB005149, NIH NCRR NAC P41-RR13218 and NIH NIBIB NAC P41-EB015902, NHLBI R01HL089856, R01HL089897, K08HL097029, R01HL113264, 5K25HL104085, 5R01HL116931, and 5R01HL116473. The COPDGene study (NCT00608764) is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, GlaxoSmithKline and Sunovion.

Author information

Corresponding author

Correspondence to Nematollah K. Batmanghelich.

Appendix: Variational Bayes Inference Procedure

Combining all components of the model defined in Sect. 2, we construct the joint distribution of all variables in the model (Fig. 6):

$$\begin{aligned}&p(\mathcal {D}, \mathcal {S}, \mathcal {P}) = \underbrace{\prod _{k=1}^{K}{ p(\mu _k,\varSigma _k;\eta ^I)\, p(\beta _k;\eta ^G)\, p(v_k;\omega )}}_{ \text {population-level topics}} \times \\&\prod _{s=1}^{S}\prod _{t=1}^{T}{ \underbrace{p(c_{st}|v_k) p(\pi _{st};\alpha )}_{\text {topics for subject { s}}} } \prod _{n=1}^{N}{ \underbrace{p(z_{sn}^I | \pi _{st})}_{\text {image topic}} \underbrace{p(I_{sn} | z_{sn}^I, c_{st}, \{ \mu _k,\varSigma _k \} )}_{\text {image likelihood}} }\\&\prod _{m=1}^{M}{ \underbrace{p(z_{sm}^G | \pi _{st})}_{\text {genetic topic}} \underbrace{p(G_{sm} | z_{sm}^G, c_{st},\beta _k)}_{\text {genetic likelihood}} }, \end{aligned}$$

where N and M are the numbers of supervoxels and minor alleles, respectively, identified for subject s.

Fig. 6. Left: Graphical model that represents the joint distribution. The open gray and white circles correspond to the observed and the latent random variables, respectively. The full circles represent fixed hyper-parameters. Superscripts I and G denote the image and genetic parts of the model, respectively. Right: Update rules for the variational parameters.

We choose a factorization for the variational distribution q that captures most model assumptions and yet remains computationally tractable:

$$\begin{aligned} q(\mathcal {S}, \mathcal {P})&= \underbrace{\prod _{k=1}^{K} \text {NIW}(\mu _k,\varSigma _k; \tilde{\eta }^I_k) \, \text {Dir}( \beta _k;\tilde{\eta }^G_k )\, \text {Beta}(v_k; \tilde{\omega }_k)}_{\text {population-level topics}} \times \\&\prod _{s=1}^{S} \prod _{t=1}^{T} \underbrace{\text {Cat}(c_{st}; {\xi }_{st}) \, \text {Beta}(\pi _{st}; \tilde{\alpha }_{st})}_{ \text {topics for subject { s}}} \prod _{n=1}^{N} \underbrace{\text {Cat}(z_{sn}^I; {\phi }^I_{sn})}_{\text {image topic}} \prod _{m=1}^{M} \underbrace{\text {Cat}(z_{sm}^G; {\phi }^G_{sm})}_{\text {genetic topic}}, \nonumber \end{aligned}$$

where we choose an appropriate approximating distribution for each latent variable and use \(\tilde{}\) to denote parameters of the approximating distributions. The optimization is defined in the space of the variational parameters \(\left\{ \tilde{\eta }^I,\tilde{\eta }^G,\tilde{\omega },{\xi }, \tilde{\alpha }, {\phi }^I,{\phi }^G \right\} \). We omit the derivation of the updates due to space constraints; Algorithm 1 provides pseudocode for the resulting updates. We run the algorithm five times, each time starting from a different random initialization, and report the result with the highest lower bound F(q).
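The restart strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fit_once` is a hypothetical stand-in that runs EM on a 1-D, unit-variance Gaussian mixture rather than the paper's variational updates, and its data log-likelihood plays the role of the lower bound F(q). The names `fit_once` and `best_of_restarts` do not come from the paper.

```python
import numpy as np


def fit_once(x, n_components, rng, n_iters=100):
    """Stand-in for one run of the updates (NOT the paper's model).

    Runs EM on a 1-D, unit-variance Gaussian mixture from a random
    initialization and returns (params, objective). The log-likelihood
    is used here in place of the lower bound F(q).
    """
    means = rng.choice(x, size=n_components, replace=False)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iters):
        # E-step: responsibilities, computed stably in the log domain.
        log_p = np.log(weights) - 0.5 * (x[:, None] - means[None, :]) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update means and mixing weights.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp * x[:, None]).sum(axis=0) / nk
        weights = nk / nk.sum()
    # Objective: data log-likelihood under the fitted mixture.
    log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi)
             - 0.5 * (x[:, None] - means[None, :]) ** 2)
    m = log_p.max(axis=1)
    objective = float(np.sum(m + np.log(np.exp(log_p - m[:, None]).sum(axis=1))))
    return {"means": means, "weights": weights}, objective


def best_of_restarts(x, n_components, n_restarts=5, seed=0):
    """Run fit_once from several random initializations; keep the best run."""
    runs = []
    for r in range(n_restarts):
        rng = np.random.default_rng(seed + r)  # a different init per restart
        runs.append(fit_once(x, n_components, rng))
    best = max(runs, key=lambda run: run[1])   # highest objective wins
    return best, [obj for _, obj in runs]


# Toy data: two well-separated clusters.
data_rng = np.random.default_rng(42)
x = np.concatenate([data_rng.normal(-3, 1, 200), data_rng.normal(3, 1, 200)])
(best_params, best_obj), all_objs = best_of_restarts(x, n_components=2)
```

Selecting by the objective rather than by the last run matters because coordinate-ascent procedures of this kind converge only to local optima, so different initializations can end at different bounds.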

Once the algorithm converges, we estimate the population-level quantities of interest as means of the corresponding approximating distributions:

$$\begin{aligned} \hat{\mu }_k&= \mathbb {E} \left[ \mu _k | \mathcal {D} \right] \approx \mathbb {E}_q \left[ \mu _k; \tilde{\eta }^I_k\right] , \quad \hat{\varSigma }_k = \mathbb {E} \left[ \varSigma _k | \mathcal {D} \right] \approx \mathbb {E}_q \left[ \varSigma _k; \tilde{\eta }^I_k \right] ,\\ \hat{\beta }_k&= \mathbb {E}\left[ \beta _k | \mathcal {D} \right] \approx \mathbb {E}_q \left[ \beta _k ; \tilde{\eta }^G_k\right] . \end{aligned}$$

Each expectation above can be easily evaluated from the parameters of the corresponding distribution. In addition, we construct spatial maps that display the posterior probability of each population topic for each supervoxel in a particular subject s to visually evaluate the disease structure in that subject.
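As a minimal sketch of how these quantities might be computed, the Python below uses the standard closed-form means of the Dirichlet and normal-inverse-Wishart (NIW) distributions. Several things here are assumptions rather than the paper's specification: the NIW parameters are taken to be (m, kappa, nu, Psi), the per-supervoxel parameters \(\phi^I_s\) are stored as an N x T array, and \(\xi_s\) as a T x K array. Under the factorized q, the probability that supervoxel n belongs to population topic k is then the sum over t of \(\phi^I_{snt}\,\xi_{stk}\). All function names are hypothetical.

```python
import numpy as np


def dirichlet_mean(eta):
    """Mean of Dir(eta): each parameter divided by the parameter sum."""
    eta = np.asarray(eta, dtype=float)
    return eta / eta.sum(axis=-1, keepdims=True)


def niw_mean(m, kappa, nu, psi):
    """Means of (mu, Sigma) under NIW(m, kappa, nu, Psi).

    E[mu] = m and E[Sigma] = Psi / (nu - d - 1), defined only for nu > d + 1.
    kappa scales the covariance of mu and does not affect either mean.
    """
    m = np.asarray(m, dtype=float)
    psi = np.asarray(psi, dtype=float)
    d = m.shape[0]
    if nu <= d + 1:
        raise ValueError("E[Sigma] requires nu > d + 1")
    return m, psi / (nu - d - 1)


def population_topic_map(phi_image, xi):
    """Posterior probability of each population topic per supervoxel.

    phi_image: (N, T) array, row n is q(z_sn^I = t).
    xi:        (T, K) array, row t is q(c_st = k).
    Returns an (N, K) array whose rows sum to one.
    """
    return np.asarray(phi_image, dtype=float) @ np.asarray(xi, dtype=float)


# Illustrative values only (not results from the paper).
beta_hat = dirichlet_mean([2.0, 1.0, 1.0])
mu_hat, sigma_hat = niw_mean(m=[0.0, 1.0], kappa=1.0, nu=5.0, psi=2.0 * np.eye(2))
topic_map = population_topic_map(
    phi_image=[[0.9, 0.1], [0.2, 0.8]],           # N = 2 supervoxels, T = 2
    xi=[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],        # T = 2, K = 3
)
```

Assigning each row of `topic_map` back to the voxels of its supervoxel would give one spatial map per population topic for the subject.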

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Batmanghelich, N.K., Saeedi, A., Cho, M., Estepar, R.S.J., Golland, P. (2015). Generative Method to Discover Genetically Driven Image Biomarkers. In: Ourselin, S., Alexander, D., Westin, CF., Cardoso, M. (eds) Information Processing in Medical Imaging. IPMI 2015. Lecture Notes in Computer Science, vol 9123. Springer, Cham. https://doi.org/10.1007/978-3-319-19992-4_3

  • DOI: https://doi.org/10.1007/978-3-319-19992-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19991-7

  • Online ISBN: 978-3-319-19992-4

  • eBook Packages: Computer Science, Computer Science (R0)
