A MCMC Approach for Learning the Structure of Gaussian Acyclic Directed Mixed Graphs

Abstract

Graphical models are widely used to encode conditional independence constraints and causal assumptions, the directed acyclic graph (DAG) being one of the most common families of models. However, DAGs are not closed under marginalization: if a distribution is Markov with respect to a DAG, several of its marginals might not be representable with another DAG unless one discards some of the structural independencies. Acyclic directed mixed graphs (ADMGs) generalize DAGs so that closure under marginalization is possible. In previous work, we showed how to perform Bayesian inference for the posterior distribution of the parameters of a given Gaussian ADMG model, where the graph is fixed. In this paper, we extend this procedure to allow for priors over graph structures.


References

  • Barbieri, M. M., & Berger, J. (2004). Optimal predictive model selection. The Annals of Statistics, 32, 870–897.

  • Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. Wiley Series in Probability and Statistics. Wiley.

  • Bollen, K. (1989). Structural equations with latent variables. New York: Wiley.

  • Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.

  • Drton, M., & Richardson, T. (2004). Iterative conditional fitting for Gaussian ancestral graph models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 130–137). Arlington, VA: AUAI Press.

  • Grzebyk, M., Wild, P., & Chouaniere, D. (2004). On identification of multi-factor models with correlated residuals. Biometrika, 91, 141–151.

  • Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C., & West, M. (2005). Experiments in stochastic computation for high-dimensional graphical models. Statistical Science, 20, 388–400.

  • Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.

  • Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30, 145–157.

  • Richardson, T., & Spirtes, P. (2002). Ancestral graph Markov models. The Annals of Statistics, 30, 962–1030.

  • Sadeghi, K., & Lauritzen, S. (2012). Markov properties for mixed graphs. arXiv:1109.5909.

  • Silva, R., & Ghahramani, Z. (2009). The hidden life of latent variables: Bayesian learning with mixed graph models. Journal of Machine Learning Research, 10, 1187–1238.

Author information

Correspondence to Ricardo Silva.

Appendix

We describe the parameters referred to in the sampler of Sect. 3. The full derivation is based on previous results described by Silva and Ghahramani (2009). Let \(\mathbf{HH}\) be the statistic \(\sum _{n=1}^{N}\mathbf{H}_{\mathit{sp}(i)}^{(n)}{\mathbf{H}_{\mathit{sp}(i)}^{(n)}}^{T}\). Likewise, let \(\mathbf{YH} \equiv \sum _{n=1}^{N}Y _{i}^{(n)}\mathbf{H}_{\mathit{sp}(i)}^{(n)}\) and \(\mathbf{YY} \equiv \sum _{n=1}^{N}{(Y _{i}^{(n)})}^{2}\). Recall that the hyperparameters of the \(\mathcal{G}\)-inverse Wishart are δ and U, as given by Eq. (1); as such, we are computing a “conditional normalizing constant” for the posterior of V, integrating over only one of the rows/columns of V.
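
A minimal numpy sketch of these sufficient statistics, assuming the data are stored in an \(N \times p\) matrix `Y` and the auxiliary variables of the sampler in a matching \(N \times p\) matrix `H` (the names `Y`, `H` and `sp_i` are illustrative, not from the paper):

```python
import numpy as np

def sufficient_stats(Y, H, i, sp_i):
    """Sufficient statistics HH, YH and YY for vertex Y_i.

    Y    : (N, p) data matrix, one observation per row.
    H    : (N, p) matrix of auxiliary variables, one H^{(n)} per row.
    sp_i : integer indices of the current spouses of Y_i.
    """
    H_sp = H[:, sp_i]              # the vectors H_{sp(i)}^{(n)}, stacked
    HH = H_sp.T @ H_sp             # sum_n H_{sp(i)}^{(n)} H_{sp(i)}^{(n)T}
    YH = Y[:, i] @ H_sp            # sum_n Y_i^{(n)} H_{sp(i)}^{(n)}
    YY = float(Y[:, i] @ Y[:, i])  # sum_n (Y_i^{(n)})^2
    return HH, YH, YY
```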

First, let

$$\displaystyle{\begin{array}{rcl} \mathbf{A}_{i}& \equiv &\mathbf{V}_{\mathit{sp}(i),\mathit{nsp}(i)}\,\mathbf{V}_{\mathit{nsp}(i),\mathit{nsp}(i)}^{-1} \\ \mathbf{M}_{i}& \equiv &{(\mathbf{U}_{\setminus i,\setminus i})}^{-1}\mathbf{U}_{\setminus i,i} \\ \mathbf{m}_{i}& \equiv &(\mathbf{U}_{\mathit{ss}} -\mathbf{A}_{i}\mathbf{U}_{\mathit{ns}})\mathbf{M}_{\mathit{sp}(i)} + (\mathbf{U}_{\mathit{sn}} -\mathbf{A}_{i}\mathbf{U}_{\mathit{nn}})\mathbf{M}_{\mathit{nsp}(i)} \\ \mathbf{K}_{\mathcal{B}}^{-1}& \equiv &\mathbf{U}_{\mathit{ss}} -\mathbf{A}_{i}\mathbf{U}_{\mathit{ns}} -\mathbf{U}_{\mathit{sn}}\mathbf{A}_{i}^{T} + \mathbf{A}_{i}\mathbf{U}_{\mathit{nn}}\mathbf{A}_{i}^{T} \\ \mu _{\mathcal{B}}& \equiv &\mathbf{K}_{\mathcal{B}}\,\mathbf{m}_{i} \end{array} }$$
(11)

where

$$\displaystyle{\left [\begin{array}{cc} \mathbf{U}_{\mathit{ss}} & \mathbf{U}_{\mathit{sn}} \\ \mathbf{U}_{\mathit{ns}} & \mathbf{U}_{\mathit{nn}} \end{array} \right ] \equiv \left [\begin{array}{cc} \mathbf{U}_{\mathit{sp}(i),\mathit{sp}(i)} & \mathbf{U}_{\mathit{sp}(i),\mathit{nsp}(i)} \\ \mathbf{U}_{\mathit{nsp}(i),\mathit{sp}(i)} & \mathbf{U}_{\mathit{nsp}(i),\mathit{nsp}(i)} \end{array} \right ]}$$
(12)
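
The quantities in Eqs. (11) and (12) translate directly into linear algebra. A sketch under the assumption that `V` and `U` are the \(p \times p\) matrices of Sect. 3 and that `sp_i` and `nsp_i` index the spouses and non-spouses of \(Y_i\) (again illustrative names):

```python
import numpy as np

def eq11_quantities(V, U, i, sp_i, nsp_i):
    """A_i, M_i, m_i, K_B (with its inverse) and mu_B from Eqs. (11)-(12)."""
    not_i = np.concatenate([sp_i, nsp_i])        # all indices except i
    # A_i = V_{sp(i),nsp(i)} V_{nsp(i),nsp(i)}^{-1}, via a solve, not an inverse
    A = np.linalg.solve(V[np.ix_(nsp_i, nsp_i)], V[np.ix_(nsp_i, sp_i)]).T
    # M_i = (U_{\i,\i})^{-1} U_{\i,i}, split into spouse/non-spouse parts
    M = np.linalg.solve(U[np.ix_(not_i, not_i)], U[not_i, i])
    M_sp, M_nsp = M[:len(sp_i)], M[len(sp_i):]
    # blocks of Eq. (12)
    U_ss, U_sn = U[np.ix_(sp_i, sp_i)], U[np.ix_(sp_i, nsp_i)]
    U_ns, U_nn = U[np.ix_(nsp_i, sp_i)], U[np.ix_(nsp_i, nsp_i)]
    m = (U_ss - A @ U_ns) @ M_sp + (U_sn - A @ U_nn) @ M_nsp
    K_inv = U_ss - A @ U_ns - U_sn @ A.T + A @ U_nn @ A.T
    K = np.linalg.inv(K_inv)
    mu = K @ m
    return A, M, m, K, K_inv, mu
```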

Moreover, let

$$\displaystyle{\begin{array}{rcl} \mathcal{U}_{i}& \equiv &\mathbf{M}_{i}^{T}\mathbf{U}_{\setminus i,\setminus i}\mathbf{M}_{i} -\mathbf{m}_{i}^{T}\mathbf{K}_{\mathcal{B}}\mathbf{m}_{i} \\ u_{\mathit{ii}.\setminus i}& \equiv &\mathbf{U}_{\mathit{ii}} -\mathbf{U}_{i,\setminus i}{(\mathbf{U}_{\setminus i,\setminus i})}^{-1}\mathbf{U}_{\setminus i,i} \\ \alpha _{i}& \equiv &\left (\delta +p - 1 + \#\mathit{nsp}(i)\right )/2 \\ \beta _{i}& \equiv &\left (u_{\mathit{ii}.\setminus i} + \mathcal{U}_{i}\right )/2 \\ \mathbf{T}& \equiv &\mathbf{K}_{\mathcal{B}}^{-1} + \mathbf{HH} \\ \mathbf{q}& \equiv &\mathbf{YH} + \mathbf{K}_{\mathcal{B}}^{-1}\mu _{\mathcal{B}} \end{array} }$$
(13)

where \(\#\mathit{nsp}(i)\) is the number of non-spouses of \(Y_i\) (i.e., \((p - 1) -\sum _{j=1}^{p}z_{\mathit{ij}}\)).

Finally,

$$\displaystyle{\begin{array}{rcl} \alpha _{i}^{\prime}& \equiv &\frac{N}{2} +\alpha _{i}, \\ \beta _{i}^{\prime}& \equiv &\frac{\mathbf{YY} +\mu _{\mathcal{B}}^{T}\mathbf{K}_{\mathcal{B}}^{-1}\mu _{\mathcal{B}} -{\mathbf{q}}^{T}{\mathbf{T}}^{-1}\mathbf{q}}{2} +\beta _{i} \end{array} }$$
(14)
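
Continuing the sketch, Eqs. (13) and (14) combine the prior terms with the sufficient statistics; `delta` stands for δ, and the helper functions are the hypothetical ones defined above:

```python
def posterior_params(V, U, Y, H, i, sp_i, nsp_i, delta):
    """alpha_i', beta_i' (plus T and q) from Eqs. (13) and (14)."""
    N, p = Y.shape
    not_i = np.concatenate([sp_i, nsp_i])
    A, M, m, K, K_inv, mu = eq11_quantities(V, U, i, sp_i, nsp_i)
    HH, YH, YY = sufficient_stats(Y, H, i, sp_i)

    U_not = U[np.ix_(not_i, not_i)]
    U_cal = M @ U_not @ M - m @ K @ m          # the scalar U_i of Eq. (13)
    u_cond = U[i, i] - U[i, not_i] @ np.linalg.solve(U_not, U[not_i, i])

    alpha_i = (delta + p - 1 + len(nsp_i)) / 2.0
    beta_i = (u_cond + U_cal) / 2.0
    T = K_inv + HH
    q = YH + K_inv @ mu

    alpha_post = N / 2.0 + alpha_i             # Eq. (14)
    beta_post = (YY + mu @ K_inv @ mu - q @ np.linalg.solve(T, q)) / 2.0 + beta_i
    return alpha_post, beta_post, T, q
```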

Notice that each calculation of \(\mathbf{A}_i\) (and related products) takes \(\mathcal{O}({p}^{3})\) steps (assuming the number of non-spouses is \(\mathcal{O}(p)\) and the number of spouses is \(\mathcal{O}(1)\), which will be the case in sparse graphs). For each vertex \(Y_i\), an iteration could therefore take \(\mathcal{O}({p}^{4})\) steps, and a full sweep a prohibitive \(\mathcal{O}({p}^{5})\) steps. To scale this procedure up, some tricks can be employed. For instance, when iterating over each candidate spouse for a fixed \(Y_i\), the number of spouses increases or decreases by 1: this means fast matrix update schemes can be implemented to obtain the new \(\mathbf{A}_i\) from its current value. Even in this case, however, the cost would still be \(\mathcal{O}({p}^{4})\). More speed-ups follow from solving for \(\mathbf{V}_{\mathit{sp}(i),\mathit{nsp}(i)}\mathbf{V}_{\mathit{nsp}(i),\mathit{nsp}(i)}^{-1}\) using sparse matrix representations, which should cost less than \(\mathcal{O}({p}^{3})\) (although for small to moderate p, sparse matrix inversion might be slower than dense matrix inversion). Moreover, one need not evaluate all pairs \(Y_i \leftrightarrow Y_j\): a pre-screening step can restrict attention to pairs where the magnitude of the corresponding correlation, as sampled in the previous step, lies within some interval, as in the sketch below.
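
As an illustration of the pre-screening idea, the sketch below restricts edge proposals to pairs whose sampled correlation has magnitude inside an interval; the particular endpoints are arbitrary choices for illustration, as the paper does not specify them:

```python
def prescreen_pairs(V, lo=0.05, hi=0.95):
    """Candidate pairs Y_i <-> Y_j whose correlation, implied by the
    current sample of V, has magnitude inside [lo, hi]."""
    d = np.sqrt(np.diag(V))
    corr = V / np.outer(d, d)
    p = V.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if lo <= abs(corr[i, j]) <= hi]
```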

Copyright information

© 2013 Springer International Publishing Switzerland

Cite this paper

Silva, R. (2013). A MCMC Approach for Learning the Structure of Gaussian Acyclic Directed Mixed Graphs. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_39
