A MCMC Approach for Learning the Structure of Gaussian Acyclic Directed Mixed Graphs

Abstract

Graphical models are widely used to encode conditional independence constraints and causal assumptions, the directed acyclic graph (DAG) being one of the most common families of models. However, DAGs are not closed under marginalization: if a distribution is Markov with respect to a DAG, several of its marginals might not be representable with another DAG unless one discards some of the structural independencies. Acyclic directed mixed graphs (ADMGs) generalize DAGs so that closure under marginalization is possible. In previous work, we showed how to perform Bayesian inference for the posterior distribution of the parameters of a given Gaussian ADMG model, where the graph is fixed. In this paper, we extend this procedure to allow for priors over graph structures.


References

  • Barbieri, M. M., & Berger, J. (2004). Optimal predictive model selection. The Annals of Statistics, 32, 870–897.

  • Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. Wiley Series in Probability and Statistics. Wiley.

  • Bollen, K. (1989). Structural equations with latent variables. New York: Wiley.

  • Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.

  • Drton, M., & Richardson, T. (2004). Iterative conditional fitting for Gaussian ancestral graph models. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (pp. 130–137). Arlington, VA: AUAI Press.

  • Grzebyk, M., Wild, P., & Chouaniere, D. (2004). On identification of multi-factor models with correlated residuals. Biometrika, 91, 141–151.

  • Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C., & West, M. (2005). Experiments in stochastic computation for high-dimensional graphical models. Statistical Science, 20, 388–400.

  • Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.

  • Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30, 145–157.

  • Richardson, T., & Spirtes, P. (2002). Ancestral graph Markov models. The Annals of Statistics, 30, 962–1030.

  • Sadeghi, K., & Lauritzen, S. (2012). Markov properties for mixed graphs. arXiv:1109.5909.

  • Silva, R., & Ghahramani, Z. (2009). The hidden life of latent variables: Bayesian learning with mixed graph models. Journal of Machine Learning Research, 10, 1187–1238.

Author information

Correspondence to Ricardo Silva.

Appendix

We describe the parameters referred to in the sampler of Sect. 3. The full derivation is based on previous results described by Silva and Ghahramani (2009). Let \(\mathbf{HH}\) be the statistic \(\sum _{n=1}^{N}\mathbf{H}_{\mathit{sp}(i)}^{(n)}{\mathbf{H}_{\mathit{sp}(i)}^{(n)}}^{T}\). Likewise, let \(\mathbf{YH} \equiv \sum _{n=1}^{N}Y _{i}^{(n)}\mathbf{H}_{\mathit{sp}(i)}^{(n)}\) and \(\mathbf{YY} \equiv \sum _{n=1}^{N}{(Y _{i}^{(n)})}^{2}\). Recall that the hyperparameters of the \(\mathcal{G}\)-inverse Wishart are δ and U, as given by Eq. (1); as such, we are computing a “conditional normalizing constant” for the posterior of V, integrating over only one of the rows/columns of V.
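
A minimal numpy sketch of these sufficient statistics, assuming the data are stored in an \(N \times p\) matrix `Y` and the auxiliary variables of the sampler in a matching \(N \times p\) matrix `H` (the names `Y`, `H` and `sp_i` are illustrative, not from the paper):

```python
import numpy as np

def sufficient_stats(Y, H, i, sp_i):
    """Sufficient statistics HH, YH and YY for vertex Y_i.

    Y    : (N, p) data matrix, one observation per row.
    H    : (N, p) matrix of auxiliary variables, one H^{(n)} per row.
    sp_i : integer indices of the current spouses of Y_i.
    """
    H_sp = H[:, sp_i]              # the vectors H_{sp(i)}^{(n)}, stacked
    HH = H_sp.T @ H_sp             # sum_n H_{sp(i)}^{(n)} H_{sp(i)}^{(n)T}
    YH = Y[:, i] @ H_sp            # sum_n Y_i^{(n)} H_{sp(i)}^{(n)}
    YY = float(Y[:, i] @ Y[:, i])  # sum_n (Y_i^{(n)})^2
    return HH, YH, YY
```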

First, let

$$\displaystyle{\begin{array}{rcl} \mathbf{A}_{i}& \equiv &\mathbf{V}_{\mathit{sp}(i),\mathit{nsp}(i)}\,\mathbf{V}_{\mathit{nsp}(i),\mathit{nsp}(i)}^{-1} \\ \mathbf{M}_{i}& \equiv &{(\mathbf{U}_{\setminus i,\setminus i})}^{-1}\mathbf{U}_{\setminus i,i} \\ \mathbf{m}_{i}& \equiv &(\mathbf{U}_{\mathit{ss}} -\mathbf{A}_{i}\mathbf{U}_{\mathit{ns}})\mathbf{M}_{\mathit{sp}(i)} + (\mathbf{U}_{\mathit{sn}} -\mathbf{A}_{i}\mathbf{U}_{\mathit{nn}})\mathbf{M}_{\mathit{nsp}(i)} \\ \mathbf{K}_{\mathcal{B}}^{-1}& \equiv &\mathbf{U}_{\mathit{ss}} -\mathbf{A}_{i}\mathbf{U}_{\mathit{ns}} -\mathbf{U}_{\mathit{sn}}\mathbf{A}_{i}^{T} + \mathbf{A}_{i}\mathbf{U}_{\mathit{nn}}\mathbf{A}_{i}^{T} \\ \mu _{\mathcal{B}}& \equiv &\mathbf{K}_{\mathcal{B}}\,\mathbf{m}_{i} \end{array} }$$
(11)

where

$$\displaystyle{\left [\begin{array}{cc} \mathbf{U}_{\mathit{ss}} & \mathbf{U}_{\mathit{sn}} \\ \mathbf{U}_{\mathit{ns}} & \mathbf{U}_{\mathit{nn}} \end{array} \right ] \equiv \left [\begin{array}{cc} \mathbf{U}_{\mathit{sp}(i),\mathit{sp}(i)} & \mathbf{U}_{\mathit{sp}(i),\mathit{nsp}(i)} \\ \mathbf{U}_{\mathit{nsp}(i),\mathit{sp}(i)} & \mathbf{U}_{\mathit{nsp}(i),\mathit{nsp}(i)} \end{array} \right ]}$$
(12)
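
The quantities in Eqs. (11) and (12) translate directly into linear algebra. A sketch under the assumption that `V` and `U` are the \(p \times p\) matrices of Sect. 3 and that `sp_i` and `nsp_i` index the spouses and non-spouses of \(Y_i\) (again illustrative names):

```python
import numpy as np

def eq11_quantities(V, U, i, sp_i, nsp_i):
    """A_i, M_i, m_i, K_B (with its inverse) and mu_B from Eqs. (11)-(12)."""
    not_i = np.concatenate([sp_i, nsp_i])        # all indices except i
    # A_i = V_{sp(i),nsp(i)} V_{nsp(i),nsp(i)}^{-1}, via a solve, not an inverse
    A = np.linalg.solve(V[np.ix_(nsp_i, nsp_i)], V[np.ix_(nsp_i, sp_i)]).T
    # M_i = (U_{\i,\i})^{-1} U_{\i,i}, split into spouse/non-spouse parts
    M = np.linalg.solve(U[np.ix_(not_i, not_i)], U[not_i, i])
    M_sp, M_nsp = M[:len(sp_i)], M[len(sp_i):]
    # blocks of Eq. (12)
    U_ss, U_sn = U[np.ix_(sp_i, sp_i)], U[np.ix_(sp_i, nsp_i)]
    U_ns, U_nn = U[np.ix_(nsp_i, sp_i)], U[np.ix_(nsp_i, nsp_i)]
    m = (U_ss - A @ U_ns) @ M_sp + (U_sn - A @ U_nn) @ M_nsp
    K_inv = U_ss - A @ U_ns - U_sn @ A.T + A @ U_nn @ A.T
    K = np.linalg.inv(K_inv)
    mu = K @ m
    return A, M, m, K, K_inv, mu
```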

Moreover, let

$$\displaystyle{\begin{array}{rcl} \mathcal{U}_{i}& \equiv &\mathbf{M}_{i}^{T}\mathbf{U}_{\setminus i,\setminus i}\mathbf{M}_{i} -\mathbf{m}_{i}^{T}\mathbf{K}_{\mathcal{B}}\mathbf{m}_{i} \\ u_{\mathit{ii}.\setminus i}& \equiv &\mathbf{U}_{\mathit{ii}} -\mathbf{U}_{i,\setminus i}{(\mathbf{U}_{\setminus i,\setminus i})}^{-1}\mathbf{U}_{\setminus i,i} \\ \alpha _{i}& \equiv &\left (\delta +p - 1 + \#\mathit{nsp}(i)\right )/2 \\ \beta _{i}& \equiv &\left (u_{\mathit{ii}.\setminus i} + \mathcal{U}_{i}\right )/2 \\ \mathbf{T}& \equiv &\mathbf{K}_{\mathcal{B}}^{-1} + \mathbf{HH} \\ \mathbf{q}& \equiv &\mathbf{YH} + \mathbf{K}_{\mathcal{B}}^{-1}\mu _{\mathcal{B}} \end{array} }$$
(13)

where \(\#\mathit{nsp}(i)\) is the number of non-spouses of \(Y_i\) (i.e., \((p - 1) -\sum _{j=1}^{p}z_{\mathit{ij}}\)).

Finally,

$$\displaystyle{\begin{array}{rcl} \alpha _{i}^{\prime}& \equiv &\frac{N}{2} +\alpha _{i}, \\ \beta _{i}^{\prime}& \equiv &\frac{\mathbf{YY} +\mu _{\mathcal{B}}^{T}\mathbf{K}_{\mathcal{B}}^{-1}\mu _{\mathcal{B}} -{\mathbf{q}}^{T}{\mathbf{T}}^{-1}\mathbf{q}}{2} +\beta _{i} \end{array} }$$
(14)
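
Continuing the sketch, Eqs. (13) and (14) combine the prior terms with the sufficient statistics; `delta` stands for δ, and the helper functions are the hypothetical ones defined above:

```python
def posterior_params(V, U, Y, H, i, sp_i, nsp_i, delta):
    """alpha_i', beta_i' (plus T and q) from Eqs. (13) and (14)."""
    N, p = Y.shape
    not_i = np.concatenate([sp_i, nsp_i])
    A, M, m, K, K_inv, mu = eq11_quantities(V, U, i, sp_i, nsp_i)
    HH, YH, YY = sufficient_stats(Y, H, i, sp_i)

    U_not = U[np.ix_(not_i, not_i)]
    U_cal = M @ U_not @ M - m @ K @ m          # the scalar U_i of Eq. (13)
    u_cond = U[i, i] - U[i, not_i] @ np.linalg.solve(U_not, U[not_i, i])

    alpha_i = (delta + p - 1 + len(nsp_i)) / 2.0
    beta_i = (u_cond + U_cal) / 2.0
    T = K_inv + HH
    q = YH + K_inv @ mu

    alpha_post = N / 2.0 + alpha_i             # Eq. (14)
    beta_post = (YY + mu @ K_inv @ mu - q @ np.linalg.solve(T, q)) / 2.0 + beta_i
    return alpha_post, beta_post, T, q
```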

Notice that each calculation of \(\mathbf{A}_i\) (and related products) takes \(\mathcal{O}({p}^{3})\) steps (assuming the number of non-spouses is \(\mathcal{O}(p)\) and the number of spouses is \(\mathcal{O}(1)\), which will be the case in sparse graphs). For each vertex \(Y_i\), an iteration could therefore take \(\mathcal{O}({p}^{4})\) steps, and a full sweep a prohibitive \(\mathcal{O}({p}^{5})\) steps. To scale this procedure up, some tricks can be employed. For instance, when iterating over each candidate spouse for a fixed \(Y_i\), the number of spouses increases or decreases by 1: this means fast matrix update schemes can be implemented to obtain the new \(\mathbf{A}_i\) from its current value. Even in this case, however, the cost would still be \(\mathcal{O}({p}^{4})\). More speed-ups follow from solving for \(\mathbf{V}_{\mathit{sp}(i),\mathit{nsp}(i)}\mathbf{V}_{\mathit{nsp}(i),\mathit{nsp}(i)}^{-1}\) using sparse matrix representations, which should cost less than \(\mathcal{O}({p}^{3})\) (although for small to moderate p, sparse matrix inversion might be slower than dense matrix inversion). Moreover, one need not evaluate all pairs \(Y_i \leftrightarrow Y_j\): a pre-screening step can restrict attention to pairs where the magnitude of the corresponding correlation, as sampled in the previous step, lies within some interval, as in the sketch below.
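
As an illustration of the pre-screening idea, the sketch below restricts edge proposals to pairs whose sampled correlation has magnitude inside an interval; the particular endpoints are arbitrary choices for illustration, as the paper does not specify them:

```python
def prescreen_pairs(V, lo=0.05, hi=0.95):
    """Candidate pairs Y_i <-> Y_j whose correlation, implied by the
    current sample of V, has magnitude inside [lo, hi]."""
    d = np.sqrt(np.diag(V))
    corr = V / np.outer(d, d)
    p = V.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if lo <= abs(corr[i, j]) <= hi]
```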

Copyright information

© 2013 Springer International Publishing Switzerland

Cite this paper

Silva, R. (2013). A MCMC Approach for Learning the Structure of Gaussian Acyclic Directed Mixed Graphs. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_39
