Abstract
Graphical models are widely used to encode conditional independence constraints and causal assumptions, with the directed acyclic graph (DAG) being one of the most common families of models. However, DAGs are not closed under marginalization: if a distribution is Markov with respect to a DAG, some of its marginals might not be representable by another DAG unless one discards some of the structural independencies. Acyclic directed mixed graphs (ADMGs) generalize DAGs so that closure under marginalization is possible. In previous work, we showed how to perform Bayesian inference over the posterior distribution of the parameters of a given Gaussian ADMG model, where the graph is fixed. In this paper, we extend this procedure to allow for priors over graph structures.
References
Barbieri, M. M., & Berger, J. (2004). Optimal predictive model selection. The Annals of Statistics, 32, 870–897.
Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. Wiley Series in Probability and Statistics.
Bollen, K. (1989). Structural equations with latent variables. New York: Wiley.
Carvalho, C., & Polson, N. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.
Drton, M., & Richardson, T. (2004). Iterative conditional fitting for Gaussian ancestral graph models. In Proceedings of the 20th conference on uncertainty in artificial intelligence, (pp. 130–137). AUAI Press, Arlington, Virginia.
Grzebyk, M., Wild, P., & Chouaniere, D. (2004). On identification of multi-factor models with correlated residuals. Biometrika, 91, 141–151.
Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C., & West, M. (2005). Experiments in stochastic computation for high-dimensional graphical models. Statistical Science, 20, 388–400.
Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.
Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30, 145–157.
Richardson, T., & Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30, 962–1030.
Sadeghi, K., & Lauritzen, S. (2012). Markov properties for mixed graphs. arXiv:1109.5909v4.
Silva, R., & Ghahramani, Z. (2009). The hidden life of latent variables: Bayesian learning with mixed graph models. Journal of Machine Learning Research, 10, 1187–1238.
Appendix
We describe the parameters referred to in the sampler of Sect. 3. The full derivation is based on previous results described by Silva and Ghahramani (2009). Let \(\mathbf{HH}\) be the statistic \(\sum _{n=1}^{N}\mathbf{H}_{\mathit{sp}(i)}^{(n)}{\mathbf{H}_{\mathit{sp}(i)}^{(n)}}^{T}\). Likewise, let \(\mathbf{YH} \equiv \sum _{n=1}^{N}Y _{i}^{(n)}\mathbf{H}_{\mathit{sp}(i)}^{(n)}\) and \(\mathbf{YY} \equiv \sum _{n=1}^{N}{(Y _{i}^{(n)})}^{2}\). Recall that the hyperparameters of the \(\mathcal{G}\)-inverse Wishart prior are δ and U, as given by Eq. (1); as such, we are computing a “conditional normalizing constant” for the posterior of V, integrating over only one of the rows/columns of V.
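The statistics above can be accumulated with a few matrix products. The following is a minimal NumPy sketch, not the paper's implementation: the function name, the assumption that the data are stored as an \(N \times p\) matrix Y, and that the auxiliary quantities \(\mathbf{H}^{(n)}\) are stacked row-wise in a matrix H of the same shape, are all illustrative choices.

```python
import numpy as np

def sufficient_stats(Y, H, i, sp):
    """Hedged sketch of the statistics HH, YH, YY for vertex i.

    Y  : (N, p) data matrix, Y[n, i] = Y_i^{(n)}
    H  : (N, p) matrix whose n-th row holds the auxiliary vector H^{(n)}
    i  : index of the vertex Y_i under consideration
    sp : list of indices of the current spouses sp(i) of Y_i
    """
    Hs = H[:, sp]                    # rows are H_{sp(i)}^{(n)}, shape (N, |sp(i)|)
    HH = Hs.T @ Hs                   # sum_n H_{sp(i)}^{(n)} H_{sp(i)}^{(n)T}
    YH = Hs.T @ Y[:, i]              # sum_n Y_i^{(n)} H_{sp(i)}^{(n)}
    YY = float(Y[:, i] @ Y[:, i])    # sum_n (Y_i^{(n)})^2
    return HH, YH, YY
```

Computing the three statistics jointly from the sliced columns avoids an explicit loop over the N data points.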
First, let

[displayed equation not reproduced]

where

[displayed equation not reproduced]

Moreover, let

[displayed equation not reproduced]

where \(\#\mathit{nsp}(i)\) is the number of non-spouses of \(Y_i\) (i.e., \((p - 1) -\sum _{j=1}^{p}z_{\mathit{ij}}\)).

Finally,

[displayed equation not reproduced]
Notice that each calculation of \(\mathbf{A}_i\) (and related products) takes \(\mathcal{O}({p}^{3})\) steps (assuming the number of non-spouses is \(\mathcal{O}(p)\) and the number of spouses is \(\mathcal{O}(1)\), which will be the case in sparse graphs). For each vertex \(Y_i\), an iteration could take \(\mathcal{O}({p}^{4})\) steps, and a full sweep would take a prohibitive \(\mathcal{O}({p}^{5})\) steps. To scale this procedure up, some tricks can be employed. For instance, when iterating over each candidate spouse for a fixed \(Y_i\), the number of spouses increases or decreases by one: this means fast matrix update schemes can be implemented to obtain a new \(\mathbf{A}_i\) from its current value. However, even in this case the cost would still be \(\mathcal{O}({p}^{4})\). Further speed-ups follow from solving for \(\mathbf{V}_{\mathit{sp}(i),\mathit{nsp}(i)}\mathbf{V}_{\mathit{nsp}(i),\mathit{nsp}(i)}^{-1}\) using sparse matrix representations, which should cost less than \(\mathcal{O}({p}^{3})\) (although for small to moderate p, sparse matrix inversion might be slower than dense matrix inversion). Moreover, one need not evaluate all pairs \(Y _{i} \leftrightarrow Y _{j}\) if some pre-screening is done by considering only pairs where the magnitude of the corresponding correlation sampled at the previous step lies within some interval.
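The "fast matrix update" trick mentioned above can be realized with a rank-one correction: when a single spouse is added or removed, the affected matrix changes by a rank-one term, so its inverse can be refreshed in \(\mathcal{O}(p^2)\) via the Sherman–Morrison identity instead of a fresh \(\mathcal{O}(p^3)\) inversion. The sketch below shows the generic identity only; how the spouse change maps to the vectors u and v depends on the paper's definitions and is not reproduced here.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return the inverse of (A + u v^T) given A_inv = A^{-1}.

    Uses the Sherman-Morrison identity:
        (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
    Cost is O(p^2) versus O(p^3) for recomputing the inverse from scratch.
    """
    Au = A_inv @ u                 # A^{-1} u
    vA = v @ A_inv                 # v^T A^{-1}
    denom = 1.0 + v @ Au           # must be nonzero for the update to exist
    return A_inv - np.outer(Au, vA) / denom
```

Maintaining the inverse incrementally across the sweep over candidate spouses is what brings the per-vertex cost down by one factor of p.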
© 2013 Springer International Publishing Switzerland
Silva, R. (2013). A MCMC Approach for Learning the Structure of Gaussian Acyclic Directed Mixed Graphs. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_39
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9