Abstract
This work gives a general overview of structure learning for Bayesian networks (BNs) and then explores the feasibility of an information-geometric approach to learning the topology of a BN from data. It describes an information-geometric scoring function based on the Minimum Description Length principle. The score accounts for two sources of model complexity: the number of parameters in the BN, and the geometry of the statistical manifold onto which the BN's parametric family of probability distributions is mapped. It also introduces information geometry and lays out a theoretical framework, supported by empirical evidence, showing that this scoring function is at least as efficient as the Bayesian information criterion (BIC) and that, for certain BN topologies, it can drastically increase accuracy in selecting the best possible BN.
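To make the contrast between the two kinds of score concrete, the following is a sketch of the standard asymptotic forms, not necessarily the exact expressions used in the chapter. The symbols are assumptions introduced here for illustration: $D$ is a data set of $N$ cases, $\hat{\theta}$ is the maximum-likelihood parameter estimate of a candidate network with $d$ free parameters, $I(\theta)$ is the Fisher information matrix, and $\Theta$ is the parameter space. Both scores are written as description lengths, so smaller is better.

```latex
% BIC: fit term plus a penalty that depends only on the parameter count d
\mathrm{BIC}(D) = -\ln p(D \mid \hat{\theta}) + \frac{d}{2}\,\ln N

% Fisher-information form of the MDL stochastic complexity:
% the last term is the log-volume of the statistical manifold
\mathrm{MDL}(D) = -\ln p(D \mid \hat{\theta})
                + \frac{d}{2}\,\ln\frac{N}{2\pi}
                + \ln \int_{\Theta} \sqrt{\det I(\theta)}\; d\theta

% Difference between the two: independent of N
\mathrm{MDL}(D) - \mathrm{BIC}(D)
                = -\frac{d}{2}\,\ln 2\pi
                + \ln \int_{\Theta} \sqrt{\det I(\theta)}\; d\theta
```

Under these assumptions the two scores differ by a term that does not grow with $N$, so they agree asymptotically. The volume term can still separate two networks with the same number of parameters but different topologies at finite sample sizes, which is consistent with the abstract's claim that the gain depends on the BN topology.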
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lauría, E.J.M. (2008). An Information-Geometric Approach to Learning Bayesian Network Topologies from Data. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Bayesian Networks. Studies in Computational Intelligence, vol 156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85066-3_8
DOI: https://doi.org/10.1007/978-3-540-85066-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85065-6
Online ISBN: 978-3-540-85066-3
eBook Packages: Engineering, Engineering (R0)