
An Information-Geometric Approach to Learning Bayesian Network Topologies from Data

Chapter in: Innovations in Bayesian Networks

Part of the book series: Studies in Computational Intelligence (SCI, volume 156)

Abstract

This chapter gives a general overview of structure learning in Bayesian networks (BNs) and then explores the feasibility of an information-geometric approach to learning the topology of a BN from data. It describes an information-geometric scoring function based on the Minimum Description Length (MDL) principle. The score accounts for two sources of complexity: the number of parameters in the BN, and the geometry of the statistical manifold onto which the BN's parametric family of probability distributions is mapped. The chapter also introduces information geometry and lays out a theoretical framework, supported by empirical evidence, showing that this scoring function is at least as efficient as the Bayesian information criterion (BIC) and that, for certain BN topologies, it can drastically increase the accuracy with which the best possible BN is selected.
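
For orientation, the two criteria compared in the abstract can be written side by side. The chapter's own formulas are not visible in this preview, so the block below is only a sketch that assumes the standard forms from the MDL literature: Schwarz's BIC and Rissanen's Fisher-information refinement of stochastic complexity, both written as code lengths to be minimized. The symbols $D$ (data), $N$ (sample size), $d$ (number of free parameters of model $M$), $\hat{\theta}$ (maximum-likelihood estimate), $\Theta$ (parameter space), and $I(\theta)$ (per-observation Fisher information matrix) are notation introduced here for illustration, not taken from the chapter.

```latex
% Assumed textbook forms; the chapter's exact score may differ.
\begin{align}
\mathrm{BIC}(M) &= -\ln p(D \mid \hat{\theta}, M) + \frac{d}{2}\ln N \\
\mathrm{MDL}_{\mathrm{geo}}(M) &= -\ln p(D \mid \hat{\theta}, M)
  + \frac{d}{2}\ln\frac{N}{2\pi}
  + \ln\int_{\Theta}\sqrt{\det I(\theta)}\,d\theta
\end{align}
```

Under these assumptions the two scores differ by $-\frac{d}{2}\ln 2\pi + \ln\int_{\Theta}\sqrt{\det I(\theta)}\,d\theta$, which does not grow with $N$. The integral is the volume of the statistical manifold under the Fisher metric, so two topologies with the same parameter count and the same fit can still receive different scores.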




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lauría, E.J.M. (2008). An Information-Geometric Approach to Learning Bayesian Network Topologies from Data. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Bayesian Networks. Studies in Computational Intelligence, vol 156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85066-3_8

  • DOI: https://doi.org/10.1007/978-3-540-85066-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85065-6

  • Online ISBN: 978-3-540-85066-3

  • eBook Packages: Engineering, Engineering (R0)
