# Maximum likelihood estimates for Markov networks using inhomogeneous Markov chains

## Abstract

Consider an undirected graph with *n* nodes and an arbitrary probability distribution over the *n* two-valued nodes. Which probability distribution that defines a Markov network on the given graph is closest to it, in the sense of the information divergence?
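In symbols, the question can be stated as below. The notation is chosen here for illustration and is not taken from the paper: $p$ is the given distribution, $\mathcal{M}_G$ is the set of distributions that define a Markov network on the graph $G$, and $D$ is the information divergence (Kullback-Leibler divergence), with the usual convention $0 \log 0 = 0$.

```latex
D(p \,\|\, q) = \sum_{x \in \{0,1\}^n} p(x) \log \frac{p(x)}{q(x)},
\qquad
p^{*} = \operatorname*{arg\,min}_{q \in \mathcal{M}_G} D(p \,\|\, q).
```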

Solving this task is equivalent to finding the maximum likelihood estimate in the set of probability distributions that define a Markov network on the given graph. It is therefore given by the “M-step” of the *E*xpectation-*M*aximization (*EM*) algorithm. In Amari’s information-geometric framework, the M-step is the m-projection (“m-step”) of the given or observed distribution onto the set *M* of statistical models.
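The equivalence with maximum likelihood follows from a short standard calculation, sketched here under the assumption that the given distribution is the empirical distribution $\hat p$ of $N$ observations $x^{(1)}, \dots, x^{(N)}$. The symbols $\hat p$, $H$ (entropy) and $\mathcal{M}$ (the model set) are illustrative notation, not the paper's.

```latex
\begin{aligned}
D(\hat p \,\|\, q)
  &= \sum_{x} \hat p(x) \log \hat p(x) - \sum_{x} \hat p(x) \log q(x) \\
  &= -H(\hat p) - \frac{1}{N} \sum_{k=1}^{N} \log q\bigl(x^{(k)}\bigr),
\end{aligned}
\qquad\Longrightarrow\qquad
\operatorname*{arg\,min}_{q \in \mathcal{M}} D(\hat p \,\|\, q)
  = \operatorname*{arg\,max}_{q \in \mathcal{M}} \prod_{k=1}^{N} q\bigl(x^{(k)}\bigr).
```

Since $H(\hat p)$ does not depend on $q$, minimizing the divergence over the model set is the same as maximizing the likelihood of the observations.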

In the field of probabilistic expert systems, the natural approach to knowledge representation in Markov or Bayes networks is based on conditional probabilities. However, the conditional probability approach to Markov networks suffers from serious consistency problems. We present an algorithm that iteratively uses inconsistent conditional probabilities as transition probabilities in an inhomogeneous Markov chain. It is shown that this algorithm converges to the m-projection onto the set of Gibbs distributions of a given graph.
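The general idea of using node-wise conditional probabilities as transition probabilities can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give. It is a minimal illustration, assuming that each node is resampled from the conditional $p(x_i \mid x_{N(i)})$ computed from the given distribution $p$, and that the per-node kernels are applied cyclically, so the transition matrix changes from step to step. The names `states`, `node_kernel` and `sweep_chain` are hypothetical. The sketch makes no claim about the limit when $p$ is not Markov on the graph; it only shows that a strictly positive $p$ that is already Markov on the graph is recovered from a uniform start.

```python
import itertools
import numpy as np


def states(n):
    """All configurations of n two-valued nodes, in a fixed order."""
    return list(itertools.product((0, 1), repeat=n))


def node_kernel(p, i, neighbours, n):
    """Transition matrix that resamples node i from p(x_i | x_neighbours).

    The conditional is computed from the given joint distribution p by
    summing over all nodes that are neither i nor a neighbour of i.
    """
    xs = states(n)
    index = {x: k for k, x in enumerate(xs)}
    K = np.zeros((len(xs), len(xs)))
    for x in xs:
        # Mass of p for each value of node i, with the neighbours fixed as in x.
        weights = np.zeros(2)
        for y, py in zip(xs, p):
            if all(y[j] == x[j] for j in neighbours):
                weights[y[i]] += py
        total = weights.sum()
        for v in (0, 1):
            target = x[:i] + (v,) + x[i + 1:]
            if total > 0:
                prob = weights[v] / total
            else:
                # Neighbour configuration has zero mass under p: leave x unchanged.
                prob = 1.0 if v == x[i] else 0.0
            K[index[x], index[target]] = prob
    return K


def sweep_chain(p, graph, q0, sweeps=200):
    """Apply the per-node kernels cyclically to the start distribution q0."""
    n = len(graph)
    kernels = [node_kernel(p, i, graph[i], n) for i in range(n)]
    q = np.array(q0, dtype=float)
    for _ in range(sweeps):
        for K in kernels:  # the kernel changes at every step
            q = q @ K
    return q


# Usage: a 3-node chain 0 - 1 - 2 and a strictly positive p that is Markov on it.
graph = {0: [1], 1: [0, 2], 2: [1]}
xs = states(3)
w = np.array([np.exp(0.8 * x[0] * x[1] - 0.5 * x[1] * x[2] + 0.3 * x[0]) for x in xs])
p = w / w.sum()
q = sweep_chain(p, graph, q0=np.full(8, 1 / 8), sweeps=200)
```

The state space has $2^n$ configurations, so this explicit matrix form is only practical for very small graphs.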

## Keywords

Markov Random Field, Probabilistic Neural Network, Information Geometry, Boltzmann Machine, Markov Network

## References

- [1] S Amari. Information Geometry of the EM and *em* Algorithms for Neural Networks. Technical report, Department of Mathematical Engineering and Information Physics, Faculty of Engineering, University of Tokyo, April 1994. To appear in *Neural Networks*.
- [2] S Amari, K Kurata, and H Nagaoka. Information Geometry of Boltzmann Machines. *IEEE Transactions on Neural Networks*, 3 (2): 260–271, March 1992.
- [3] Y Bishop, S Fienberg, and P Holland. *Discrete Multivariate Analysis*. The MIT Press, Cambridge, Massachusetts, London, 10th edition, 1989.
- [4] W Byrne. Alternating Minimization and Boltzmann Machine Learning. *IEEE Transactions on Neural Networks*, 3: 612–620, 1992.
- [5] I Csiszár. I-Divergence Geometry of Probability Distributions and Minimization Problems. *The Annals of Probability*, 3 (1): 146–158, 1975.
- [6] I Csiszár and G Tusnády. Information Geometry and Alternating Minimization Problems. In *Statistics & Decisions*, *Supplement Issue No. 1*, pages 205–237. R. Oldenbourg Verlag, München, 1984.
- [7] L Martignon, H von Hasseln, S Grün, A Aertsen, and G Palm. Detecting higher-order interactions among the spiking events in a group of neurons. *Biological Cybernetics*, 1995. To appear.
- [8] H von Hasseln. *A consistency algorithm for Markov networks*. Dep. of Neural Information Processing, Univ. of Ulm, Germany, 1995. Dissertation in preparation.