Machine Learning, Volume 76, Issue 2–3, pp 195–209

Learning multi-linear representations of distributions for efficient inference



Abstract

We examine the class of multi-linear representations (MLRs) for expressing probability distributions over discrete variables. Recently, MLRs have been considered as intermediate representations that facilitate inference in distributions represented as graphical models.

We show that the MLR is an expressive representation of discrete distributions: it can concisely represent classes of distributions that require exponential size in other commonly used representations, while supporting probabilistic inference in time linear in the size of the representation. Our key contribution is a set of techniques for learning bounded-size MLRs that support efficient probabilistic inference. We demonstrate experimentally that the learned MLRs support accurate and very efficient inference.
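To make the linear-time inference claim concrete, the following is a minimal Python sketch, not the paper's implementation: it assumes an MLR stored as a list of (coefficient, indicator-assignment) monomials and computes the probability of evidence in a single pass over the terms. The function name mlr_evaluate and the data layout are illustrative assumptions.

    # Minimal sketch: inference by evaluating a multi-linear polynomial.
    # The data layout (list of (coefficient, indicators) monomials) and
    # the function name are illustrative assumptions, not the paper's API.

    def mlr_evaluate(terms, evidence):
        """Return the polynomial's value under the given evidence.

        terms    -- list of (coef, indicators) pairs; `indicators` maps a
                    variable to the value its indicator selects.
        evidence -- dict of observed variables; an indicator evaluates to 1
                    unless the evidence contradicts it (unobserved variables
                    are summed out by leaving all their indicators at 1).
        """
        total = 0.0
        for coef, indicators in terms:
            # A monomial contributes iff none of its indicators is
            # contradicted by the evidence; this is one pass over the
            # representation, hence linear in its size.
            if all(evidence.get(v, val) == val for v, val in indicators.items()):
                total += coef
        return total

    # Toy joint distribution over two binary variables, one monomial per
    # complete assignment (a compact MLR would use far fewer terms).
    terms = [
        (0.4, {"X1": 0, "X2": 0}),
        (0.1, {"X1": 0, "X2": 1}),
        (0.2, {"X1": 1, "X2": 0}),
        (0.3, {"X1": 1, "X2": 1}),
    ]
    print(mlr_evaluate(terms, {}))         # 1.0 -- the distribution normalizes
    print(mlr_evaluate(terms, {"X1": 1}))  # 0.5 -- marginal P(X1 = 1)

Evaluating with no evidence recovers the normalization constant, and fixing an observed variable's indicators yields its marginal, which is the sense in which a single polynomial evaluation performs inference.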


Keywords: Learning probability distributions · Multi-linear polynomials · Probabilistic inference · Graphical models



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, USA
