
A Generalized Iterative Scaling Algorithm for Maximum Entropy Model Computations Respecting Probabilistic Independencies

  • Conference paper
  • Foundations of Information and Knowledge Systems (FoIKS 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10833)

Abstract

Maximum entropy distributions serve as favorable models for commonsense reasoning based on probabilistic conditional knowledge bases. Computing these distributions requires solving high-dimensional convex optimization problems, especially if the conditionals are composed of first-order formulas. In this paper, we propose a highly optimized variant of generalized iterative scaling for computing maximum entropy distributions. As a novel feature, our improved algorithm is able to take probabilistic independencies into account that are established by the principle of maximum entropy. This allows for exploiting the logical information given by the knowledge base, represented as weighted conditional impact systems, in a very condensed way.
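
The abstract refers to generalized iterative scaling (GIS) as the computational workhorse. As a rough illustration of the classic GIS scheme of Darroch and Ratcliff, not the optimized, WCI-based variant proposed in the paper, the following Python sketch fits a maximum entropy distribution over an explicitly enumerated set of possible worlds; the atoms and target expectations are assumed toy values.

```python
# A minimal sketch of classic generalized iterative scaling (GIS), not the
# optimized WCI-based variant from the paper. Atoms, features, and target
# expectations below are assumed toy values.
import itertools
import math

atoms = ["a", "b", "c"]
worlds = [dict(zip(atoms, bits)) for bits in itertools.product([0, 1], repeat=len(atoms))]

# Binary features with target expectations (illustrative constraints).
features = [
    (lambda w: w["a"], 0.7),            # P(a)    = 0.7
    (lambda w: w["a"] * w["b"], 0.4),   # P(a, b) = 0.4
    (lambda w: w["c"], 0.2),            # P(c)    = 0.2
]

# GIS requires a constant feature sum per world; add a slack feature.
C = len(features)

def feature_vector(w):
    vals = [f(w) for f, _ in features]
    return vals + [C - sum(vals)]       # slack keeps the sum equal to C

targets = [t for _, t in features]
targets.append(C - sum(targets))        # induced target for the slack feature

def model_probs(log_alpha):
    # Gibbs distribution induced by the current scaling factors.
    weights = [math.exp(sum(la * fv for la, fv in zip(log_alpha, feature_vector(w))))
               for w in worlds]
    z = sum(weights)
    return [wt / z for wt in weights]

log_alpha = [0.0] * (len(features) + 1)
for _ in range(500):
    probs = model_probs(log_alpha)
    # Expected feature values under the current model.
    expected = [sum(p * feature_vector(w)[i] for p, w in zip(probs, worlds))
                for i in range(len(log_alpha))]
    # Multiplicative GIS update, written additively in log space.
    log_alpha = [la + math.log(t / e) / C
                 for la, t, e in zip(log_alpha, targets, expected)]

probs = model_probs(log_alpha)
print("P(a)    =", sum(p for p, w in zip(probs, worlds) if w["a"]))
print("P(a, b) =", sum(p for p, w in zip(probs, worlds) if w["a"] and w["b"]))
print("P(c)    =", sum(p for p, w in zip(probs, worlds) if w["c"]))
```

Roughly speaking, the paper's contribution can be read as replacing the explicit iteration over all worlds in model_probs by an iteration over weighted conditional impacts and over the parts of a syntax partition.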


Notes

  1.

    In this paper, predicate and variable names will always begin with an uppercase letter and constant names with a lowercase letter.

  2.

    Actually, the numbers of adjustment steps are smaller in both cases since we group (partial) possible worlds with the same conditional impact together (weighted conditional impacts) and filter out “impossible” worlds beforehand; see the sketch after these notes.

  3.

    We say that \(\mathcal {R}_1\) and \(\mathcal {R}_2\) share a ground atom \(A\in \mathcal {G}_\varSigma \) if there are \(r_1\in \mathcal {R}_1\) and \(r_2\in \mathcal {R}_2\) with ground instances \(r'_1\in \mathsf {Grnd}(r_1)\) and \(r'_2\in \mathsf {Grnd}(r_2)\) that both contain the ground atom A.

  4.

    Consider the bijection \(\beta :\varOmega _{\mathcal {G}_c}\rightarrow \varOmega _{\mathcal {G}_d}\) which simply replaces the constant c with the constant d whenever c occurs.

  5.

    This representation of \(\mathcal {P}^\mathsf {ME}_{\mathcal {R}}\) exists except for very rare pathological cases which can be circumvented by prescient knowledge engineering.

  6.

    More precisely, uniform marginals of the probability distribution are considered in order to avoid iterations over the whole probability distribution.

  7.

    Correctness here means that \(\alpha _0,\alpha _1,\ldots ,\alpha _m\) can be calculated to any desired precision, provided the loop in Step 4 is executed sufficiently often.

  8.

    Here, \(\varGamma \) is the set of all ordinary \(\mathsf {WCI}\)s with respect to the knowledge base \(\mathcal {R}\).
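
Note 2 mentions that worlds with the same conditional impact are grouped into weighted conditional impacts. The following Python sketch (an assumed two-atom toy encoding, not taken from the paper) shows this grouping step: each distinct impact vector is stored once, together with the number of worlds that produce it.

```python
# A toy sketch for Note 2: group possible worlds by their conditional impact
# and count them. The two-atom signature and the single conditional (b|a) are
# assumed for illustration only.
import itertools
from collections import Counter

atoms = ["a", "b"]
worlds = [dict(zip(atoms, bits)) for bits in itertools.product([0, 1], repeat=len(atoms))]

def impact(w):
    # (verifications, falsifications) of the conditional (b|a) in world w.
    ver = 1 if w["a"] and w["b"] else 0
    fal = 1 if w["a"] and not w["b"] else 0
    return (ver, fal)

# Weighted conditional impacts: distinct impact vectors with their multiplicities.
wci = Counter(impact(w) for w in worlds)
print(wci)  # Counter({(0, 0): 2, (0, 1): 1, (1, 0): 1})
```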


Acknowledgements

This research was supported by the German Research Foundation (DFG), Research Unit FOR 1513 on Hybrid Reasoning for Intelligent Systems.

Author information


Correspondence to Marco Wilhelm.


Proofs of Results


Proposition 1

Let \(\mathcal {R}\) be a consistent knowledge base, and let \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\) be a syntax partition for \(\mathcal {R}\). For all \(\omega \in \varOmega \),

$$ \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega )=\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega _{\mathcal {G}_j}). $$

Proof

We give a proof for those cases in which the representation (5) of \(\mathcal {P}^\mathsf {ME}_{\mathcal {R}}\) exists. The normalizing constant can be written as \(\alpha _0=\big (\sum _{\omega \in \varOmega }\prod _{i=1}^m \alpha _i^{f_i(\omega )}\big )^{-1}\), where, for any set \(\mathcal {X}\) of ground instances of the i-th conditional and any ground formula \(C\in \mathsf {FOL}\), \(f_{\mathcal {X}}(C)\) abbreviates \((1-p_i)\cdot \mathsf {ver}_{\mathcal {X}}(C)-p_i\cdot \mathsf {fal}_{\mathcal {X}}(C)\). Further, let \(\mathfrak {R}=\{R^1_G,\ldots ,R^k_G\}\) be a \(\{\mathcal {G}_1,\ldots ,\mathcal {G}_k\}\)-respecting decomposition of \(\mathcal {R}\) with \(R^j_G=\{R^j_1,\ldots ,R^j_n\}\) for \(j=1,\ldots ,k\). Then, \(\alpha _0=\prod _{j=1}^k \alpha _0^j\) holds where \(\alpha _0^j=\big (\sum _{\omega _j\in \varOmega _{\mathcal {G}_j}}\prod _{i=1}^m \alpha _i^{f_i(\omega _j)}\big )^{-1}\). For \(\omega \in \varOmega \setminus \varOmega ^0\), it follows that

$$\begin{aligned} \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega ) =&\alpha _0 \prod _{i=1}^m \alpha _i^{f_i(\omega )} = \alpha _0 \prod _{i=1}^m \prod _{j=1}^k \alpha _i^{f_{R^j_i}(\omega _{\mathcal {G}_j})} \\ =&\prod _{j=1}^k \Big [\left( \alpha _0^j \prod _{i=1}^m \alpha _i^{f_{R^j_i}(\omega _{\mathcal {G}_j})}\right) \cdot \prod _{l\ne j} \underbrace{\left( \sum _{\omega '_l\in \varOmega _{\mathcal {G}_l}} \alpha _0^l \prod _{i=1}^m \alpha _i^{f_{R^l_i}(\omega '_l)} \right) }_{=1}\Big ] \\ =&\prod _{j=1}^k \Big ( \sum _{\begin{array}{c} \omega '\in \varOmega \\ \omega '\,{\models }\,\omega _{\mathcal {G}_j} \end{array}} \alpha _0 \prod _{i=1}^m \prod _{l=1}^k \alpha _i^{f_{R^l_i}(\omega '_{\mathcal {G}_l})}\Big ) = \prod _{j=1}^k \Big ( \sum _{\begin{array}{c} \omega '\in \varOmega \\ \omega '\,{\models }\,\omega _{\mathcal {G}_j} \end{array}} \alpha _0 \prod _{i=1}^m \alpha _i^{f_i(\omega ')} \Big ) \\ =&\prod _{j=1}^k \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega _{\mathcal {G}_j}). \end{aligned}$$

If \(\omega \in \varOmega ^0\), there is a deterministic conditional \(r=(B|A)[p]\in \mathcal {R}\) and an index \(l\in \{1,\ldots ,k\}\) such that \(\mathsf {ver}_{\mathsf {Grnd}(r)}(\omega _{\mathcal {G}_l})>0\) if \(p=0\) and \(\mathsf {fal}_{\mathsf {Grnd}(r)}(\omega _{\mathcal {G}_l})>0\) if \(p=1\). As a consequence, every \(\omega '\) with \(\omega '\,{\models }\,\omega _{\mathcal {G}_l}\) is a null-world, and

$$ \prod _{j=1}^k \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega _{\mathcal {G}_j})= \left( \sum _{\omega '\,{\models }\,\omega _{\mathcal {G}_l}} \mathcal {P}^\mathsf {ME}_\mathcal {R}(\omega ')\right) \cdot \prod _{j\ne l} \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j})=0\cdot \prod _{j\ne l} \mathcal {P}^\mathsf {ME}_{\mathcal {R}}(\omega _{\mathcal {G}_j})=0 $$

as required. \(\square \)
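
As a quick plausibility check of Proposition 1 (a toy example, not part of the paper), take two constraints that mention disjoint atoms, P(a) = 0.7 and P(b) = 0.3. The joint distribution then has a single remaining degree of freedom, and maximizing entropy over it yields exactly the product of the marginals, as the factorization predicts. The Python snippet below verifies this numerically.

```python
# Toy numeric check of the factorization in Proposition 1 (assumed example):
# two constraints over disjoint atoms, P(a) = 0.7 and P(b) = 0.3.
import math

pa, pb = 0.7, 0.3

def entropy(t):
    # Joint distribution parametrised by t = P(a, b); the marginal constraints
    # determine the remaining three probabilities.
    probs = [t, pa - t, pb - t, 1 - pa - pb + t]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Brute-force search over the single free parameter t in (0, 0.3).
best = max((i / 100000 for i in range(1, 30000)), key=entropy)

print("argmax_t P(a, b) =", best)      # ~0.21
print("P(a) * P(b)      =", pa * pb)   # ~0.21 -> joint = product of marginals
```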

Proposition 2

Let \(\mathcal {R}\) be a knowledge base, let \(\mathfrak {G}\) be a syntax partition for \(\mathcal {R}\), and let \(\mathfrak {R}\) be a \(\mathfrak {G}\)-respecting decomposition of \(\mathcal {R}\) as described above. If \(\omega \in \varOmega \) is not a null-world, then

$$\begin{aligned} {\varvec{\gamma }}_{\mathcal {R}_G}(\omega )=\Big (\Big ( \sum _{j=1}^k \big (\gamma _{R^j_i}(\omega _{\mathcal {G}_j})\big )_1,\ \sum _{j=1}^k \big (\gamma _{R^j_i}(\omega _{\mathcal {G}_j})\big )_2 \Big )\Big )_{i=1,\ldots ,m}. \end{aligned}$$

If \(\omega \) is a null-world, then \({\varvec{\gamma }}_{\mathcal {R}^j_G}(\omega _{\mathcal {G}_j})\) is undefined for at least one \(j\in \{1,\ldots ,k\}\).

Proof

Let \(\omega \in \varOmega \setminus \varOmega ^0\). By definition, \(\varvec{\gamma }_{\mathcal {R}_\mathcal {G}}(\omega )=((\mathsf {ver}_i(\omega ),\mathsf {fal}_i(\omega )))_{i=1,\ldots ,m}\). Since \(\mathfrak {R}\) is a \(\mathfrak {G}\)-respecting decomposition of \(\mathcal {R}\), \(\mathsf {ver}_{i}(\omega )=\sum _{j=1}^k \mathsf {ver}_{R^j_i}(\omega _{\mathcal {G}_j})\) as well as \(\mathsf {fal}_{i}(\omega )=\sum _{j=1}^k \mathsf {fal}_{R^j_i}(\omega _{\mathcal {G}_j})\) hold for \(i=1,\ldots ,n\), and hence, in particular, this holds for \(i=1,\ldots ,m\) (since \(m\le n\)). By applying the definition of \(\varvec{\gamma }_{R^j_i}(\omega _{\mathcal {G}_j})\), the proposition follows. As syntax partitions also take deterministic conditionals into account, the statement concerning null-worlds follows immediately. \(\square \)
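
The componentwise combination stated in Proposition 2 can be spelled out on assumed toy numbers (not taken from the paper): per conditional, the (ver, fal) pair of the full world is the sum of the corresponding pairs of its restrictions to the parts of the syntax partition.

```python
# Assumed toy data for Proposition 2: per-part conditional impacts of a world,
# gamma_parts[j][i] = (ver, fal) of the i-th conditional on the j-th part.
gamma_parts = [
    [(1, 0), (0, 2)],   # impacts of omega restricted to G_1
    [(2, 1), (0, 0)],   # impacts of omega restricted to G_2
]

# Componentwise sums over the parts give the impact vector of the full world.
gamma = [tuple(sum(part[i][k] for part in gamma_parts) for k in range(2))
         for i in range(len(gamma_parts[0]))]

print(gamma)  # [(3, 1), (0, 2)]
```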


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Wilhelm, M., Kern-Isberner, G., Finthammer, M., Beierle, C. (2018). A Generalized Iterative Scaling Algorithm for Maximum Entropy Model Computations Respecting Probabilistic Independencies. In: Ferrarotti, F., Woltran, S. (eds.) Foundations of Information and Knowledge Systems. FoIKS 2018. Lecture Notes in Computer Science, vol. 10833. Springer, Cham. https://doi.org/10.1007/978-3-319-90050-6_21


  • DOI: https://doi.org/10.1007/978-3-319-90050-6_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90049-0

  • Online ISBN: 978-3-319-90050-6

  • eBook Packages: Computer Science (R0)
