Probabilistic Logic and Relational Models
Synonyms
Glossary
 First-Order Predicate Logic

Formal system of mathematical logic that supports reasoning about structures consisting of a domain on which certain relations and functions are defined.
 Bayesian Network

A graphical representation of the joint probability distribution of a set of random variables. A Bayesian network is specified by a directed acyclic graph whose nodes are the random variables, together with the conditional probability distributions of the random variables given their parents in the graph.
 Markov Network

A graphical representation of the joint probability distribution of a set of random variables. A Markov network is specified by an undirected graph whose nodes are the random variables, together with potential functions defined on the cliques of the graph.
 Horn Clause

A special class of “if … then” expressions in first-order predicate logic, where the if-condition is a conjunction of atomic statements and the then-conclusion is a single atomic statement.
 Domain

A set of specific entities (also referred to as objects, nodes, etc.) for which a (probabilistic) model is to be constructed. In the context of probabilistic logic models, domains are mostly assumed to be finite.
 Factorization

Representation of a joint probability distribution of multiple random variables as a product of functions (factors) that each depend only on a subset of the random variables.
Definition
A probabilistic logic model (PLM) is a statistical model for relationally structured data. PLMs are specified in formal probabilistic logic modeling languages (PLMLs), which are accompanied by general algorithmic tools for specifying, analyzing, and learning probabilistic models. Elements of first-order logic syntax and semantics are used to define probability spaces and distributions.
Introduction
Probabilistic logic modeling languages provide tools to construct probabilistic models for richly structured data, specifically graph and network data or, more generally, data contained in a relational database. Their logic-based representation languages support model specifications at a high level of abstraction and generality, which facilitates the adaptation of a single model to different data domains.
A PLML consists of syntax and semantics of the representation language, inference algorithms for computing the answers to probabilistic queries, and, in most cases, statistical methods for learning a model from data.
PLMs are distinguished from probabilistic logics (e.g., Nilsson 1986; Halpern 1990) in that the latter do not define a unique probabilistic model. Instead, they provide logic-based representation languages that can be used to formulate constraints on a set of possible models. Inference from such a probabilistic logic knowledge base then amounts to inferring properties that hold for all admissible models. In the case of simple probabilistic queries for the probability of specific propositions, this means that usually only intervals of possible values can be derived, whereas a PLM will define a unique value.
Key Points
A PLML is a uniform representation and inference framework for a wide spectrum of probabilistic modeling tasks in different application areas. For example, models for community detection, link prediction, or information diffusion in social network analysis, localization and navigation in robotics, or protein secondary structure prediction in bioinformatics can all be built using the same high-level, interpretable PLML.
Fundamental analysis and learning tasks are supported by general-purpose algorithms. PLMLs thus enable data analysis by only formulating different model hypotheses (a process that can also be automated), without the need to design suitable inference routines on a case-by-case basis. They are ideally suited for exploratory analysis and rapid prototyping. This high level of flexibility often comes at a cost: for a concrete model designed for a specific application, the generic algorithms may not achieve the best possible computational performance. Model-specific customization and optimization of the inference routines may then still be required in order to handle large-scale problems on a PLM platform.
Historical Background
The development of probabilistic logic modeling languages rests on two main foundations: inductive logic programming and probabilistic graphical models. Inductive logic programming is an area of machine learning concerned with learning concept definitions from examples, where the hypothesis space of concept definitions, the examples, and the available background knowledge are all expressed by formulas in first-order logic, specifically by formulas from the restricted class of Horn clauses. Originally purely qualitative, inductive logic programming approaches were extended with quantitative, probabilistic elements to also deal with uncertainty and noisy data (Sato 1995; Muggleton 1996).
Probabilistic graphical models, such as Bayesian networks and Markov networks, provide efficient inference and learning techniques for probability distributions over multivariate state spaces. However, they lack flexibility in the sense that even small changes in the underlying state space will usually require a complete reconstruction of the graphical model. Research in artificial intelligence aimed to provide more adaptable and reusable models based on representation languages that express probabilistic dependencies at a higher level of abstraction than graphical models. From probabilistic knowledge bases in such representation languages, one can then automatically generate concrete graphical models for specific state spaces. Logic-based representations, especially Horn clauses, also played an important role in the design of these knowledge-based model construction approaches (Breese 1992; Breese et al. 1994; Haddawy 1994).
In a related development in a more purely statistical context, the BUGS framework was developed to support compact specifications and generic inference modules for complex Bayesian statistical models (Gilks et al. 1994). BUGS model specifications are based on elements of imperative programming languages rather than logic-based knowledge representation. In terms of functionality and semantics, they are nevertheless closely related to PLMLs.
The probabilistic Horn abduction framework (Poole 1993) already provided a clear synthesis of logic programming and graphical models. However, only from around (Kersting and De Raedt 2001) onward were the research lines of probabilistic inductive logic programming and knowledge-based model construction more broadly merged. Techniques originating in inductive logic programming have a lasting influence on current methods for learning the logical, qualitative dependency structure of probabilistic logic models. Techniques from probabilistic graphical models provide the main tools for numerical computations when learning the numerical parameters of a model and when performing probabilistic inference.
PLM Semantics
Semantically, most PLMs can be understood as a generalization of the classical Erdős–Rényi random graph model. A random graph model defines, for a given finite set of nodes N, a probability distribution over all graphs on N. Probabilistic logic models extend this in several ways: first, they define probability distributions not only for a single binary edge relation but for multiple random attributes and relations of arbitrary arities. Second, they can allow the definition of this distribution to be conditioned on a set of already fixed and known attributes and relations. Third, they are not limited to Boolean attributes and relations, but can also define distributions over multivalued and, to a limited extent, numeric attributes and relations.
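As a point of reference, the baseline Erdős–Rényi G(n, p) model that PLMs generalize can be sampled in a few lines. This is an illustrative sketch; the function and variable names are our own and not part of any PLML:

```python
import itertools
import random

def sample_erdos_renyi(nodes, p, rng=None):
    """G(n, p): include each of the n*(n-1)/2 possible undirected
    edges independently with probability p."""
    rng = rng or random.Random()
    return {(u, v) for u, v in itertools.combinations(nodes, 2)
            if rng.random() < p}

graph = sample_erdos_renyi(["a", "b", "c", "d"], 0.5, random.Random(0))
```

A PLM generalizes this single Bernoulli edge relation to multiple typed attributes and relations whose inclusion probabilities may depend on attributes and on known input relations.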
Formally, if D = {d_{1}, …, d_{ n }} is a finite domain, then an n-ary relation on D is a subset of D^{ n }. If n = 1, one usually speaks of an attribute rather than a relation. In most cases, the domain will be partitioned into subdomains of objects of different types. In predicate logic, relations are represented by relation symbols. An atom is a relation symbol followed by a list of arguments, which can be object identifiers or variables. A ground atom is an atom with only identifiers as arguments.
As an example, a bibliographic domain may contain entities D_{ P } = {p_{1}, …, p_{ k }} of type person and D_{ A } = {a_{1}, …, a_{ l }} of type article, so that D = D_{ P } ∪ D_{ A }. A binary relation on D is author, taking one argument of type person and one argument of type article, i.e., author ⊆ D_{ P } × D_{ A }. Examples of atoms then are author(X, Y), author(X, a_{3}), and author(p_{1}, a_{3}), where X, Y are variable symbols and the last of these atoms is ground. The set of all ground atoms that can be formed using a given set of relation symbols and identifiers from a given domain is called the Herbrand base. A ground atom represents a Boolean variable, which in a purely logical context is a propositional variable and in a probabilistic context a Boolean random variable. A truth-value assignment to all atoms in the Herbrand base is a Herbrand interpretation. The strict limitation to Boolean variables is loosened in many PLMLs, so that ground atoms representing multivalued variables are also allowed. Thus, a ground atom department(p_{1}) could represent an attribute with possible values CS, statistics, and engineering, and a ground atom association(p_{4}, a_{2}) could represent a multivalued relation with possible values author, editor, and reviewer.
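The construction of the Herbrand base can be sketched directly. The subdomains and typed relation signatures below follow the bibliographic example; the identifiers and helper names are illustrative:

```python
from itertools import product

# Subdomains of the bibliographic example (identifiers illustrative):
persons = ["p1", "p2"]
articles = ["a1", "a2", "a3"]

# Each relation symbol is typed by the subdomains of its arguments.
signatures = {
    "author": (persons, articles),   # author(P, A)
    "cites":  (articles, articles),  # cites(A, A')
}

def herbrand_base(signatures):
    """All ground atoms: every relation symbol applied to every
    tuple of identifiers from its argument subdomains."""
    return [f"{rel}({', '.join(args)})"
            for rel, doms in signatures.items()
            for args in product(*doms)]

base = herbrand_base(signatures)
assert len(base) == 2 * 3 + 3 * 3  # 6 author atoms + 9 cites atoms
```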
In most cases, a PLM will be a generic model that defines the conditional distribution (1) for a general class of admissible input structures H^{inp}. Thus, a PLM for modeling probabilistic cites and department relations would be applicable to any input domain structured like the example in Fig. 1.
Other typical examples for PLMs are models for genetic traits in a pedigree, or models for biomolecular data. In the first case, input domains may consist of family trees defined by a set of individuals and input relations mother and father. The PLM can then define a probability distribution for probabilistic relations modeling uncertain genetic properties and relationships among the individuals. For biomolecular data, an input domain may consist of a protein given by its constituent amino acids and its known linear sequence structure as an input relation. A probabilistic relation can be a binary proximity or contact relation representing which amino acids are close in the threedimensional folding of the protein.
A PLM can also be constructed specifically for a single input domain. This is the usual scenario when a PLM models relations in a specific database. A popular example is the Internet Movie Database, consisting of entities of types movie, actor, director, etc., and known relations such as in_cast and directed_by. Probabilistic relations modeled by a PLM can be any relations that are not fully known, or that one may want to predict for new entities added to the database.
The explicit distinction between input and probabilistic relations is not made in all PLMLs. When the distinction is not made, then all relations, in principle, are probabilistic, and the model simply defines distributions P(H^{prob} | D) for a class of domains D. Complete knowledge of a distinguished subset of probabilistic relations for a specific input domain can still be integrated into the PLM by conditioning P(H^{prob} | D) on the observed values of the known relations.
According to the PLM semantics as P(H^{prob} | H^{inp}, D), the set D of domain entities is always part of the known input, and the probability model P(H^{prob} | H^{inp}, D) is only a distribution over the finite set of different H^{prob} for a fixed D. Within this semantic framework, it is not possible to express uncertainty about the domain itself, i.e., how many or what entities exist in the domain D. Probabilistic logic models that go beyond the probabilistic Herbrand interpretation semantics in the narrow sense of (1) are probabilistic programming languages that define distributions over infinite outcome spaces defined by program executions (Muggleton 1996; Pfeffer 2001; Goodman et al. 2008; Ng et al. 2008), or over infinite classes of Herbrand interpretations with varying domains (Milch et al. 2005). The added expressivity of these types of models comes at the cost of less semantic transparency, because their semantics are usually described in procedural terms, whereas semantics for PLMs in the narrower sense can be given in a declarative manner. Furthermore, the added complexity of probabilistic programming languages often leaves stochastic simulation as the only viable inference technique.
Modeling Languages
Probabilistic logic modeling languages are formal representation languages that typically incorporate some elements of predicate logic or relational database design. The plethora of existing modeling frameworks can be classified along several dimensions: directed vs. undirected probabilistic models, discriminative vs. generative models, and logic-oriented vs. database-oriented languages. The most fundamental distinction is between directed and undirected models, which can be understood as procedural and descriptive modeling approaches, respectively.
Directed Models
In the procedural approach, the model specification corresponds to a direct, constructive specification of a sampling process for Herbrand interpretations. We note that procedural in this sense is to be distinguished from generative models in the usual sense of machine learning: generative there refers to any model that defines a full joint distribution of all random variables, in contrast to discriminative models that only define a conditional distribution for distinguished target variables, given the values of a different set of predictor variables.
Rule-Based Representations
A rule-based PLM
department(P) = Stats \( \overset{0.3}{\leftarrow } \)
department(P) = CS \( \overset{0.5}{\leftarrow } \)
department(P) = Eng \( \overset{0.2}{\leftarrow } \)
cites(A, A′) \( \overset{0.05}{\leftarrow } \) cites(A″, A′) | A′ ≺ A″ ≺ A
cites(A, A′) \( \overset{0.02}{\leftarrow } \) department(P) = Stats, department(P′) = Stats | author(P, A), author(P′, A′)
cites(A, A′) \( \overset{0.01}{\leftarrow } \) department(P) = CS, department(P′) = CS | author(P, A), author(P′, A′)
cites(A, A′) \( \overset{0.01}{\leftarrow } \) department(P) = Eng, department(P′) = Eng | author(P, A), author(P′, A′)
combine(cites) = noisy-or
The first three rules of Table 1 specify the probability distribution for the 3-valued department attribute. The following rules specify the distribution for the cites relation. According to the first cites rule, the probability that an article A′ is cited by an article A increases if there are other articles A″ already citing A′. This rule is constrained via the input relation ≺ to apply only to articles A, A′, and A″ in the correct temporal sequence. The remaining rules stipulate that the probability that A cites A′ increases if the authors of the papers belong to the same department, where the level of this increase can differ between departments.
A ground instance of a rule is obtained by substituting concrete elements from an input domain for the logical variables in the rule. A ground rule instance is a partial specification of the factor P(α_{ i } | pa(α_{ i }), H^{inp}, D), where α_{ i } is the ground head of the rule. The specifications from multiple rule groundings with head α_{ i } have to be combined using a combining rule. In the example of Table 1, multiple rules for the cites relation are combined using the noisy-or function, which means that each applicable ground rule with head α_{ i } is interpreted as describing an independent potential cause for α_{ i } to be true.
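The noisy-or combining rule itself is a one-liner. A minimal sketch (the helper name is illustrative; the example probabilities are taken from the cites rules of Table 1):

```python
def noisy_or(cause_probs):
    """Noisy-or: each applicable ground rule with the same head atom
    is an independent potential cause, so the head is false only if
    every cause fails to fire."""
    p_all_fail = 1.0
    for p in cause_probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# Two ground instances of the cites rules of Table 1 with the same
# head fire with probabilities 0.05 and 0.02:
p_head = noisy_or([0.05, 0.02])  # 1 - 0.95 * 0.98 = 0.069
```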
To obtain a well-defined distribution via the factorization (2), the dependency relation on ground atoms must be acyclic, which, in turn, usually implies certain constraints on admissible input domains for a rule-based model. For the PLM of Table 1, for example, the dependency relation on probabilistic atoms will be acyclic whenever ≺ is a linear order, but may not be acyclic otherwise.
Modeling languages that essentially use rule-based representations include probabilistic knowledge bases (Ngo and Haddawy 1997), relational Bayesian networks (Jaeger 1997), and Bayesian logic programs (Kersting and De Raedt 2001).
Graphical Representations
There are several approaches to graph-based representations of PLMs. Network fragments (Laskey and Mahoney 1997; Laskey 2008) is the PLML most closely linked to probabilistic graphical models. The PLM here is represented by means of network templates that are parameterized with logical variables. Similar to the grounding operation for probabilistic logic rules, templates are instantiated by substituting elements from a concrete input domain for the variables. The collection of ground network fragments so obtained is then connected into a ground network as in Fig. 3.
A different approach to graphical representations derives from a database perspective; it specifies PLMs in terms of probabilistic extensions of entity-relationship diagrams. Probabilistic relational models in the sense of Friedman et al. (1999) first introduced this approach, which was subsequently refined and generalized in many ways, and found its most mature expression in the directed acyclic probabilistic entity-relationship (DAPER) model (Heckerman et al. 2007).
The DAPER model does not include a specific representation language for the specification of the conditional probability distributions of the probabilistic attributes. Any suitable way of defining these distributions in the form of tables or functions may be used. As for rule-based PLMs, combination functions may be used to handle many-to-one dependencies. Due to their orientation toward databases, probabilistic relational and DAPER models are somewhat better adapted to modeling numerical attributes than the logic-oriented, rule-based approaches. Many-to-one dependencies on numerical attributes are usually defined in terms of an aggregate, such as the mean or max, of multiple numerical parents.
Independent Choices
While the rule-based approaches discussed above use syntax inspired by logic programming, they are semantically rooted in probabilistic graphical models. Another type of PLML, represented by Prism (Sato 1995), independent choice logic (ICL) (Poole 2008), and ProbLog (Kimmig et al. 2011), instead derives its semantics from the theory of logic programs. We refer to these as IC models.
An IC model is given as a set L of logic program clauses labeled with probabilities, where different types of IC models impose different restrictions on the structure of L. The probability labels represent inclusion probabilities for the clauses in a random subset L′ ⊆ L: each clause of L is included in L′ independently of the other clauses, with the probability given by its label. Every subset L′ then has a probability P(L′) of being the outcome of this selection process. L′ is a logic program that has a unique least Herbrand model LHM(L′), i.e., the Herbrand interpretation in which all the program clauses are satisfied and a minimal number of ground atoms are assigned the value true.
IC models typically do not use a distinguished set of input relations defining H^{inp}. Any known structure can be specified within L by ground atoms with selection probability 1.
A ProbLog model
0.8 :: edge(a, c) 
0.6 :: edge(b, c) 
0.7 :: edge(a, b) 
0.9 :: edge(c, d) 
0.8 :: edge(c, e) 
0.5 :: edge(e, d) 
1.0 :: path(X, Y) ← edge(X, Y) 
1.0 :: path(X, Y) ← edge(X, Z), path(Z, Y) 
The last clause of Table 2 leads to a cyclic dependency of path atoms, and therefore the model cannot be directly compiled into a directed network representation as in Fig. 3. The model can still be understood as a constructive sampling process for Herbrand interpretations, where now the role that in rulebased representations is played by the acyclic dependency condition on ground atoms is filled by the iterative construction process whose least fixed point defines the least Herbrand model.
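The semantics of Table 2 can be checked by brute force: enumerate every subset L′ of the probabilistic edge facts, weight it by its selection probability P(L′), and test whether path holds in its least Herbrand model, which for this program is plain directed reachability. A sketch under these assumptions (function names are illustrative):

```python
from itertools import product

# Probabilistic edge facts from Table 2 (labels are the inclusion
# probabilities of the corresponding clauses):
EDGES = {("a", "c"): 0.8, ("b", "c"): 0.6, ("a", "b"): 0.7,
         ("c", "d"): 0.9, ("c", "e"): 0.8, ("e", "d"): 0.5}

def reachable(edges, src, dst):
    """path/2 in the least Herbrand model of a sampled subprogram
    is directed reachability over the chosen edge facts."""
    frontier, seen = [src], {src}
    while frontier:
        u = frontier.pop()
        if u == dst:
            return True
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

def exact_path_prob(src, dst):
    """Sum P(L') over all 2^6 edge subsets L' whose least Herbrand
    model contains path(src, dst)."""
    items = list(EDGES.items())
    total = 0.0
    for bits in product([False, True], repeat=len(items)):
        prob, chosen = 1.0, set()
        for keep, (edge, p) in zip(bits, items):
            prob *= p if keep else 1.0 - p
            if keep:
                chosen.add(edge)
        if reachable(chosen, src, dst):
            total += prob
    return total

# Every route from a to d passes through c, so the query factors as
# P(c reachable from a) * P(d reachable from c) = 0.884 * 0.94:
assert abs(exact_path_prob("a", "d") - 0.83096) < 1e-9
```

For larger programs this enumeration is infeasible, and one resorts to sampling or to the weighted model counting techniques discussed under Inference below.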
Undirected Models
In descriptive modeling approaches, a probability distribution over the space of Herbrand interpretations H^{prob} is defined by assigning weights to certain features of interpretations. The probability of an interpretation then is the normalized product of its feature weights.
Here, the Φ_{ i } are 0/1-valued feature functions that may depend on any subset \( {\alpha}_{i_1},\dots, {\alpha}_{i_{k_i}} \) of atoms, the w_{ i } are non-negative weights associated with the Φ_{ i }, and Z is the normalizing constant. Often, the log-linear version (5) is used, so that feature weights can be arbitrary reals \( {w}_i^{\prime }= \log \left({w}_i\right) \). The factorization (4) corresponds to a Markov network whose nodes are the ground atoms α_{1}, …, α_{ n } and where there is an edge between α_{ i } and α_{ j } if α_{ i } and α_{ j } appear together in one of the feature functions \( {\varPhi}_i\left({\alpha}_{i_1},\dots, {\alpha}_{i_{k_i}}\right) \).
Feature-based PLMLs are closely related to exponential random graph models (Robins et al. 2007). They generalize standard exponential random graph models in that they define distributions over multi-relational structures. More importantly, they provide precise, expressive formal representation languages for defining feature functions Φ by means of logical formulas ϕ(X_{1}, …, X_{ k }). Substitution of domain elements a_{ h } for the logical variables X_{ h } leads to the ground feature functions Φ(α_{1}, …, α_{ k }).
A Markov logic network
−∞ : cites(A, A′) ∧ A ≺ A′ 
−0.5 : cites(A, A′) 
0.8 : cites(A, A′) ∧ cites(A″, A′) 
1.2 : department(P) = CS ∧ department(P′) = CS ∧ author(P, A) ∧ author(P′, A) 
−0.6 : department(P) = CS ∧ department(P′) = Stats ∧ author(P, A) ∧ author(P′, A) 
The main strength of feature-based specifications lies in their ability to model mutual dependencies that are not easily factored into an acyclic dependency structure, and to construct a model in a modular fashion from a list of constraints, without the need to specify, as in rule-based approaches, a combination function that combines disparate model components into a coherent conditional probability specification. A disadvantage lies in the fact that weights attached to features have no easily understood meaning and can be difficult to calibrate to obtain a probability distribution with the correct or desired properties.
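This calibration difficulty can be seen on a tiny ground model evaluated by brute force. The weights below are illustrative log-weights in the spirit of Table 3, not values from any fitted model:

```python
import math
from itertools import product

atoms = ["cites(a1,a2)", "cites(a2,a1)"]

# Ground weighted features (illustrative log-weights w'_i):
features = [
    (-0.5, lambda world: world["cites(a1,a2)"]),   # citations a priori rare
    (-0.5, lambda world: world["cites(a2,a1)"]),
    (1.0,  lambda world: world["cites(a1,a2)"] and world["cites(a2,a1)"]),
]

def loglinear_distribution(atoms, features):
    """P(world) = exp(sum of weights of satisfied features) / Z,
    by brute-force enumeration of all Herbrand interpretations."""
    worlds = [dict(zip(atoms, vals))
              for vals in product([False, True], repeat=len(atoms))]
    scores = [math.exp(sum(w for w, f in features if f(world)))
              for world in worlds]
    z = sum(scores)
    return {tuple(world.values()): s / z
            for world, s in zip(worlds, scores)}

dist = loglinear_distribution(atoms, features)
```

Here the reciprocity weight 1.0 exactly cancels the two unary penalties, so the worlds with both or neither citation end up equally probable; no single weight directly states a probability, which is the interpretability issue mentioned above.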
Discriminative Models
All PLMLs mentioned so far are used to construct generative models in the sense that they define a probability distribution over full Herbrand interpretations. Discriminative PLMs are designed to solve specific classification tasks consisting of the prediction of a class attribute, or class relation. Most attention here has been given to logic-relational extensions of decision tree classifiers. As for decision tree classifiers for conventional attribute-value data, there is only a small step from purely qualitative class-label prediction models to quantitative estimation models for the posterior class-label probability distribution. Qualitative, logic-relational decision tree models have been developed in the inductive logic programming line of research (Blockeel and De Raedt 1998). Probabilistic relational decision trees were introduced in Neville et al. (2003).
Inference
Tasks
PLMs can be used for a wide range of predictive (classification) and descriptive (clustering) inference tasks. As usual for probabilistic models, prediction is performed by computing the posterior probability distribution of unobserved random variables given the values of observed variables, and clustering is performed by predicting a special hidden, or latent, variable. Even though some forms of clustering (e.g., community detection) can be very relevant for the type of relational data modeled with PLMs, it is so far predictive tasks that have received the most attention.
Depending on the nature of the relation being predicted, special types of prediction tasks can be distinguished. When the predicted variable is a binary relation, one often speaks of link prediction. An example is the prediction of the cites relation. When a predicted binary relation represents an identity relation, one speaks of entity resolution. For example, if the entities in the domain are bibliographic records for scientific articles (rather than the articles themselves), then the binary relation same as between records stands for the fact that both records refer to the same article.
Techniques
When the distribution defined by a PLM can be represented by a directed or undirected probabilistic graphical model as depicted in Fig. 3, then inference can be performed using standard inference techniques for probabilistic graphical models in such a support network. An individual prediction query (6) corresponds to computing a single-node posterior distribution in the network. A collective classification task corresponds to a maximum a posteriori (MAP) hypothesis inference task, i.e., the computation of the jointly most probable configuration of a specific subset of network nodes. Clustering tasks, too, can be solved by a MAP inference computing the most probable configuration of a latent cluster attribute.
Thus, instead of computing the most probable joint configuration of only the selected set of class (or cluster) variables, one computes the most probable configuration of all unobserved probabilistic relations. The value for H^{class} induced by the solution of (8) may be different from the one obtained by (7).
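The possible disagreement between these two answers can be illustrated on a toy joint distribution over a class variable X and one further unobserved variable Y; all numbers below are invented for illustration:

```python
# Toy joint distribution over a class variable X and another
# unobserved variable Y (illustrative probabilities):
joint = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# Jointly most probable configuration of all variables, projected to X:
x_joint = max(joint, key=joint.get)[0]          # argmax world is (0, 0)

# Most probable value of X under its marginal alone:
marg_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
x_marginal = max(marg_x, key=marg_x.get)        # P(X=1) = 0.6 > P(X=0) = 0.4

assert x_joint == 0 and x_marginal == 1
```

Although the weight of X = 1 is spread over two worlds, no single one of them beats the world (0, 0), so the two inference modes select different class labels.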
Figure 3 shows a part of a support network for answering queries for the PLM of Table 1 and domain from Fig. 1. Even though the number of nodes in a support network is polynomial in the size of the input domain, exact inference on the support network typically is intractable for input domains of realistic size. Approximate inference techniques, notably sampling approaches such as Markov chain Monte Carlo simulation or importance sampling, then have to be used.
The computation of (conditional) probabilities (6) can also be reduced to a weighted model counting problem, i.e., the computation of the sum of weights of all models of a propositional theory, where each model has a weight defined by its truth assignment to the propositional variables (Chavira and Darwiche 2008). The reduction can be based on a given support network. Alternatively, one may also directly compile a PLM query into a data structure used for solving weighted model counting problems, especially when the usual support network construction is not possible, as in some IC frameworks (Fierens et al. 2011).
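A brute-force weighted model counter makes the reduction concrete. The CNF encoding and weight function below are illustrative and not the compilation produced by any particular system:

```python
from itertools import product

def wmc(variables, clauses, weight):
    """Weighted model count of a CNF theory: sum, over all satisfying
    truth assignments, of the product of per-literal weights.
    clauses is a list of lists of (variable, polarity) literals."""
    total = 0.0
    for vals in product([False, True], repeat=len(variables)):
        asg = dict(zip(variables, vals))
        if all(any(asg[v] == pol for v, pol in clause) for clause in clauses):
            w = 1.0
            for v, val in asg.items():
                w *= weight(v, val)
            total += w
    return total

# Encoding "at least one of two independent cites atoms is true",
# each true with probability 0.3 (an illustrative reduction):
theory = [[("c12", True), ("c21", True)]]
w = lambda v, val: 0.3 if val else 0.7
assert abs(wmc(["c12", "c21"], theory, w) - 0.51) < 1e-12  # 1 - 0.7 * 0.7
```

Practical systems avoid this exponential enumeration by compiling the theory into structures such as d-DNNF circuits, but the quantity computed is the same.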
The inference techniques mentioned so far all operate on the level of ground instances of the PLM, i.e., the high-level representation language is only used to construct low-level models such as the one depicted in Fig. 3, which are entirely defined in terms of ground atoms. The inference methods used on these low-level models do not take advantage of symmetries in the ground model that are due to the fact that it is constructed out of generic rules, and therefore many of its ground atoms are indistinguishable. Lifted inference techniques for PLMs are developed with the goal of leveraging these symmetries and performing basic inference operations jointly for groups of indistinguishable atoms (Poole 2003; de Salvo Braz et al. 2005). In certain cases this can reduce inference complexity from exponential to linear in the size of the domain. Lifted versions have been developed for a variety of exact and approximate inference methods, including variable elimination (Poole 2003; de Salvo Braz et al. 2005; Milch et al. 2008) and weighted model counting (Gogate and Domingos 2011; Van den Broeck et al. 2011). Theoretical limitations for lifted inference are given by lower complexity bounds for probabilistic inference in PLMs (Jaeger 2000).
Learning
Often N = 1, i.e., a model is learned from a single relational structure. For example, a PLM for bibliographic data may be learned from a single bibliographic database, such as the one depicted graphically in Figs. 1 and 2. In this case, the observed random variables, i.e., the probabilistic Herbrand atoms, do not usually obey any assumptions of being independent and/or identically distributed (IID). The fact that learning is not based on IID data is often seen as a main distinguishing feature of PLM learning as opposed to more traditional machine learning scenarios.
When N > 1, then the individual observations \( \left({H}_i^{\mathrm{prob}},{H}_i^{\mathrm{inp}},{D}_i\right) \) will typically be assumed to be independent. This case is further subdivided into the scenario where \( \left({H}_1^{\mathrm{inp}},{D}_1\right)=\ldots =\left({H}_N^{\mathrm{inp}},{D}_N\right) \), i.e., the data consists of multiple observations of the probabilistic relations for a fixed input domain, and the scenario where the \( \left({H}_i^{\mathrm{inp}},{D}_i\right) \) differ, i.e., one observes multiple input domains. An example for the first scenario is that (H^{inp}, D) represents a fixed social network with a set of nodes D and a social link relation defined by H^{inp}. The \( {H}_i^{\mathrm{prob}} \) may then contain time-stamped observations of different messages m_{ i } that are propagated through the network, which can be encoded as ground atoms has_message(d, m_{ i }, t), where d ∈ D and t is a time stamp. From this data, a PLM for information diffusion could then be learned.
A typical example for the second scenario is biomolecular data, where data cases correspond to different molecules, which are described by their constituent atoms D_{ i } and known structural properties represented as \( {H}_i^{\mathrm{inp}} \). The probabilistic relations observed in \( {H}_i^{\mathrm{prob}} \) encode uncertain biochemical or structural properties of the molecule.
From such data PLMs for predicting chemical reactivity or (secondary) structure of a molecule could be learned.
For rulebased, IC, or featurebased models, S consists of the set of logical formulas used in the model, and θ comprises the probability or weight parameters. For models based on graphical representations, S consists of the graph structure as in Fig. 4, and θ comprises parameters needed to specify the local probability distributions.
Based on a model decomposition (9), one distinguishes parameter and structure learning. Parameter learning is the task of fitting the parameter vector θ given a model structure S, where S may either be a fixed, manually designed structure, or a current candidate structure within a structure learning procedure. For parameter learning, most methods of statistical machine learning can be adapted to the PLM context, either based on maximizing the likelihood P(H^{prob} | θ, S, H^{inp}, D) or, in Bayesian approaches, the posterior probability P(θ | H^{prob}, S, H^{inp}, D). Like probabilistic inference, optimization of these score functions usually is performed on the basis of ground instances of the PLM, using existing learning techniques for such ground models. These techniques have to be slightly adapted, however: seen as a conventional Bayesian network, each node in a support network like the one of Fig. 3 has its own parameter vector, which could only be learned from data containing multiple observations of the node and its parents in the network. PLM parameter learning from a single observation of the nodes is enabled by parameter tying: in the PLM-induced support network, many parameters in different nodes are known to be identical, since they are equal to (functions of) the original parameters in the PLM. Thus, for example, all the nodes department(p_{1}), …, department(p_{5}) share the same parameters θ_{1} = P(department(p_{ i }) = CS), θ_{2} = P(department(p_{ i }) = Stats), and θ_{3} = P(department(p_{ i }) = Eng). Thereby, a single observation of values for all nodes can be sufficient to estimate the model parameters.
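Parameter tying can be illustrated on the department example: because all department(p_i) nodes share one parameter vector, the maximum-likelihood estimate pools the single observation of each node into one empirical distribution. The observed values below are invented for illustration:

```python
from collections import Counter

# A single observed Herbrand interpretation for the tied
# department attribute (illustrative values):
observed = {"p1": "CS", "p2": "Stats", "p3": "CS",
            "p4": "Eng", "p5": "CS"}

def tied_ml_estimate(observed, values):
    """Because department(p1), ..., department(p5) share one
    parameter vector, the maximum-likelihood estimate is the
    empirical distribution pooled over all tied nodes."""
    counts = Counter(observed.values())
    n = len(observed)
    return {v: counts[v] / n for v in values}

theta = tied_ml_estimate(observed, ["CS", "Stats", "Eng"])
assert theta == {"CS": 0.6, "Stats": 0.2, "Eng": 0.2}
```

Without tying, each of the five nodes would contribute only one observation to its own parameter vector, and the estimates would be degenerate.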
Key Applications
As mentioned above, a key feature of PLMLs is their generality and flexibility, which lead to a very broad range of possible applications, especially during prototype development. In the context of social network analysis, PLMs have mostly been applied to prediction tasks, including individual and collective classification, and link prediction.
Future Directions
The development of lifted inference techniques that could provide scalable inference for rich classes of PLMs is an active research area. Various ways of extending the expressivity of PLMLs are also a topic of current research. This includes the step from PLMLs in the somewhat narrower sense to probabilistic programming languages, integration of numerical random attributes and relations into PLMLs, and the extension of PLMLs to decision support models by integrating utility and decision variables.
Recommended Reading
 De Raedt L (2008) Logical and relational learning. Springer, Berlin
 De Raedt L, Frasconi P, Kersting K, Muggleton S (eds) (2008) Probabilistic inductive logic programming. Lecture notes in artificial intelligence, vol 4911. Springer, Berlin
 Getoor L, Taskar B (eds) (2007) Introduction to statistical relational learning. MIT Press, Cambridge, MA