Learning directed relational models with recursive dependencies
Abstract
Recently, there has been an increasing interest in generative models that represent probabilistic patterns over both links and attributes. A common characteristic of relational data is that the value of a predicate often depends on values of the same predicate for related entities. For directed graphical models, such recursive dependencies lead to cycles, which violate the acyclicity constraint of Bayes nets. In this paper we present a new approach to learning directed relational models which utilizes two key concepts: a pseudo likelihood measure that is well defined for recursive dependencies, and the notion of stratification from logic programming. One issue for modelling recursive dependencies with Bayes nets is redundant edges, which increase the complexity of learning. We propose a new normal form that removes the redundancy, and prove that, assuming stratification, the normal form constraints involve no loss of modelling power. Empirical evaluation compares our approach to learning recursive dependencies with undirected models (Markov Logic Networks). The Bayes net approach is orders of magnitude faster and learns more recursive dependencies, which lead to more accurate predictions.
Keywords
Statistical relational learning, Bayesian networks, Autocorrelation, Recursive dependencies
1 Introduction: relational data and recursive dependencies
Relational data are very common in real-world applications, ranging from social network analysis to enterprise databases. A phenomenon that distinguishes relational data from single-population data is that the value of an attribute for an entity can be predicted by the value of the same attribute for related entities; this phenomenon has been called a “nearly ubiquitous characteristic” of relational datasets (Neville and Jensen 2007, Sect. 1). For example, whether individual a smokes may be predicted by the smoking habits of a’s friends. This pattern can be represented by clausal notation such as Smokes(X)←Smokes(Y),Friend(X,Y).
Different subfields concerned with relational data have introduced different terms for this phenomenon. From a logic programming perspective, it is natural to speak of a recursive dependency, where a predicate depends on itself. In statistical-relational learning, Jensen and Neville introduced the term relational autocorrelation in analogy with temporal autocorrelation (Jensen and Neville 2002; Neville and Jensen 2007). In multi-relational data mining, such dependencies are found by considering self-joins where a table is joined to itself (Chen et al. 2009). We will use both the terms recursive dependency and autocorrelation. The former emphasizes the format of the rules we consider, whereas the latter distinguishes the probabilistic dependencies we model from deterministic logical entailment.
In this paper we investigate a new approach to learning recursive dependencies with Bayes nets, specifically Poole’s Parametrized Bayes Nets (PBNs) (Poole 2003); however, our results apply to other directed relational models as well, such as Probabilistic Relational Models (PRMs) (Getoor et al. 2001) and Bayesian Logic Programs (BLPs) (Kersting and de Raedt 2007). Two key difficulties are well known for learning recursive dependencies using directed models.
(1) Recursive dependencies lead to cyclic dependencies among ground facts (Ramon et al. 2008; Domingos and Richardson 2007; Taskar et al. 2002). The cycles make it difficult to define a model likelihood function for observed ground facts in the data, which is an essential component of statistical model selection. To define a model likelihood function for Bayes net search, we utilize Schulte’s recent relational Bayes net pseudo likelihood (Schulte 2011) that measures the fit of a PBN to a relational database and is well-defined even in the presence of recursive dependencies. The recent efficient learn-and-join algorithm (Khosravi et al. 2010) searches for models that maximize the pseudo likelihood. In this paper we evaluate the pseudo likelihood approach on datasets with strong autocorrelations.
(2) A related problem is that defining valid probabilistic inferences in cyclic models is difficult. To avoid cycles in the ground model while doing inference, Khosravi et al. proposed converting a learned Bayes net to an undirected model using the standard moralization procedure (Khosravi et al. 2010). In graphical terms, moralization connects all co-parents of a node, then omits edge directions. Inference with recursive dependencies can then be carried out using Markov Logic Networks (MLNs), a prominent relational model class that combines the syntax of logical clauses with the semantics of Markov random fields (Domingos and Richardson 2007). The moralization approach combines the efficiency and scalability of Bayes net learning with the high-quality inference procedures of MLNs.
We compared our learning algorithms with two state-of-the-art Markov Logic Network methods using public domain datasets. The pseudo likelihood algorithm with main functor format is orders of magnitude faster and learns more recursive dependencies, which lead to more accurate predictions.
Paper organization
We review the relevant background and define our notation. We then prove two theoretical results regarding relational autocorrelation: the first gives a necessary and sufficient condition for a ground Parametrized Bayes net to be acyclic; the second is the normal form theorem mentioned above. Next we describe the normal form extension of the learn-and-join algorithm. Our simulations evaluate the ability of the extended algorithm to learn recursive dependencies, compared to Markov Logic Network learners.
Contributions
1. A new normal form theorem for Parametrized Bayes nets that addresses redundancies in modelling autocorrelations.
2. An extension of the learn-and-join algorithm for learning Bayes nets that include autocorrelations.
3. An evaluation of the pseudo-likelihood measure (Schulte 2011) for learning autocorrelations.
2 Related work
Parametrized Bayes nets (PBNs) are a basic statistical-relational model due to Poole (Poole 2003). PBNs utilize the functor concept from logic programming to connect logical structure with random variables.
Bayes Net Learning for Relational Data. Adaptations of Bayes net learning methods for relational data have been considered by several researchers (Khosravi et al. 2010; Fierens et al. 2007; Ramon et al. 2008; Friedman et al. 1999; Kersting and de Raedt 2007). Issues connected to learning Bayes nets with recursive dependencies are discussed in detail by Ramon et al. (2008). Early work on this topic required ground graphs to be acyclic (Kersting and de Raedt 2007; Friedman et al. 1999). For example, Probabilistic Relational Models allow dependencies that are cyclic at the predicate level as long as the user guarantees acyclicity at the ground level (Friedman et al. 1999). A recursive dependency of an attribute on itself is shown as a self-loop in the model graph. If there is a natural ordering of the ground atoms in the domain (e.g., temporal), there may not be cycles in the ground graph; but this assumption is restrictive in general. The generalized ordering-search of Ramon et al. instead resolves cycles by learning an ordering of ground atoms. A basic difference between our work and generalized ordering-search is that we focus on learning at the predicate level. Our algorithm can be combined with generalized ordering-search as follows: first, use our algorithm to learn a Bayes net structure at the predicate/class level; second, carry out a search for a good ordering of the ground atoms. We leave integrating the two systems for future work.
Stratified models. Stratification is a widely imposed condition on logic programs, because it increases the tractability of reasoning with a relatively small loss of expressive power. Our definition is very similar to the definition of local stratification in logic programming (Apt and Bezem 1991). The difference is that levels are assigned to predicates/functions rather than ground literals, so the definition does not need to distinguish positive from negative literals. Related ordering constraints appear in the statistical-relational literature (Fierens 2009; Friedman et al. 1999).
3 Background and notation
We define the target model class of Parametrized Bayes nets. Then we briefly discuss the problems arising from cyclic dependencies that have been addressed in our previous work. The next section discusses the redundancy problem that has not been previously addressed.
3.1 Bayes nets for relational data
We follow the original presentation of Parametrized Bayes nets due to Poole (Poole 2003). A functor is a function symbol or a predicate symbol. In this paper we discuss only functors with a finite range of possible values. A functor whose range is {T,F} is a predicate, usually written with uppercase letters like P,R. A parametrized random variable, or functor node, or simply f-node, is of the form \(f(X_1,\ldots,X_k)=f(\mathbf{X})\), where f is a functor and each first-order variable \(X_i\) is of the appropriate type for the functor. If a functor node f(τ) contains no variable, it is a ground node. An assignment of the form f(τ)=a, where a is a constant in the range of f, is an atom; if f(τ) is ground, the assignment is a ground atom. A population is a set of individuals, corresponding to a domain or type in logic. Each first-order variable X is associated with a population. An instantiation or grounding for a set of variables \(X_1,\ldots,X_k\) assigns to each variable \(X_i\) a constant from the population of \(X_i\). Getoor and Grant discuss the applications of function concepts as a unifying language for statistical-relational modelling (Getoor and Grant 2006).
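The definitions above can be made concrete with a small data structure. The following Python sketch is illustrative only: the class `FunctorNode`, the function `groundings`, and the toy population are hypothetical names introduced here, not part of any published implementation. It represents a functor node as a functor name plus a tuple of first-order variables and enumerates every instantiation of a set of typed variables.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class FunctorNode:
    """A functor node f(X1, ..., Xk): a functor applied to first-order variables."""
    functor: str
    args: Tuple[str, ...]

    def ground(self, theta: Dict[str, str]) -> Tuple[str, Tuple[str, ...]]:
        """Apply an instantiation theta (variable -> constant), giving a ground node."""
        return (self.functor, tuple(theta[v] for v in self.args))


def groundings(var_types: Dict[str, str],
               populations: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Enumerate all instantiations: each variable gets a constant from its population."""
    names = sorted(var_types)
    domains = [populations[var_types[v]] for v in names]
    for values in itertools.product(*domains):
        yield dict(zip(names, values))


# Toy usage: one population and two variables of that type.
populations = {"Person": ["Anna", "Bob"]}
var_types = {"X": "Person", "Y": "Person"}
friend = FunctorNode("Friend", ("X", "Y"))
thetas = list(groundings(var_types, populations))
print(len(thetas))               # 4
print(friend.ground(thetas[1]))  # ('Friend', ('Anna', 'Bob'))

# A ground atom pairs a ground node with a value from the functor's range.
ground_atom = (friend.ground(thetas[1]), True)
```

Representing nodes as immutable tuples keeps them hashable, so they can serve directly as dictionary keys for database lookups.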
3.2 Relational pseudo-likelihood measure for Bayes nets
Score-based learning algorithms for Bayes nets require the specification of a numeric model selection score that measures how well a given Bayes net model fits observed data. A common approach to defining a score for a relational database is known as knowledge-based model construction (Getoor and Taskar 2007; Ngo and Haddawy 1997; Koller and Pfeffer 1997; Wellman et al. 1992). The basic idea is to consider the ground graph for a given database, illustrated in Fig. 2. A given database like the one in Fig. 1 specifies a value for each node in the ground graph. Thus the likelihood of the Parametrized Bayes net for the database can be defined as the likelihood assigned by the ground graph to the facts in the database following the usual Bayes net product formula. With recursive dependencies, however, the ground graph may contain cycles, and then the product formula does not define a valid likelihood. The random grounding pseudo likelihood (Schulte 2011) is well defined even in this case; conceptually, it is computed as follows.
1. Randomly select a grounding for all first-order variables that occur in the Bayes net. The result is a ground graph with as many nodes as the original Bayes net.
2. Look up the value assigned to each ground node in the database. Compute the log-likelihood of this joint assignment using the usual product formula; this defines a log-likelihood for the random instantiation.
3. The expected value of this log-likelihood is the pseudo log-likelihood of the database given the Bayes net.
The computation of the random grounding pseudo likelihood for the Bayes net of Fig. 2 and the database of Fig. 1. Each row is a simultaneous grounding of all first-order variables in the Bayes net. The values of the functors for each grounding define an assignment of values to the Bayes net nodes. The Bayes net assigns a likelihood to each grounding using the standard product formula. The rounded numbers shown were obtained using the conditional probability (CP) parameters of Fig. 2 together with \(P_B(\mathit{Smokes}(X)=T)=1\) and \(P_B(\mathit{Friend}(X,Y)=T)=1/2\), chosen for easy computation. The pseudo log-likelihood is the average of the log-likelihoods for each grounding, given by \(-(2.254+1.406+1.338+2.185)/4\approx -1.8\).
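
| Grounding γ | X | Y | F(X,Y) | S(X) | S(Y) | C(Y) | \(P_{B}^{\gamma}\) | \(\ln(P_{B}^{\gamma})\) |
|---|---|---|---|---|---|---|---|---|
| γ₁ | Anna | Bob | T | T | T | F | 0.105 | −2.254 |
| γ₂ | Bob | Anna | T | T | T | T | 0.245 | −1.406 |
| γ₃ | Anna | Anna | F | T | T | T | 0.263 | −1.338 |
| γ₄ | Bob | Bob | F | T | T | F | 0.113 | −2.185 |

F = Friend, S = Smokes, C = Cancer.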
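For a small database, the three steps can be computed exactly: under a uniformly random grounding, the expected log-likelihood is simply the average over all groundings, exactly as in the table above. The Python sketch below assumes Boolean functors, a Bayes net given as a `parents` dictionary with one CPT per node, and a complete database of ground facts. The function name, the network structure, and the conditional probabilities are hypothetical; they are not the parameters of Fig. 2.

```python
import itertools
import math
from typing import Dict, List, Tuple

Node = Tuple[str, Tuple[str, ...]]    # (functor, variables), e.g. ("Smokes", ("X",))
Ground = Tuple[str, Tuple[str, ...]]  # (functor, constants), e.g. ("Smokes", ("Anna",))


def pseudo_log_likelihood(parents: Dict[Node, List[Node]],
                          cpt: Dict[Node, Dict[Tuple[bool, ...], float]],
                          var_types: Dict[str, str],
                          populations: Dict[str, List[str]],
                          db: Dict[Ground, bool]) -> float:
    """Expected log-likelihood under a uniformly random grounding of all
    first-order variables, computed exactly by averaging over every grounding.
    cpt[node][parent_values] is P(node = True | parent values), with parent
    values listed in the same order as parents[node]."""
    names = sorted(var_types)
    domains = [populations[var_types[v]] for v in names]

    def value(node: Node, theta: Dict[str, str]) -> bool:
        functor, args = node
        return db[(functor, tuple(theta[a] for a in args))]  # look up the fact

    total, count = 0.0, 0
    for values in itertools.product(*domains):
        theta = dict(zip(names, values))
        loglik = 0.0
        for node, pas in parents.items():  # Bayes net product formula, in log space
            p_true = cpt[node][tuple(value(p, theta) for p in pas)]
            loglik += math.log(p_true if value(node, theta) else 1.0 - p_true)
        total += loglik
        count += 1
    return total / count


# Hypothetical net and parameters (not those of Fig. 2).
S_X, S_Y, F_XY = ("Smokes", ("X",)), ("Smokes", ("Y",)), ("Friend", ("X", "Y"))
parents = {S_X: [], F_XY: [], S_Y: [S_X, F_XY]}
cpt = {S_X: {(): 1.0}, F_XY: {(): 0.5},
       S_Y: {(True, True): 0.7, (True, False): 0.5,
             (False, True): 0.5, (False, False): 0.5}}
db = {("Smokes", ("Anna",)): True, ("Smokes", ("Bob",)): True,
      ("Friend", ("Anna", "Bob")): True, ("Friend", ("Bob", "Anna")): True,
      ("Friend", ("Anna", "Anna")): False, ("Friend", ("Bob", "Bob")): False}
pll = pseudo_log_likelihood(parents, cpt, {"X": "Person", "Y": "Person"},
                            {"Person": ["Anna", "Bob"]}, db)
print(round(pll, 4))  # -1.2181
```

For this toy net the value is 1.5 ln 0.5 + 0.5 ln 0.7. Averaging the last column of the table above in the same way gives −7.183/4 ≈ −1.80. A fact with probability 0 would make `math.log` raise an error, so the sketch assumes every observed fact has nonzero probability.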
3.3 Inference and moralization
In the presence of cycles, the ground graph does not provide a valid basis for probabilistic inference. Several researchers advocate the use of undirected rather than directed models because cycles do not arise with the former (Domingos and Richardson 2007; Taskar et al. 2002; Neville and Jensen 2007). Undirected Markov random fields are therefore important models for inference with relational data. The recently introduced moralization approach (Khosravi et al. 2010) is essentially a hybrid method that uses directed models for learning and undirected models for inference.
Pedro Domingos has connected Markov random fields to logical clauses by showing that first-order formulas can be viewed as templates for Markov random fields whose nodes comprise ground atoms that instantiate the formulas. Markov Logic Networks (MLNs) are presented in detail by Domingos and Richardson (Domingos and Richardson 2007). The qualitative component or structure of an MLN is a finite set of formulas or clauses \(\{\phi_i\}\), and its quantitative component is a set of weights \(\{w_i\}\), one for each clause. The Markov Logic Network corresponding to a moralized Bayes net simply contains one conjunctive clause for each possible state of each family, that is, of each child node together with its parents. Thus the Markov Logic Network for a moralized PBN contains a conjunction for each conditional probability specified in the Bayes net. For converting the Bayes net conditional probabilities to MLN clause weights, Domingos and Richardson suggest using the log of the conditional probabilities as the clause weight (Domingos and Richardson 2007, Sect. 12.5.3). This is the standard conversion for propositional Bayes nets. Figure 3 illustrates the MLN clauses obtained by moralization using log-probabilities as weights.
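Moralization and the log-probability weighting can be sketched in a few lines. This Python sketch assumes Boolean functor nodes written as strings and one CPT per node; the functions `moralize` and `family_clauses` and the example parameters are hypothetical. It only produces the clauses and the log-probability weights described above; it is not an MLN learner.

```python
import itertools
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Literal = Tuple[str, bool]  # (functor node, truth value)


def moralize(parents: Dict[str, List[str]]) -> Set[FrozenSet[str]]:
    """Moral graph: drop edge directions and connect ("marry") all co-parents."""
    edges: Set[FrozenSet[str]] = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((child, p)))
        for a, b in itertools.combinations(pas, 2):
            edges.add(frozenset((a, b)))
    return edges


def family_clauses(parents: Dict[str, List[str]],
                   cpt: Dict[str, Dict[Tuple[bool, ...], float]]
                   ) -> List[Tuple[List[Literal], float]]:
    """One conjunction per state of each family (child plus parents), weighted by
    the log of the corresponding conditional probability; log(0) becomes -inf."""
    clauses = []
    for child, pas in parents.items():
        for parent_state in itertools.product([True, False], repeat=len(pas)):
            p_true = cpt[child][parent_state]
            for value, p in ((True, p_true), (False, 1.0 - p_true)):
                literals = [(child, value)] + list(zip(pas, parent_state))
                weight = math.log(p) if p > 0 else float("-inf")
                clauses.append((literals, weight))
    return clauses


# Hypothetical Boolean net and parameters, for illustration only.
parents = {"Friend(X,Y)": [], "Smokes(X)": [],
           "Smokes(Y)": ["Smokes(X)", "Friend(X,Y)"], "Cancer(Y)": ["Smokes(Y)"]}
cpt = {"Friend(X,Y)": {(): 0.5}, "Smokes(X)": {(): 0.6},
       "Smokes(Y)": {(True, True): 0.7, (True, False): 0.5,
                     (False, True): 0.3, (False, False): 0.4},
       "Cancer(Y)": {(True,): 0.4, (False,): 0.1}}
moral_edges = moralize(parents)
clauses = family_clauses(parents, cpt)
print(len(moral_edges), len(clauses))  # 4 16
```

Here the co-parents Smokes(X) and Friend(X,Y) of Smokes(Y) receive a new undirected edge, and the 16 clauses are 2 + 2 + 8 + 4 family states.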
4 Stratification and recursive dependencies
In this section we first consider analytically the relationship between cycles in a ground Bayes net and orderings of the functors that appear in the non-ground Bayes net. It is common to characterize a logic program by the orderings of the functors that the logic program admits (Lifschitz 1996); we adapt the ordering concepts for Bayes nets. The key ordering concept is the notion of a level mapping, which assigns a natural number level(f) to each functor f that appears in the Bayes net. We apply it to Bayes nets as follows.
Definition 1

A Bayes net is strictly stratified if there is a level mapping such that for every edge f(τ)→g(σ), we have level(f)<level(g).

A Bayes net is stratified if there is a level mapping such that for every edge f(τ)→g(σ), we have level(f)≤level(g).
Strict stratification corresponds to the concept of a hierarchical rule set (Lifschitz 1996). Since it implies that one f-node cannot be an ancestor of another f-node with the same functor, strict stratification rules out recursive clauses. Stratification with a weak inequality, by contrast, does allow the representation of autocorrelations. Stratification is a widely imposed condition on logic programs, because it increases the tractability of reasoning with a relatively small loss of expressive power (Lifschitz, 1996, Sect. 3.5; Apt and Bezem, 1991). We next show that strict stratification characterizes the absence of cycles in a ground Bayes net. The proof is in Sect. 8.
Proposition 1
Let B be a Parametrized Bayes net, and let \(\mathcal {D}\) be a database instance such that every population (entity type) has at least two members. Then the ground graph \(\overline{B}\) for \(\mathcal {D}\) is acyclic if and only if the Bayes net B is strictly stratified.
This result shows that cyclic dependencies arise precisely when a node associated with one functor is an ancestor of another node associated with the same functor.¹ This in turn is exactly the graphical condition associated with recursive dependencies, which means that recursive dependencies and cyclic dependencies are closely connected phenomena.
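Definition 1 and Proposition 1 can be checked mechanically on small examples. The Python sketch below computes a strict level mapping by a longest-path topological sort of the functor-level graph, returning `None` when that graph has a cycle (including the self-loop created by a recursive dependency). Separately, it grounds the net over a two-member population and tests the ground graph for cycles. It assumes a single population shared by all variables and no repeated variables within a node; the helper names are hypothetical, and this is a sanity check on toy inputs, not a proof.

```python
import itertools
from collections import defaultdict
from collections.abc import Hashable, Iterable
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, Tuple[str, ...]]  # (functor, variables); no variable repeated in a node
Edge = Tuple[Node, Node]            # (parent, child)


def longest_path_levels(pairs: Iterable[Tuple[Hashable, Hashable]]
                        ) -> Optional[Dict[Hashable, int]]:
    """Kahn's algorithm. Returns level(v) = length of the longest path ending at v,
    so level(a) < level(b) for every pair (a, b); returns None if there is a cycle."""
    succ: Dict[Hashable, set] = defaultdict(set)
    indeg: Dict[Hashable, int] = defaultdict(int)
    nodes = set()
    for a, b in pairs:
        nodes.update((a, b))
        if b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1
    level = {v: 0 for v in nodes}
    queue = [v for v in nodes if indeg[v] == 0]
    processed = 0
    while queue:
        a = queue.pop()
        processed += 1
        for b in succ[a]:
            level[b] = max(level[b], level[a] + 1)
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    return level if processed == len(nodes) else None


def strict_level_mapping(edges: List[Edge]) -> Optional[Dict[Hashable, int]]:
    """A level mapping with level(f) < level(g) for every edge f(..) -> g(..), or None."""
    return longest_path_levels((f, g) for (f, _), (g, _) in edges)


def ground_graph_is_acyclic(edges: List[Edge], population: List[str]) -> bool:
    """Instantiate every edge over one shared population and test for cycles."""
    ground_pairs = []
    for (f, fargs), (g, gargs) in edges:
        variables = sorted(set(fargs) | set(gargs))
        for values in itertools.product(population, repeat=len(variables)):
            theta = dict(zip(variables, values))
            ground_pairs.append(((f, tuple(theta[v] for v in fargs)),
                                 (g, tuple(theta[v] for v in gargs))))
    return longest_path_levels(ground_pairs) is not None


people = ["Anna", "Bob"]
recursive = [(("Friend", ("X", "Y")), ("Smokes", ("Y",))),
             (("Smokes", ("X",)), ("Smokes", ("Y",)))]
hierarchical = [(("age", ("X",)), ("Smokes", ("X",))),
                (("Smokes", ("X",)), ("Cancer", ("X",)))]
print(strict_level_mapping(recursive), ground_graph_is_acyclic(recursive, people))
# None False
print(ground_graph_is_acyclic(hierarchical, people))  # True
```

In both toy nets the two checks agree, as Proposition 1 predicts: the recursive edge Smokes(X)→Smokes(Y) is a functor-level self-loop, and its groundings include cycles such as Smokes(Anna)→Smokes(Bob)→Smokes(Anna).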
While stratified Bayes nets have the expressive power to represent autocorrelations, there is potential for additional complexity in learning if each functor node is treated as a separate random variable. We discuss this issue in the next subsection and propose a normal form constraint for resolving it.
4.1 Stratification and the main functor node format
Redundant edges can be avoided if we restrict the model class to the main functor format, where for each function symbol f, there is a main functor node f(τ) such that all other functor nodes f(τ′) associated with the same functor are sources in the graph, that is, they have no parents. The intuition for this restriction is that, statistically, two functor nodes with the same function symbol are equivalent, so it suffices to model the distribution of these nodes conditional on a set of parents just once. This leads to the following formal definition.
Definition 2
A Bayes net B is in main functor node form if for every functor f of B, there is a distinguished functor node f(τ), called the main functor node for f, such that every other functor node f(τ′), where τ′≠τ, has no parents in B.
Example
The next theorem shows that this equivalence holds in general: for any stratified Bayes net B there is an equivalent Bayes net B′ in main functor node form. This claim is established constructively by showing how the original B can be transformed into B′. The transformation procedure is a conceptual aid rather than an algorithm to be used in practice; to build a practical learning algorithm, we simply restrict the Bayes net candidates to be in main functor form (see Sect. 5 below). It is easy to see that we can make local changes to the first-order variables such that all child nodes for a given functor are the same. For instance, in the Bayes net of Fig. 4 we can first substitute Y for X to change the edge age(X)→Smokes(X) into the edge age(Y)→Smokes(Y). Then we delete the former edge and add the latter, that is, we make age(Y) a parent of Smokes(Y). Figures 4 and 5 illustrate that the original and transformed Bayes nets have the same ground graph. However, in general the change of variables may introduce cycles in the Bayes net. The basis for the next theorem is that if the original Bayes net is stratified, the transformed functor node graph is guaranteed not to contain cycles. The proof is in Sect. 8.
Theorem 1
Let B be a stratified Bayes net. Then there is a Bayes net B′ in main functor form such that for every database \(\mathcal {D}\), the ground graph \(\overline{B}\) is the same as the ground graph \(\overline{B'}\).
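The change of variables behind the transformation can be illustrated directly. The Python sketch below assumes one shared population and no repeated variables in a node. It renames all variables of a single edge by a permutation so that the edge points into a chosen main functor node, checks Definition 2, and confirms on a toy net that the set of ground edges is unchanged. The helper names and the toy net are hypothetical; the sketch does not implement the level-by-level ordering from the proof of Theorem 1, so by itself it does not guarantee acyclicity.

```python
import itertools
from typing import Dict, List, Set, Tuple

Node = Tuple[str, Tuple[str, ...]]  # (functor, variables); no variable repeated in a node
Edge = Tuple[Node, Node]            # (parent, child)


def is_main_functor_form(edges: List[Edge]) -> bool:
    """Definition 2: for each functor, at most one of its functor nodes has parents."""
    children: Dict[str, Set[Tuple[str, ...]]] = {}
    for _, (g, gargs) in edges:
        children.setdefault(g, set()).add(gargs)
    return all(len(arg_tuples) <= 1 for arg_tuples in children.values())


def retarget(edge: Edge, new_child_args: Tuple[str, ...]) -> Edge:
    """Apply a permutation of variables to the whole edge so that its child becomes
    g(new_child_args). Assumes all variables share one type (population)."""
    (f, fargs), (g, gargs) = edge
    mapping = dict(zip(gargs, new_child_args))
    all_vars = sorted(set(fargs) | set(gargs) | set(new_child_args))
    free_targets = [v for v in all_vars if v not in mapping.values()]
    for v in all_vars:
        if v not in mapping:  # complete the renaming to a bijection
            mapping[v] = free_targets.pop(0)
    return ((f, tuple(mapping[v] for v in fargs)),
            (g, tuple(mapping[v] for v in gargs)))


def ground_edges(edges: List[Edge], population: List[str]) -> Set[Edge]:
    """All ground edges obtained by instantiating each edge's variables."""
    result: Set[Edge] = set()
    for (f, fargs), (g, gargs) in edges:
        variables = sorted(set(fargs) | set(gargs))
        for values in itertools.product(population, repeat=len(variables)):
            theta = dict(zip(variables, values))
            result.add(((f, tuple(theta[v] for v in fargs)),
                        (g, tuple(theta[v] for v in gargs))))
    return result


# Hypothetical net: both Smokes(X) and Smokes(Y) have parents.
B = [(("age", ("X",)), ("Smokes", ("X",))),
     (("Friend", ("X", "Y")), ("Smokes", ("Y",))),
     (("Smokes", ("X",)), ("Smokes", ("Y",)))]
# Redirect every edge into Smokes(X) so that it points into Smokes(Y) instead.
B_main = [retarget(e, ("Y",)) if e[1] == ("Smokes", ("X",)) else e for e in B]
people = ["Anna", "Bob"]
print(B_main[0])  # (('age', ('Y',)), ('Smokes', ('Y',)))
print(is_main_functor_form(B), is_main_functor_form(B_main))  # False True
print(ground_edges(B, people) == ground_edges(B_main, people))  # True
```

Renaming every variable of an edge consistently permutes its groundings without changing them, which is why the ground graph is preserved.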
4.2 Discussion
Even if Bayes nets with or without the main functor constraints have the same groundings, at the variable or class level the two models may not be equivalent. For instance, the model of Fig. 4 implies that age(X) is independent of Friend(X,Y) given Smokes(X). But in the model of Fig. 5, the node age(Y) is dependent on (d-connected with) Friend(X,Y) given Smokes(Y). The transformed model represents more of the dependencies in the ground graph. For instance, the ground nodes age(a) and Friend(b,a) are both parents of the ground node Smokes(a), and hence d-connected given Smokes(a).
5 The learn-and-join structure algorithm with recursive dependencies
5.1 Example of algorithm
1. Applying the single-table Bayes net learner to the People table may produce a single-edge graph Smokes(Y)→Cancer(Y) (Line 5).
2. Then form the join data table \(J = \mathit{Friend} \Join \mathit{People} \Join \mathit{People}\) (Line 6). The Bayes net learner is applied to J, with the following constraints.
   (a) From the People Bayes net, there must be an edge Smokes(Y)→Cancer(Y), since Cancer(Y) is the main functor node for Cancer.
   (b) No edges may point into Smokes(X) or Cancer(X), since these are not the main functor nodes for the functors Smokes and Cancer (Line 8).
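The join in step 2 attaches to each Friend tuple (X, Y) the People attributes of X and of Y, with the columns renamed by variable. The minimal in-memory Python sketch below uses hypothetical toy tables and a hypothetical function name; it is not the learn-and-join implementation and is meant only to show the shape of J.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables: People(name, Smokes, Cancer) and Friend(X, Y).
people: Dict[str, Dict[str, bool]] = {
    "Anna": {"Smokes": True, "Cancer": False},
    "Bob": {"Smokes": True, "Cancer": True},
    "Carl": {"Smokes": False, "Cancer": False},
}
friend: List[Tuple[str, str]] = [("Anna", "Bob"), ("Bob", "Anna"), ("Bob", "Carl")]


def join_friend_people(friend: List[Tuple[str, str]],
                       people: Dict[str, Dict[str, bool]]) -> List[Dict[str, object]]:
    """Natural join Friend ⋈ People ⋈ People: one row per Friend tuple (X, Y),
    extended with the attributes of X (suffix "(X)") and of Y (suffix "(Y)")."""
    rows = []
    for x, y in friend:
        row: Dict[str, object] = {"X": x, "Y": y}
        row.update({f"{attr}(X)": val for attr, val in people[x].items()})
        row.update({f"{attr}(Y)": val for attr, val in people[y].items()})
        rows.append(row)
    return rows


J = join_friend_people(friend, people)
for row in J:
    print(row)
# The columns Smokes(X), Cancer(X), Smokes(Y), Cancer(Y) correspond to the
# functor nodes the Bayes net learner searches over; every row satisfies Friend(X,Y).
```

Because the join keeps only tuples listed in the Friend table, the learner applied to J sees the People attributes of pairs of related entities, which is where autocorrelations such as Smokes(X)→Smokes(Y) become visible.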
Discussion
The learn-and-join algorithm finds a structure that maximizes the pseudo-likelihood described in Sect. 3.2 (Schulte 2011). Khosravi et al. discuss the time complexity of the basic learn-and-join algorithm and show that the edge-inheritance constraint essentially keeps the model search space size constant even as the number of nodes considered grows with larger table joins. For the learn-and-join algorithm, the main computational challenge in scaling to larger table joins is therefore not the increasing number of columns (attributes) in the join, but only the increasing number of rows (tuples). The main functor constraint further reduces the search space. For instance, suppose that there are n nodes in total, k of which are duplicate (non-main) functor nodes. Then for each duplicate node, there are 2(n−1) possible directed adjacencies. The main functor constraint rules out every edge that points into a duplicate node, that is, n−1 of these directed adjacencies per duplicate node, and hence removes k(n−1) directed adjacencies from consideration.
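The count k(n−1) can be checked by enumerating candidate edges for the join table of the example above. In this Python sketch, which is an illustration rather than the learn-and-join implementation, a directed edge is allowed only if it points into a main functor node, and the inherited edge is checked for membership; acyclicity and the remaining learn-and-join constraints are not modelled.

```python
import itertools
from typing import List, Set, Tuple


def candidate_edges(nodes: List[str], main_nodes: Set[str]) -> List[Tuple[str, str]]:
    """Directed edges (parent, child) permitted by the main functor constraint:
    an edge may point only into a main functor node."""
    return [(a, b) for a, b in itertools.permutations(nodes, 2) if b in main_nodes]


# Functor nodes of the join table J from the example; Smokes(X) and Cancer(X)
# are the duplicate (non-main) nodes.
nodes = ["Friend(X,Y)", "Smokes(X)", "Cancer(X)", "Smokes(Y)", "Cancer(Y)"]
main_nodes = {"Friend(X,Y)", "Smokes(Y)", "Cancer(Y)"}
required = {("Smokes(Y)", "Cancer(Y)")}  # edge inherited from the People Bayes net

n = len(nodes)
k = n - len(main_nodes)                 # number of duplicate nodes
allowed = candidate_edges(nodes, main_nodes)
removed = n * (n - 1) - len(allowed)    # all ordered pairs minus permitted ones
print(n, k, len(allowed), removed, k * (n - 1))  # 5 2 12 8 8
```

With n = 5 and k = 2, the constraint removes 8 of the 20 possible directed edges, matching k(n−1).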
6 Evaluation
 Single Table Bayes Net Search

GES search (Chickering 2003) with the BDeu score as implemented in version 4.3.90 of CMU’s Tetrad package (structure prior uniform, ESS=10; The Tetrad Group 2008).
 MLN Parameter Learning

The default weight training procedure (Lowd and Domingos 2007) of the Alchemy package (Kok et al. 2009), Version 30.
 MLN Inference

The MC-SAT inference algorithm (Poon and Domingos 2006) to compute a probability estimate for each possible value of a descriptive attribute for a given object or tuple of objects.
Algorithms
 MBN

An MLN structure is learned using the extended learn-and-join algorithm (Sect. 5). The weights of clauses are learned using Alchemy. This method is called MBN for “moralized Bayes Net” by Khosravi et al. (2010).
 LHL

Lifted Hypergraph Learning (Kok and Domingos 2009) uses relational path finding to induce a more compact representation of data, in the form of a hypergraph over clusters of constants. Clauses represent associations among the clusters.
 LSM

Learning Structural Motifs (Kok and Domingos 2010) uses random walks to identify densely connected objects in data, and groups them and their associated relations into a motif.
We chose LSM and LHL because they are the most recent MLN structure learning methods that can potentially learn recursive dependencies.³
Performance metrics
We use three performance metrics: Runtime, Accuracy (ACC), and Conditional log likelihood (CLL). Runtime includes structure learning and parameter learning time. ACC and CLL have been used in previous studies of MLN learning (Mihalkova and Mooney 2007; Kok and Domingos 2009). The CLL of a ground atom in a database given an MLN is its log-probability given the MLN and the information in the database. Accuracy is evaluated using the most likely value for a ground atom. For ACC and CLL the values we report are averages over all attribute predicates. We evaluate the learning methods using 5-fold cross-validation as follows. We formed 5 subdatabases for each dataset by randomly selecting entities from each entity table and restricting the relationship tuples in each subdatabase to those that involve only the selected entities (Khosravi et al. 2010). The models were trained on 4 of the 5 subdatabases, then tested on the remaining fold. We report the average over the 5 runs, one for each fold.
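The two predictive metrics and the subdatabase construction can be sketched as follows. The Python code assumes inference has already produced a distribution over values for each test ground atom of one attribute predicate (per-predicate results would then be averaged as described above). It splits each entity table by a seeded random round-robin assignment; the exact sampling scheme used in the experiments is not specified here, so this is one plausible reading, and all names and data are hypothetical.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def accuracy_and_cll(predictions: List[Tuple[Dict[str, float], str]]) -> Tuple[float, float]:
    """Each prediction pairs an inferred distribution over a ground atom's values
    with its true value. ACC scores the most likely value; CLL averages the
    log-probability assigned to the true value."""
    n = len(predictions)
    acc = sum(max(dist, key=dist.get) == true for dist, true in predictions) / n
    cll = sum(math.log(dist[true]) for dist, true in predictions) / n
    return acc, cll


def make_subdatabases(entities: Dict[str, List[str]],
                      relationships: Dict[str, List[Tuple[str, ...]]],
                      k: int = 5, seed: int = 0
                      ) -> List[Tuple[Set[str], Dict[str, List[Tuple[str, ...]]]]]:
    """Randomly split each entity table into k groups; subdatabase i keeps the
    entities of group i and only the relationship tuples among those entities.
    Assumes entity ids are unique across tables."""
    rng = random.Random(seed)
    groups: List[Set[str]] = [set() for _ in range(k)]
    for ids in entities.values():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        for i in range(k):
            groups[i].update(shuffled[i::k])
    return [(g, {name: [t for t in tuples if all(e in g for e in t)]
                 for name, tuples in relationships.items()})
            for g in groups]


# Hypothetical predictions for two ground atoms of one attribute predicate.
preds = [({"T": 0.8, "F": 0.2}, "T"), ({"T": 0.3, "F": 0.7}, "T")]
acc, cll = accuracy_and_cll(preds)
print(acc, round(cll, 3))  # 0.5 -0.714

# Hypothetical entity table and self-relationship.
entities = {"User": [f"u{i}" for i in range(10)]}
relationships = {"Friend": [(f"u{i}", f"u{(i + 1) % 10}") for i in range(10)]}
subdbs = make_subdatabases(entities, relationships)
```

Restricting relationship tuples to those among selected entities keeps each subdatabase self-contained, so no test fact refers to an entity that appears only in the training folds.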
Results on synthetic data

| University+ | MBN | LSM | LHL |
|---|---|---|---|
| Time (seconds) | 12 | 1 | 2941 |
| Accuracy | 0.86 | 0.44 | 0.47 |
| CLL | −0.89 | −2.21 | −4.68 |

Results on Mondial

| Mondial | MBN | LSM | LHL |
|---|---|---|---|
| Time (seconds) | 50 | 2 | 15323 |
| Accuracy | 0.43 | 0.26 | 0.26 |
| CLL | −1.39 | −1.43 | −3.69 |

Results
Dependencies discovered by the autocorrelation extension of the learn-and-join algorithm

| Database | Recursive dependency discovered |
|---|---|
| University | gpa(X)←ranking(X),grade(X,Y),registered(X,Y),Friend(X,Z),gpa(Z) |
| University | coffee(X)←coffee(Y),Friend(X,Y) |
| Mondial | religion(X)←continent(X),Border(X,Y),religion(Y) |
| Mondial | continent(X)←Border(X,Y),continent(Y),gdp(X),religion(Y) |

The predictive accuracy using MLN inference was much better in the moralized model (average accuracy improved by 25 % or more). This indicates that the discovered recursive dependencies are important for improving predictions.
Both MBN and LSM are fast. The speed of LSM is due to the fact that its rules are mostly just the unit clauses that model marginal probabilities (e.g., intelligence(S,1)).
Main functor constraint
Our last set of simulations examines the impact of the main functor constraint. A common way to learn recursive dependencies in multi-relational data mining is to duplicate the entity tables involved in a self-relationship as follows (Yin et al. 2004; Chen et al. 2009). For instance, for a self-relationship \(\mathit{Friend}(U_1,U_2)\) with two foreign key pointers to an entity table User, we introduce a second entity table \(\mathit{User}_{aux}\), which contains exactly the same information as the original User table. Then the Friend relation is rewritten as \(\mathit{Friend}(U_1,U_{aux})\), where the second copy of the User table is treated as a different entity table from the original one. Under the duplication approach, the Bayes net learning algorithm treats the variables \(U_1\) and \(U_{aux}\) as separate variables, which we expect would lead to learning valid but redundant edges.
 SLtime(s)

Structure learning time in seconds
 Numrules

Number of clauses in the Markov Logic Network excluding rules with weight 0.
 AvgLength

The average number of atoms per clause.
 AvgAbWt

The average absolute weight value.
Comparison to study the effects of removing the main functor constraint. Upper table: University+ dataset. Lower table: Mondial dataset.
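
| University+ | Constraint | Duplicate |
|---|---|---|
| SLtime (s) | 3.1 | 3.2 |
| # Rules | 289 | 350 |
| AvgLength | 4.26 | 4.11 |
| AvgAbWt | 2.08 | 1.86 |
| ACC | 0.86 | 0.86 |
| CLL | −0.89 | −0.89 |

| Mondial | Constraint | Duplicate |
|---|---|---|
| SLtime (s) | 8.9 | 13.1 |
| # Rules | 739 | 798 |
| AvgLength | 3.98 | 3.8 |
| AvgAbWt | 0.22 | 0.23 |
| ACC | 0.43 | 0.43 |
| CLL | −1.39 | −1.39 |
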
7 Conclusion and future work
An effective structure learning approach has been to upgrade propositional Bayes net learning for relational data. We presented a new method for applying Bayes net learning for recursive dependencies based on a recent pseudo-likelihood score and a new normal form theorem. The pseudo-likelihood score quantifies the fit of a recursive dependency model to relational data, and allows us to apply efficient model search algorithms. A new normal form eliminates potential redundancies that arise when predicates are duplicated to capture recursive relationships. In evaluations our structure learning method was very efficient and found recursive dependencies that were missed by structure learning methods for undirected models.
In our simulations, we considered recursive dependencies among attributes only. In future work, we aim to apply our results to learning recursive relationships among links (e.g., Friend(X,Y) and Friend(Y,Z) predict Friend(X,Z)). Our theoretical results (Proposition 1 and Theorem 1) apply to link dependencies as well. However, the current implementation of the learn-and-join algorithm is restricted to dependencies among attributes.
8 Proofs
Proof outline for Proposition 1
The result assumes that no functor node contains the same variable twice. This assumption does not involve a loss of modelling power because a functor node with a repeated variable can be rewritten using a new functor symbol (provided the functor node contains at least one variable). For instance, a functor node Friend(X,X) can be replaced by the unary functor symbol \(\mathit{Friend}_{self}(X)\).
(⇐) If B is strictly stratified, then so is the ground graph \(\overline{B}\), using the same level mapping. Since every parent node is ranked strictly lower than each of its children, levels strictly increase along every directed path, so there can be no cycle in \(\overline{B}\).
Proof of Theorem 1
This result assumes that functor nodes do not contain constants, which is true in typical statistical-relational models. Let B be a stratified Bayes net. Consider the first function symbol f at level 0. Enumerate its associated functor nodes as \(f(\tau_1),\ldots,f(\tau_k)\), such that for every i,j, if i<j, then \(f(\tau_i)\) is not a descendant of \(f(\tau_j)\) in B. This is possible since B is acyclic. For instance, if functor f is unary, we can order the associated functor nodes as \(f(X_1)<f(X_2)<\cdots\).
Finally, add all edges of the form \(g(\sigma_j)\to f(\tau_k)\) to B and eliminate all edges into \(f(\tau_j)\), for j<k, where \(g(\sigma_j)\) is obtained from a parent of \(f(\tau_j)\) by the change of variables that maps \(\tau_j\) to \(\tau_k\) (as illustrated in Sect. 4.1). The resulting graph \(B_0\) has the same ground graph as B. It is in main functor format with respect to f since \(f(\tau_k)\) is the only functor node with function symbol f that may have parents. To see that \(B_0\) is acyclic, note that by stratification f=g, so all new edges are from functor nodes \(f(\tau_j)\) to \(f(\tau_k)\). So a cycle in \(B_0\) implies that \(f(\tau_k)\) is an ancestor of \(f(\tau_j)\) in B, for j<k, which is a contradiction.
We now repeat the construction for levels 1, 2, etc. The resulting graphs \(B_1,B_2,\ldots\) are acyclic because when an edge \(g(\sigma_j)\to f(\tau_k)\) is added, either g is at a lower level than f, or g=f; therefore \(f(\tau_k)\) is not an ancestor of \(g(\sigma_j)\), and the new edge cannot close a cycle. After completing the construction for the highest stratum, we obtain a graph B′ in main functor form whose grounding is the same as that of B, for any database. □
Footnotes
1. In some statistical-relational models such as PRMs and LBNs, the ground graph is constructed somewhat differently, using the known relational context to add fewer edges (Friedman et al. 1999; Fierens 2009). In that case strict stratification remains sufficient for acyclicity but may no longer be necessary; see Sect. 2.
2. BLP notation uses | instead of ← for Bayesian clauses.
3. The gradient boosting algorithm of Khot et al. is even more recent, but is restricted to learning only non-recursive clauses (Khot et al. 2011).
Notes
Acknowledgements
Supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada. We are indebted to reviewers of the ILP conference and the Machine Learning journal for helpful comments.
References
Apt, K. R., & Bezem, M. (1991). Acyclic programs. New Generation Computing, 9, 335–364.
Chen, H., Liu, H., Han, J., & Yin, X. (2009). Exploring optimization of semantic relationship graph for multi-relational Bayesian classification. Decision Support Systems, 48(1), 112–121.
Chickering, D. (2003). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3, 507–554.
Domingos, P., & Richardson, M. (2007). Markov logic: A unifying framework for statistical relational learning. In L. Getoor & B. Taskar (Eds.), Introduction to statistical relational learning. Cambridge: MIT Press. Chapter 12.
Fierens, D. (2009). On the relationship between logical Bayesian networks and probabilistic logic programming based on the distribution semantics. In ILP (pp. 17–24).
Fierens, D., Ramon, J., Bruynooghe, M., & Blockeel, H. (2007). Learning directed probabilistic logical models: Ordering-search versus structure-search. In ECML (pp. 567–574).
Friedman, N., Getoor, L., Koller, D., & Pfeffer, A. (1999). Learning probabilistic relational models. In IJCAI (pp. 1300–1309). Berlin: Springer.
Getoor, L., & Grant, J. (2006). PRL: A probabilistic relational language. Machine Learning, 62, 7–31.
Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning. Cambridge: MIT Press.
Getoor, L. G., Friedman, N., & Taskar, B. (2001). Learning probabilistic models of relational structure. In ICML (pp. 170–177). San Mateo: Morgan Kaufmann.
Heckerman, D., Meek, C., & Koller, D. (2007). Probabilistic entity-relationship models, PRMs, and plate models. In L. Getoor & B. Taskar (Eds.), Introduction to statistical relational learning. Cambridge: MIT Press. Chapter 7.
Jensen, D., & Neville, J. (2002). Linkage and autocorrelation cause feature selection bias in relational learning. In ICML.
Kersting, K., & de Raedt, L. (2007). Bayesian logic programming: theory and tool. In L. Getoor & B. Taskar (Eds.), Introduction to statistical relational learning (pp. 291–318). Cambridge: MIT Press. Chapter 10.
Khosravi, H., Schulte, O., Man, T., Xu, X., & Bina, B. (2010). Structure learning for Markov logic networks with many descriptive attributes. In AAAI (pp. 487–493).
Khosravi, H., Man, T., Hu, J., Gao, E., & Schulte, O. (2012). (Learn and join algorithm code.) URL = http://www.cs.sfu.ca/~oschulte/jbn/.
Khot, T., Natarajan, S., Kersting, K., & Shavlik, J. W. (2011). Learning Markov logic networks via functional gradient boosting. In ICDM (pp. 320–329).
Klug, A. C. (1982). Equivalence of relational algebra and relational calculus query languages having aggregate functions. Journal of the Association for Computing Machinery, 29, 699–717.
Kok, S., & Domingos, P. (2009). Learning Markov logic network structure via hypergraph lifting. In ICML (pp. 64–71).
Kok, S., & Domingos, P. (2010). Learning Markov logic networks using structural motifs. In ICML (pp. 551–558).
Kok, S., Sumner, M., Richardson, M., Singla, P., Poon, H., Lowd, D., Wang, J., & Domingos, P. (2009). The Alchemy system for statistical relational AI. Technical report, University of Washington. Version 30.
Koller, D., & Pfeffer, A. (1997). Learning probabilities for noisy first-order rules. In IJCAI (pp. 1316–1323).
Lifschitz, V. (1996). Foundations of logic programming. Principles of knowledge representation. Stanford: CSLI.
Lowd, D., & Domingos, P. (2007). Efficient weight learning for Markov logic networks. In PKDD (pp. 200–211).
May, W. (1999). Information extraction and integration: the Mondial case study. Technical report, Universität Freiburg, Institut für Informatik.
Mihalkova, L., & Mooney, R. J. (2007). Bottom-up learning of Markov logic network structure. In ICML (pp. 625–632). New York: ACM.
Neville, J., & Jensen, D. (2007). Relational dependency networks. In L. Getoor & B. Taskar (Eds.), Introduction to statistical relational learning. Cambridge: MIT Press. Chapter 8.
Ngo, L., & Haddawy, P. (1997). Answering queries from context-sensitive probabilistic knowledge bases. Theoretical Computer Science, 171, 147–177.
Poole, D. (2003). First-order probabilistic inference. In IJCAI (pp. 985–991).
Poon, H., & Domingos, P. (2006). Sound and efficient inference with probabilistic and deterministic dependencies. In AAAI.
Ramon, J., Croonenborghs, T., Fierens, D., Blockeel, H., & Bruynooghe, M. (2008). Generalized ordering-search for learning directed probabilistic logical models. Machine Learning, 70, 169–188.
Schulte, O. (2011). A tractable pseudo-likelihood function for Bayes nets applied to relational data. In SIAM SDM (pp. 462–473).
She, R., Wang, K., & Xu, Y. (2005). Pushing feature selection ahead of join. In SIAM SDM.
Taskar, B., Abbeel, P., & Koller, D. (2002). Discriminative probabilistic models for relational data. In UAI (pp. 485–492).
The Tetrad Group. (2008). The Tetrad project. http://www.phil.cmu.edu/projects/tetrad/.
Ullman, J. D. (1982). Principles of database systems (Vol. 2). New York: Computer Science Press.
Wellman, M., Breese, J., & Goldman, R. (1992). From knowledge bases to decision models. Knowledge Engineering Review, 7, 35–53.
Yin, X., Han, J., Yang, J., & Yu, P. S. (2004). CrossMine: efficient classification across multiple database relations. In Constraint-based mining and inductive databases (pp. 172–195).