A method for reduction of examples in relational learning

Kuželka, Ondřej; Szabóová, Andrea; Železný, Filip

doi:10.1007/s10844-013-0294-z

Published: 17 December 2013

Volume 42, pages 255–281, (2014)
Journal of Intelligent Information Systems

Ondřej Kuželka¹,
Andrea Szabóová¹ &
Filip Železný¹

Abstract

Feature selection methods often improve the performance of attribute-value learning. We explore whether also in relational learning, examples in the form of clauses can be reduced in size to speed up learning without affecting the learned hypothesis. To this end, we introduce the notion of safe reduction: a safely reduced example cannot be distinguished from the original example under the given hypothesis language bias. Next, we consider the particular, rather permissive bias of bounded treewidth clauses. We show that under this hypothesis bias, examples of arbitrary treewidth can be reduced efficiently. We evaluate our approach on four data sets with the popular system Aleph and the state-of-the-art relational learner nFOIL. On all four data sets we make learning faster in the case of nFOIL, achieving an order-of-magnitude speed up on one of the data sets, and more accurate in the case of Aleph.

Notes

1. In this paper we follow the conventions of Atserias et al. (2007). In other works, what we call k-consistency is known as strong k+1-consistency (Rossi et al. 2006).
2. This is not always the case. For example, we have a decision procedure for x-subsumption w.r.t. the set of all clauses, namely the ordinary 𝜃-subsumption, which is decidable but NP-hard.
3. Note again the terminology used in this paper following Atserias et al. (2007). In the CSP literature, it is common to call 2-consistency what we call 1-consistency.
4. We are applying 𝜃 also to e because e need not be ground.
5. A first-order interpretation consists of several components; one of them is the domain of discourse, and another is a function ϕ which maps constants to elements of the domain of discourse. We assume w.l.o.g. that there is a constant c_d for every element d of the domain of discourse.

References

Appice, A., Ceci, M., Rawles, S., Flach, P.A. (2004). Redundant feature elimination for multi-class problems. In ICML (vol. 69).
Atserias, A., Bulatov, A., Dalmau, V. (2007). On the power of k-consistency. In Proceedings of ICALP-2007 (pp. 266–271).
Beeri, C., Fagin, R., Maier, D., Yannakakis, M. (1983). On the desirability of acyclic database schemes. Journal of the ACM, 30(3), 479–513.
Bodlaender, H.L., & Mohring, R.H. (1993). The pathwidth and treewidth of cographs. SIAM Journal on Discrete Mathematics, 6, 238–255.
Courcelle, B. (1990). The monadic second-order logic of graphs. I. Recognizable sets of finite graphs. Information and Computation, 85(1), 12–75.
De Raedt, L. (1997). Logical settings for concept-learning. Artificial Intelligence, 95(1), 187–201.
De Raedt, L. (2008). Logical and relational learning. New York: Springer.
Dechter, R. (2003). Constraint processing. San Francisco: Morgan Kaufmann.
Erickson, J. (2009). CS 598: Computational topology, course notes, University of Illinois at Urbana-Champaign. http://compgeom.cs.uiuc.edu/~jeffe/teaching/comptop/.
Fagin, R. (1983). Degrees of acyclicity for hypergraphs and relational database schemes. Journal of the ACM, 30(3), 514–550.
Feder, T., & Vardi, M.Y. (1998). The computational structure of monotone monadic SNP and constraint satisfaction: a study through Datalog and group theory. SIAM Journal on Computing, 28(1), 57–104.
Freuder, E.C. (1990). Complexity of k-tree structured constraint satisfaction problems. In Proceedings of the eighth national conference on artificial intelligence (vol. 1, pp. 4–9). AAAI’90: AAAI Press.
Hastie, T., Tibshirani, R., Friedman, J. (2001). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.
Helma, C., King, R.D., Kramer, S., Srinivasan, A. (2001). The predictive toxicology challenge 2000–2001. Bioinformatics, 17(1), 107–108.
Krogel, M.A., Rawles, S., Železný, F., Flach, P., Lavrač, N., Wrobel, S. (2003). Comparative evaluation of approaches to propositionalization. In ILP. Springer.
Kuželka, O., & Železný, F. (2009). Block-wise construction of acyclic relational features with monotone irreducibility and relevancy properties. In ICML 2009: the 26th International Conference on Machine Learning.
Kuželka, O., Železný, F. (2011a). Block-wise construction of tree-like relational features with monotone reducibility and redundancy. Machine Learning, 83, 163–192.
Kuželka, O., Železný, F. (2011b). Seeing the world through homomorphism: An experimental study on reducibility of examples. In ILP’10: Inductive logic programming (pp. 138–145).
Kuželka, O., Szabóová, A., Železný, F. (2013a). Bounded least general generalization. In ILP’12: inductive logic programming.
Kuželka, O., Szabóová, A., Železný, F. (2013b). Reducing examples in relational learning with bounded-treewidth hypotheses. In New frontiers in mining complex patterns (pp. 17–32).
Landwehr, N., Kersting, K., De Raedt, L. (2007). Integrating naïve Bayes and FOIL. Journal of Machine Learning Research, 8, 481–507.
Lavrač, N., Gamberger, D., Jovanoski, V. (1999). A study of relevance for learning in deductive databases. Journal of Logic Programming, 40(2/3), 215–249.
Liu, H., Motoda, H., Setiono, R., Zhao, Z. (2010). Feature selection: an ever evolving frontier in data mining. Journal of Machine Learning Research - Proceedings Track, 10, 4–13.
Mackworth, A. (1977). Consistency in networks of relations. Artificial Intelligence, 8(1), 99–118.
Maloberti, J., & Sebag, M. (2004). Fast theta-subsumption with constraint satisfaction algorithms. Machine Learning, 55(2), 137–174.
Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, Special Issue on Inductive Logic Programming, 13(3–4), 245–286.
Nassif, H., Al-Ali, H., Khuri, S., Keirouz, W., Page, D. (2009). An inductive logic programming approach to validate hexose biochemical knowledge. In Proceedings of the 19th international conference on ILP (pp. 149–165). Leuven.
Nienhuys-Cheng, S.H., de Wolf, R. (Eds.) (1997). Foundations of inductive logic programming. Lecture Notes in Computer Science (vol. 1228). Springer.
Plotkin, G. (1970). A note on inductive generalization. Edinburgh: Edinburgh University Press.
Rossi, F., van Beek, P., Walsh, T. (Eds.) (2006). Handbook of constraint programming. New York: Elsevier.
Žaková, M., Železný, F., Garcia-Sedano, J., Tissot, C.M., Lavrač, N., Křemen, P., Molina, J. (2007). Relational data mining applied to virtual engineering of product designs. In ILP06, LNAI (vol. 4455, pp. 439–453). Springer.

Acknowledgments

This work was supported by the Czech Grant Agency through project 103/11/2170 Transferring ILP techniques to SRL. The authors would like to thank the anonymous reviewers of NFMCP’12 and of the JIIS special issue for helpful remarks.

Author information

Authors and Affiliations

Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
Ondřej Kuželka, Andrea Szabóová & Filip Železný

Corresponding author

Correspondence to Ondřej Kuželka.

Appendix A: Propositions and Proofs

Proposition 9

Let X be a set of clauses. If ≼_X is x-subsumption w.r.t. X and ◃_X is an x-presubsumption w.r.t. X then (A ◃_X B) ⇒ (A ≼_X B) for any two clauses A, B (not necessarily from X).

Proof

We need to show that if A ◃_X B then (C ≼_𝜃 A) ⇒ (C ≼_𝜃 B) for all clauses C ∈ X. First, if A ◃_X B and C ⋠_𝜃 A then the implication holds trivially. Second, C ≼_𝜃 A means that there is a substitution 𝜗 such that C𝜗 ⊆ A. Together with A ◃_X B, this implies C𝜗 ◃_X B by condition 1 from the definition of x-presubsumption. Now, we can use the second condition, which gives us C ≼_𝜃 B (note that C ∈ X and C𝜗 ◃_X B). □
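The condition used in this proof, (C ≼_𝜃 A) ⇒ (C ≼_𝜃 B) for all C ∈ X, can be checked directly when X is a small finite set. The following is a minimal sketch under assumptions that are not part of the paper: literals are encoded as tuples (predicate, arguments...), variables are strings starting with an uppercase letter, and `theta_subsumes` is a naive exponential backtracking search. It illustrates that x-subsumption can hold where 𝜃-subsumption fails.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def match(lit, target, theta):
    """Extend theta so that lit maps onto target; return None if impossible."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    theta = dict(theta)
    for s, t in zip(lit[1:], target[1:]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def theta_subsumes(a, b, theta=None):
    """True if some substitution maps every literal of a onto a literal of b."""
    a = list(a)
    if not a:
        return True
    for target in b:
        extended = match(a[0], target, theta or {})
        if extended is not None and theta_subsumes(a[1:], b, extended):
            return True
    return False

def x_subsumes(a, b, xs):
    """x-subsumption w.r.t. the finite set xs: every C in xs that
    theta-subsumes a must also theta-subsume b."""
    return all(theta_subsumes(c, b) for c in xs if theta_subsumes(c, a))

# A directed 2-cycle, a directed 3-cycle, and a set X of short directed paths.
two_cycle = frozenset({("e", "X", "Y"), ("e", "Y", "X")})
three_cycle = frozenset({("e", "P", "Q"), ("e", "Q", "R"), ("e", "R", "P")})
paths = [frozenset({("e", "U", "V")}),
         frozenset({("e", "U", "V"), ("e", "V", "W")})]

print(theta_subsumes(two_cycle, three_cycle))     # False
print(x_subsumes(two_cycle, three_cycle, paths))  # True
```

With only short paths in X, the two cycles cannot be told apart, although the 2-cycle does not 𝜃-subsume the 3-cycle.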

Proposition 10

Let k ∈ ℕ and let ◃_k be a relation on clauses defined as follows: A ◃_k B if and only if the k-consistency algorithm run on the CSP-encoding of the 𝜃-subsumption problem A ≼_𝜃 B returns true. The relation ◃_k is an x-presubsumption w.r.t. the set X_k of all clauses with treewidth at most k.

Proof

We need to verify that ◃_k satisfies the conditions stated in Definition 9.

1.
If A ◃_k B and C ⊆ A then C ◃_k B. This holds because if the k-consistency algorithm returns true for a problem then it must also return true for any of its subproblems (recall the discussion in Section 2.4). It is easy to check that if C ⊆ A are clauses then the CSP problem encoding the 𝜃-subsumption problem C ≼_𝜃 B is a subproblem of the CSP encoding of the 𝜃-subsumption problem A ≼_𝜃 B. Therefore this condition holds.
2.
If A ∈ X_k, 𝜗 is a substitution and A𝜗 ◃_k B then A ≼_𝜃 B. If A𝜗 ◃_k B then the k-consistency algorithm applied to the CSP encoding 𝒫_𝜗 = (𝒱_𝜗, 𝒟_𝜗, 𝒞_𝜗) of the problem A𝜗 ≼_𝜃 B finishes with a non-empty set of partial solutions H_𝜗. When applied to the CSP encoding 𝒫 = (𝒱, 𝒟, 𝒞) of the problem A ≼_𝜃 B, the k-consistency algorithm finishes with a set of partial solutions H. What we need to show is that this set H is non-empty. We will do this by showing that H has a non-empty subset H^∗ which can be constructed as follows. For each partial solution φ ∈ H_𝜗, we take all sets of variables V ⊆ 𝒱 such that |V| ≤ k + 1 and V𝜗 ⊆ Supp(φ) and, for each such set V, we add to H^∗ a new partial solution φ^∗ such that Supp(φ^∗) = V and φ^∗(v) = φ(v𝜗) for all v ∈ Supp(φ^∗). Clearly, any such φ^∗ is a valid partial solution of 𝒫 because otherwise φ could not have been a valid partial solution of 𝒫_𝜗. Moreover, for every φ^∗ ∈ H^∗ with |Supp(φ^∗)| ≤ k and every variable v ∈ 𝒱, there exists a partial solution ψ^∗ ∈ H^∗ such that φ^∗ ⊆ ψ^∗ and v ∈ Supp(ψ^∗). This can be shown by contradiction as follows. Let us suppose that there is a partial solution φ^∗ ∈ H^∗ such that |Supp(φ^∗)| ≤ k and a variable v ∈ 𝒱 such that there is no ψ^∗ ∈ H^∗ which would satisfy φ^∗ ⊆ ψ^∗ and v ∈ Supp(ψ^∗). But then the respective solution φ ∈ H_𝜗 from which φ^∗ was constructed should have been removed by the k-consistency algorithm, because there could not have been any partial solution ψ ∈ H_𝜗 such that φ ⊆ ψ and v𝜗 ∈ Supp(ψ), which is a contradiction. For every partial solution φ^∗ ∈ H^∗, the set H^∗ must also contain all partial 'sub-solutions' (i.e. solutions φ′^∗ such that φ′^∗ ⊆ φ^∗). As a consequence of the above, the k-consistency algorithm running on the CSP encoding of the problem A ≼_𝜃 B cannot remove from H any partial solution contained in the set H^∗ and, thus, H^∗ ⊆ H and the k-consistency algorithm must return the value true. Since A ∈ X_k, i.e. A has treewidth at most k, and k-consistency is a complete test for CSP instances of treewidth at most k, it must also hold A ≼_𝜃 B.
3.
If A ≼_𝜃 B then A ◃_k B. This is a property of k-consistency.

□
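To make the relation ◃_k concrete, here is a minimal sketch of a k-consistency test on the CSP encoding of A ≼_𝜃 B, following the description used in the proof above: partial solutions have supports of size at most k + 1, those with support of size at most k must be extendable to every variable, and the set is closed under sub-solutions. The clause encoding (tuples, uppercase variables) and the brute-force enumeration are assumptions made for illustration; this is not the authors' implementation and it is not efficient.

```python
from itertools import combinations, product

def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def k_consistency(a, b, k):
    """Return True if the k-consistency test for 'a theta-subsumes b' succeeds."""
    b = frozenset(b)
    variables = sorted({t for lit in a for t in lit[1:] if is_var(t)})
    domain = sorted({t for lit in b for t in lit[1:]})

    def valid(phi):
        # Every literal of a whose variables are all assigned must land in b.
        for lit in a:
            if all(not is_var(t) or t in phi for t in lit[1:]):
                image = (lit[0],) + tuple(phi.get(t, t) for t in lit[1:])
                if image not in b:
                    return False
        return True

    # All valid partial solutions with support of size at most k + 1.
    solutions = set()
    for size in range(k + 2):
        for support in combinations(variables, size):
            for values in product(domain, repeat=size):
                phi = dict(zip(support, values))
                if valid(phi):
                    solutions.add(frozenset(phi.items()))

    changed = True
    while changed:
        changed = False
        for phi in list(solutions):
            # Extension property for supports of size at most k.
            stuck = len(phi) <= k and any(
                not any(phi <= psi and v in dict(psi) for psi in solutions)
                for v in variables)
            # Closure under sub-solutions.
            orphan = any(phi - {item} not in solutions for item in phi)
            if stuck or orphan:
                solutions.discard(phi)
                changed = True
    return bool(solutions)

triangle = [("e", "X", "Y"), ("e", "Y", "Z"), ("e", "Z", "X")]  # treewidth 2
path = [("e", "X", "Y"), ("e", "Y", "Z")]                       # treewidth 1
two_cycle = [("e", "a", "b"), ("e", "b", "a")]

print(k_consistency(path, two_cycle, 1))      # True
print(k_consistency(triangle, two_cycle, 1))  # True
print(k_consistency(triangle, two_cycle, 2))  # False
```

The triangle does not 𝜃-subsume the 2-cycle, yet the test with k = 1 succeeds; this is consistent with the proposition, since the triangle has treewidth 2 and is therefore outside X_1. With k = 2 the test is exact for it.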

Proposition 11

Let us have a set X and a polynomial-time decision procedure for checking ◃_X which is an x-presubsumption w.r.t. the set X. Then, given a clause A on input, the literal-elimination algorithm finishes in polynomial time and outputs a clause Â satisfying the following conditions:

1.
Â ≼_𝜃 A and A ≼_X Â where ≼_X is an x-subsumption w.r.t. the set X.
2.
|Â| ≤ |Â_𝜃| where Â_𝜃 is a 𝜃-reduction of a subset of A's literals with maximum length.

Proof

We start by proving Â ≼_𝜃 A and A ≼_X Â. This can be shown as follows. First, A ≼_X A′ holds in any step of the algorithm, which follows from (A′ ◃_X A′∖{L}) ⇒ (A′ ≼_X A′∖{L}) – recall that A′ is replaced by A′∖{L} in the literal-elimination algorithm if and only if A′ ◃_X A′∖{L} – and from transitivity of x-subsumption. Consequently we also have A ≼_X Â because Â = A′ in the last step of the algorithm. Second, Â ≼_𝜃 A because Â ⊆ A. Now, we prove the second part of the proposition. What remains to be shown is that the resulting clause Â will not be bigger than Â_𝜃. Since Â ⊆ A, it suffices to show that Â cannot be 𝜃-reducible. Let us assume, for contradiction, that it is 𝜃-reducible. If Â was 𝜃-reducible, there would have to be a literal L ∈ Â such that Â ≼_𝜃 Â∖{L}. The relation ◃_X satisfies (A ≼_𝜃 B) ⇒ (A ◃_X B), therefore it would also have to hold Â ◃_X Â∖{L}. However, then L should have been removed by the literal-elimination algorithm, which contradicts Â being its output. The fact that the literal-elimination algorithm finishes in polynomial time follows from the fact that, for a given clause A, it calls the polynomial-time procedure for checking the relation ◃_X at most |A|² times (the other operations of the literal-elimination algorithm can be performed in polynomial time as well). □
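The loop analysed in this proof can be sketched as follows. The algorithm itself is defined in the main text, which is not reproduced here, so the iteration order and the function names are assumptions; only the replacement rule (drop L whenever A′ ◃_X A′∖{L}) is taken from the proof. Plain 𝜃-subsumption, implemented by naive backtracking, stands in for the polynomial-time check ◃_X; it is an x-presubsumption w.r.t. the set of all clauses, but it is not polynomial.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def match(lit, target, theta):
    """Extend theta so that lit maps onto target; return None if impossible."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    theta = dict(theta)
    for s, t in zip(lit[1:], target[1:]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def theta_subsumes(a, b, theta=None):
    """Stand-in presubsumption check: naive theta-subsumption."""
    a = list(a)
    if not a:
        return True
    for target in b:
        extended = match(a[0], target, theta or {})
        if extended is not None and theta_subsumes(a[1:], b, extended):
            return True
    return False

def literal_elimination(clause, presubsumes):
    """Repeatedly drop a literal L while presubsumes(A', A' - {L}) holds."""
    current = frozenset(clause)
    while True:
        candidates = (current - {lit} for lit in sorted(current))
        reduced = next((c for c in candidates if presubsumes(current, c)), None)
        if reduced is None:
            return current
        current = reduced

redundant = {("e", "X", "Y"), ("e", "X", "Z")}
two_cycle = {("e", "X", "Y"), ("e", "Y", "X")}

print(len(literal_elimination(redundant, theta_subsumes)))  # 1
print(len(literal_elimination(two_cycle, theta_subsumes)))  # 2
```

Each pass performs at most one check per remaining literal and each successful check removes a literal, so the number of checks grows at most quadratically with |A|, in line with the bound stated in the proof.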

We will need the following simple lemma in the proof of Proposition 6.

Lemma 1

Let A be a clause. If A is 𝜃-reducible then A𝜗 ≈_𝜃 A where 𝜗 = {V/t} for some V ∈ vars(A) and t ∈ terms(A), V ≠ t and |vars(A𝜗)| < |vars(A)|.

Proof

Let A be 𝜃-reducible and let Â_𝜃 ⊆ A be a 𝜃-reduction of A. There must be a substitution 𝜃^∗ such that A𝜃^∗ = Â_𝜃 and |A𝜃^∗| < |A| and therefore also |vars(A𝜃^∗)| < |vars(A)| (because if |vars(A𝜃^∗)| = |vars(A)| then A and A𝜃^∗ would be isomorphic and they would have to contain the same number of literals). Let V^∗ be a variable contained in vars(A) but not contained in vars(A𝜃^∗) (there must be some such variable because |vars(A𝜃^∗)| < |vars(A)|) and let 𝜗 = {V^∗/t} ⊆ 𝜃^∗. There are two cases depending on whether t is a variable or a constant. Let us start with the case when t is a variable, which we denote as W = t for clarity. Now, there must be a variable S ≠ V^∗ such that {S/W} ⊆ 𝜃^∗, which can be shown as follows. If there is no such variable (i.e. if the only variable which is mapped to W is V^∗) then we can construct a new substitution 𝜃^{∗∗} = 𝜃^∗𝜃^∗ which has the following two 'interesting' properties ('interesting' because they will allow us to arrive at a contradiction): (i) A𝜃^{∗∗} ≈_𝜃 A and (ii) |vars(A𝜃^{∗∗})| < |vars(A𝜃^∗)|. The first property holds because A ≼_𝜃 A𝜃^∗ ⊆ Â_𝜃 ⊆ A implies A ≼_𝜃 A𝜃^∗𝜃^∗ ⊆ Â_𝜃. The second property can be shown as follows. Clearly, none of the variables not in vars(A𝜃^∗) can reappear in vars(A𝜃^{∗∗}). Moreover, the variable W cannot be contained in vars(A𝜃^{∗∗}) because V^∗𝜃^{∗∗} ≠ W and there was no other variable mapped to V^∗ or W by 𝜃^∗. Therefore |vars(A𝜃^{∗∗})| < |vars(A𝜃^∗)|, i.e. the second property must be true as well. However, then we have a contradiction because we assumed that Â_𝜃 was a 𝜃-reduction, but Â_𝜃 cannot be a 𝜃-reduction because A ≈_𝜃 A𝜃^{∗∗} and |A𝜃^{∗∗}| < |A𝜃^∗| (which follows from |vars(A𝜃^{∗∗})| < |vars(A𝜃^∗)| = |vars(Â_𝜃)| and A𝜃^{∗∗} ⊆ Â_𝜃). Thus, there must be a variable S ≠ V^∗ such that {S/W} ⊆ 𝜃^∗. It follows that if we set 𝜗 = {V^∗/S} it must hold A𝜃^∗ = A𝜗𝜃^∗. Therefore A𝜗 ≼_𝜃 A𝜃^∗ ≈_𝜃 A.
Now, if t is not a variable but a constant, we can simply set 𝜗 = {V^∗/t} and it must hold A𝜃^∗ = A𝜗𝜃^∗ and therefore also A𝜗 ≼_𝜃 A𝜃^∗ ≈_𝜃 A. Since also trivially A ≼_𝜃 A𝜗, we have A𝜗 ≈_𝜃 A and |vars(A𝜗)| < |vars(A)|, which finishes the proof of this lemma. □

Proposition 12

Let us have a set X and a polynomial-time decision procedure for checking ◃_X which is an x-presubsumption w.r.t. the set X. Then, given a clause A on input, the literal substitution algorithm finishes in polynomial time and outputs a clause Â satisfying the following conditions:

1.
Â ≼_X A and A ≼_𝜃 Â where ≼_X is the x-subsumption w.r.t. the set X.
2.
|terms(Â)| ≤ |terms(Â_𝜃)| where Â_𝜃 is a 𝜃-reduction of a clause A𝜗 with maximum cardinality of the set terms(Â_𝜃), where 𝜗 is some substitution mapping variables to elements of the set terms(A).

Proof

We start by proving Â ≼_X A and A ≼_𝜃 Â. This can be shown as follows. First, A′ ≼_X A holds in any step of the algorithm, which follows from (A′𝜗 ◃_X A′) ⇒ (A′𝜗 ≼_X A′) – recall that A′ is replaced by A′𝜗 in the literal-substitution algorithm if and only if A′𝜗 ◃_X A′ – and from transitivity of x-subsumption. Consequently we also have Â ≼_X A because Â = A′ in the last step of the algorithm. Second, A ≼_𝜃 Â because Â = A𝜃′ for some substitution 𝜃′ (𝜃′ is the substitution composed of the substitutions 𝜗 applied to A′ in the literal-substitution algorithm). Now, we prove the second part of the proposition. What remains to be shown is that the resulting clause Â will not have more terms than Â_𝜃. Since Â = A𝜃′, it suffices to show that Â cannot be 𝜃-reducible. Let us assume, for contradiction, that it is 𝜃-reducible. If Â was 𝜃-reducible, then by Lemma 1 there would have to be a substitution 𝜗 = {V/t} with V ∈ vars(Â), t ∈ terms(Â) and V ≠ t such that Â𝜗 ≈_𝜃 Â. The relation ◃_X satisfies (A ≼_𝜃 B) ⇒ (A ◃_X B), therefore it would also have to hold Â𝜗 ◃_X Â. However, then 𝜗 should have been applied by the literal-substitution algorithm, which contradicts Â being its output. The fact that the literal-substitution algorithm finishes in polynomial time follows from the fact that, for a given clause A, it calls the polynomial-time procedure for checking the relation ◃_X at most |A|³ times (the other operations of the literal-substitution algorithm can be performed in polynomial time as well). □
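A corresponding sketch of the substitution loop is given below. As before, the algorithm is specified in the main text and only the replacement rule (apply 𝜗 = {V/t} whenever A′𝜗 ◃_X A′) is taken from the proof; the order in which candidate pairs are tried, the tuple encoding of literals, and the use of naive 𝜃-subsumption in place of the polynomial-time check are assumptions.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def match(lit, target, theta):
    """Extend theta so that lit maps onto target; return None if impossible."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    theta = dict(theta)
    for s, t in zip(lit[1:], target[1:]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def theta_subsumes(a, b, theta=None):
    """Stand-in presubsumption check: naive theta-subsumption."""
    a = list(a)
    if not a:
        return True
    for target in b:
        extended = match(a[0], target, theta or {})
        if extended is not None and theta_subsumes(a[1:], b, extended):
            return True
    return False

def substitute(clause, var, term):
    """Apply the single-binding substitution {var/term} to a clause."""
    return frozenset((lit[0],) + tuple(term if t == var else t for t in lit[1:])
                     for lit in clause)

def literal_substitution(clause, presubsumes):
    """Repeatedly apply {V/t} while presubsumes(A'{V/t}, A') holds."""
    current = frozenset(clause)
    while True:
        terms = sorted({t for lit in current for t in lit[1:]})
        candidates = (substitute(current, v, t)
                      for v in terms if is_var(v)
                      for t in terms if t != v)
        reduced = next((c for c in candidates if presubsumes(c, current)), None)
        if reduced is None:
            return current
        current = reduced  # one variable fewer, so the loop terminates

redundant = {("e", "X", "Y"), ("e", "X", "Z")}
result = literal_substitution(redundant, theta_subsumes)
print(len(result))  # 1
```

Every accepted substitution eliminates one variable from the clause, which is what bounds the number of iterations.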

Lemma 2

(Plotkin 1970) Let A and B be clauses. If A ≼_𝜃 B then A ⊧ B.

Proposition 13

Let ℒ be a hypothesis language and let e be a clause. Let e͂ be a clause obtained from e by variabilizing the constants which are not contained in the hypothesis language. Then (ℋ ⊧ e) ⇔ (ℋ ⊧ e͂) for any ℋ ∈ ℒ. If ê is a 𝜃-reduction of e͂ and |ê| < |e| then ê is also a safe reduction of e.

Proof

(i) We start by showing the validity of the implication (ℋ ⊧ e) ⇒ (ℋ ⊧ e͂). For contradiction, let us assume that ℋ ⊧ e and that there is a model M of the clausal theory ℋ such that M ⊧ e and M ⊮ e͂. Then there must be a substitution 𝜃 grounding all variables in e͂ such that M ⊧ e𝜃 and M ⊮ e͂𝜃 (see Note 4). Now, we will construct another model M′ of ℋ in which e will not be satisfied. We take each constant c in e that has been replaced by a variable V in e͂ and update the assignment ϕ of the constants c to objects from the domain of discourse in the model (see Note 5) so that ϕ(c) = ϕ(V𝜃). Clearly, we can do this for every constant c since every constant in e has been replaced by exactly one variable. Now, we see that M′ ⊮ e. However, we are not done yet, as it might happen that the new model with the modified ϕ would no longer be a model of ℋ. This is clearly not the case, since none of the constants c appears in ℋ and therefore the change of ϕ has no effect whatsoever on whether or not ℋ is true in M′. So, we have arrived at a contradiction: we have a model M′ such that M′ ⊧ ℋ and M′ ⊮ e, which contradicts the assumption ℋ ⊧ e. The implication (ℋ ⊧ e) ⇐ (ℋ ⊧ e͂) follows directly from Lemma 2: we have e͂ ≼_𝜃 e, therefore also e͂ ⊧ e and finally ℋ ⊧ e. (ii) In order to show (ℋ ⊧ e) ⇒ (ℋ ⊧ ê), it suffices to notice that ℋ ⊧ e͂ (by part (i)) and e͂ ≼_𝜃 ê imply ℋ ⊧ ê. The implication (ℋ ⊧ e) ⇐ (ℋ ⊧ ê) may be shown similarly: ℋ ⊧ ê and ê ≼_𝜃 e imply ℋ ⊧ e. □
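The variabilization step used in this proposition can be sketched as below. The tuple encoding of literals, the uppercase convention for variables, the predicate names in the example and the scheme for generating fresh variable names are all assumptions made for illustration; the only property taken from the text is that every constant outside the hypothesis language is replaced by exactly one new variable.

```python
from itertools import count

def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def variabilize(clause, language_constants):
    """Replace each constant outside the hypothesis language by one fresh variable.

    Returns the new clause and the constant -> variable mapping.
    """
    used = {t for lit in clause for t in lit[1:]}
    fresh = (f"V{i}" for i in count() if f"V{i}" not in used)
    mapping = {}

    def rename(term):
        if is_var(term) or term in language_constants:
            return term
        if term not in mapping:
            mapping[term] = next(fresh)
        return mapping[term]

    new_clause = frozenset((lit[0],) + tuple(rename(t) for t in lit[1:])
                           for lit in clause)
    return new_clause, mapping

# Hypothetical example: atom identifiers a1, a2 are not in the language, 'neg' is.
example = frozenset({("bond", "a1", "a2"), ("carbon", "a1"), ("charge", "a2", "neg")})
generalized, mapping = variabilize(example, {"neg"})

# Substituting the constants back recovers e, which witnesses that the
# variabilized clause theta-subsumes the original example.
back = {v: c for c, v in mapping.items()}
restored = frozenset((lit[0],) + tuple(back.get(t, t) for t in lit[1:])
                     for lit in generalized)
print(restored == example)  # True
```

The variabilized clause can then be passed to a 𝜃-reduction or to one of the reduction procedures above.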

The next lemmas are used to prove Proposition 8. In Lemmas 3 and 4, we formulate rather general conditions under which x-equivalence w.r.t. a set X implies safe equivalence w.r.t. the set 2^X. Then, in the subsequent lemmas, we show that these conditions are satisfied by the set of bounded-treewidth Horn clauses.

Lemma 3

Let X be a set of clauses and let ℒ ⊆ 2^X be a hypothesis language. Let A and B be clauses. Let A ≈_X B w.r.t. the set X and let the following be true for any ℋ ∈ ℒ and any clause C: if ℋ ⊧ C and C is not a tautology then there is a clause D ∈ X such that ℋ ⊧ D and D ≼_𝜃 C. Then for any ℋ ∈ ℒ, it holds (ℋ ⊧ A) ⇔ (ℋ ⊧ B).

Proof

First, we need to consider the case when A and B are both tautologies. If both A and B are tautologies then (ℋ ⊧ A) ⇔ (ℋ ⊧ B) naturally holds for any ℋ. Now, we can consider the case when at most one of the clauses A and B is a tautology. Let us assume w.l.o.g. that if one of the clauses is a tautology then it is the clause B. If ℋ ⊧ A then there is a clause D ∈ X such that ℋ ⊧ D and D ≼_𝜃 A (by the assumptions of the lemma). Since D ∈ X, D ≼_𝜃 A and A ≼_X B, we have D ≼_𝜃 B (from the definition of x-subsumption) and finally also ℋ ⊧ D ⊧ B and so ℋ ⊧ B. The other implication can be shown analogously if A is not a tautology. □

Lemma 4

Let X be a set of Horn clauses such that any clause which can be derived from a clausal theory ℋ ∈ 2^X using SLD resolution is contained in X. If e and ê are two Horn clauses such that e ≈_X ê then for any ℋ ∈ 2^X: (ℋ ⊧ e) ⇔ (ℋ ⊧ ê).

Proof

We will use the subsumption theorem for Horn clauses and Lemma 3. We will show that the conditions of this lemma imply the conditions of Lemma 3. If A is a non-tautological clause and ℋ ⊧ A then by the subsumption theorem there must be a clause C derivable from ℋ using resolution (SLD resolution, respectively) such that C ≼_𝜃 A. Therefore for any non-tautological clause A, if ℋ ⊧ A where ℋ ∈ 2^X then there must be a clause C ∈ X such that ℋ ⊧ C (because resolution is sound) and C ≼_𝜃 A. Now, since e ≈_X ê, we may finish the proof using Lemma 3 which gives us (ℋ ⊧ e) ⇔ (ℋ ⊧ ê) for any ℋ ∈ 2^X. □

Lemma 5

(Clique containment lemma, Bodlaender and Mohring 1993) Let A be a clause and T_A be its tree decomposition. For any l ∈ A, there is a vertex in T_A labelled by a set of variables V such that vars(l) ⊆ V.

Proof

The proof can be found in Bodlaender and Mohring (1993). More precisely, that paper contains a lemma which states that if C is a clique in a graph G = (V, E) then any tree decomposition of G contains a vertex labelled by a set of vertices L such that C ⊆ L. Our statement of this lemma in terms of clauses and tree decompositions of clauses then follows immediately from this result, which can be shown by noticing that the decomposition of the clause A can be easily converted to a tree decomposition of A's Gaifman graph G_A where, for any l ∈ A, vars(l) corresponds to a clique in G_A. □

Lemma 6

Let A be a function-free clause and T_A be its tree decomposition. Let l^∗ ∈ A be a literal and let 𝜃 be a substitution not affecting any variable in the set vars(A)∖vars(l^∗), mapping variables to variables or constants (i.e. not to terms containing function symbols) and never mapping any variables to elements of the set vars(A). Then a tree decomposition of A𝜃 can be obtained by applying the substitution 𝜃 to the variables contained in the labels of the tree decomposition T_A and removing constants from these sets if necessary – we denote the new labelled tree by T_A𝜃. As a consequence, the treewidth of A𝜃 is never greater than the treewidth of A.

Proof

If we apply the substitution 𝜃 to the labels of the tree decomposition T_A then none of the label-sets associated with the vertices of T_A increases in size (this is in part due to the fact that we do not consider function symbols). Therefore if we are able to show that T_A𝜃 is a tree decomposition of A𝜃 then we will automatically get also the result that the treewidth of A𝜃 is not greater than the treewidth of A. So, let us show that T_A𝜃 is indeed a tree decomposition of A𝜃.

(i)
Claim: For every variable V ∈ vars(A𝜃) there is a vertex of T_A𝜃 labelled by a set containing V. This is obvious.
(ii)
Claim: For every pair of variables U, V which both appear in a literal l ∈ A𝜃, there is a vertex of T_A𝜃 labelled by a set containing both U and V. There must be a literal l′ ∈ A containing two variables U′ and V′ such that U′𝜃 = U and V′𝜃 = V (because U and V are both contained in a literal). Therefore there must be a vertex t of T_A labelled by a set S_t containing both U′ and V′. After applying the substitution 𝜃 to the set S_t, we get a set by which some vertex contained in T_A𝜃 is labelled and it contains both U and V.
(iii)
Claim: For every V ∈ vars(A𝜃), the set of vertices of T_A𝜃 labelled by sets containing V forms a connected subgraph of T_A𝜃. Let us assume (for contradiction) that there is a variable V ∈ vars(A𝜃) such that the set of vertices of T_A𝜃 labelled by sets containing V forms a disconnected graph. It follows that there must be two variables U′, V′ (U′ ≠ V′) such that U′𝜃 = V′𝜃 = V and the sets of vertices S_U′ and S_V′ of the tree decomposition T_A corresponding to the variables U′ and V′, respectively, must be disjoint (because the set of vertices with labels containing a given variable must form a connected subgraph in any tree decomposition). However, the variables U′ and V′ must appear in the literal l^∗ because the substitution 𝜃 affects only variables contained in l^∗ and maps no variables of A to elements of the set vars(A) (since at least one of the variables must have been affected by the substitution and since its image is equal to that of the other variable, the other variable must have been affected by the substitution as well). Thus, since both U′ and V′ must be contained in l^∗, there must be a vertex in the tree decomposition T_A labelled by a set which contains both U′ and V′. The sets of vertices of T_A labelled by sets containing the variables U′ and V′, respectively, therefore cannot be disjoint, which is a contradiction.

We have thus shown that T_A𝜃 is a tree decomposition of A𝜃. □
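The three claims can be checked mechanically on small instances. The sketch below assumes a tree decomposition given as a dictionary from vertices to label sets plus an edge list, with the same tuple encoding of literals as in the other sketches; none of these names come from the paper. It applies a substitution satisfying the hypotheses of the lemma, and then one that violates them by mapping a variable to an element of vars(A).

```python
from itertools import combinations

def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def variables(clause):
    return {t for lit in clause for t in lit[1:] if is_var(t)}

def is_tree_decomposition(clause, bags, edges):
    """Check the three conditions for bags (vertex -> label set) over a tree."""
    adj = {n: set() for n in bags}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def connected(nodes):
        nodes = set(nodes)
        if not nodes:
            return False  # a variable that occurs in no label
        seen, stack = set(), [next(iter(nodes))]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n] & nodes)
        return seen == nodes

    if len(edges) != len(bags) - 1 or not connected(bags):
        return False  # the underlying graph is not a tree
    for lit in clause:
        for u, v in combinations(sorted(variables([lit])), 2):
            if not any({u, v} <= bag for bag in bags.values()):
                return False
    return all(connected(n for n in bags if v in bags[n])
               for v in variables(clause))

def substitute_clause(clause, theta):
    return frozenset((lit[0],) + tuple(theta.get(t, t) for t in lit[1:])
                     for lit in clause)

def substitute_bags(bags, theta):
    """Apply theta to every label and drop constants."""
    return {n: {theta.get(v, v) for v in bag if is_var(theta.get(v, v))}
            for n, bag in bags.items()}

def width(bags):
    return max(len(bag) for bag in bags.values()) - 1

path = frozenset({("e", "X", "Y"), ("e", "Y", "Z"), ("e", "Z", "U")})
bags = {0: {"X", "Y"}, 1: {"Y", "Z"}, 2: {"Z", "U"}}
edges = [(0, 1), (1, 2)]

good = {"Z": "W", "U": "W"}   # touches only vars of e(Z, U); W is fresh
bad = {"U": "X"}              # maps to an element of vars(A)

print(is_tree_decomposition(substitute_clause(path, good),
                            substitute_bags(bags, good), edges))  # True
print(is_tree_decomposition(substitute_clause(path, bad),
                            substitute_bags(bags, bad), edges))   # False
```

In the second case the clause becomes a triangle and the vertices whose labels contain X are no longer connected, which shows why the lemma forbids mapping variables to elements of vars(A).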

Lemma 7

Let A = l₁ ∨ l₂ ∨ ⋯ ∨ l_m and B = m₁ ∨ m₂ ∨ ⋯ ∨ m_n be two standardized-apart function-free clauses. Let 𝜃 be a most general unifier of the literals l_i, ¬m_j not affecting any variable in the set (vars(A) ∪ vars(B))∖(vars(l_i) ∪ vars(m_j)) such that vars(l_i𝜃) ∩ vars(A) = vars(l_i𝜃) ∩ vars(B) = ∅. Next, let

$$C = (l_{1} \vee \dots \vee l_{i-1} \vee l_{i+1} \vee \dots \vee l_{m} \vee m_{1} \vee \dots \vee m_{j-1} \vee m_{j+1} \vee \dots \vee m_n)\theta $$

be a binary resolvent of A and B. Then for the treewidth k_C of C, it holds k_C ≤ max{k_A, k_B} where k_A is the treewidth of A and k_B is the treewidth of B.

Proof

Using Lemma 6, we get that A𝜃 has a tree decomposition T_A𝜃 of width at most k_A and B𝜃 has a tree decomposition T_B𝜃 of width at most k_B. We will now show how to construct a tree decomposition of width at most max{k_A, k_B} for the clause C. Let V_A (V_B, respectively) be a vertex from T_A𝜃 (T_B𝜃, respectively) which is labelled by a set of variables 𝒱_A (𝒱_B, respectively) such that vars(l_i𝜃) ⊆ 𝒱_A (vars(l_i𝜃) ⊆ 𝒱_B) – such vertices must exist by Lemma 5. We construct the new tree decomposition T_C by connecting T_A𝜃 and T_B𝜃 by a new edge between the vertices V_A and V_B (we may remove the variables not contained in the clause C from the labels of the vertices of T_C). Clearly, T_C has width at most max{k_A, k_B}. We need to show that it is indeed a tree decomposition of C. The first two conditions from Definition 7 are trivially satisfied, which follows from the fact that T_A𝜃 and T_B𝜃 are tree decompositions of the two clauses A𝜃 and B𝜃 and from C ⊆ A𝜃 ∪ B𝜃. It remains to show validity of the third condition (connectedness). Let us assume (for contradiction) that there is a variable V ∈ vars(C) such that the vertices labelled by sets containing the variable V form a disconnected subgraph of T_C. The variable V cannot be contained in vars(A) or vars(B). If V ∈ vars(A) then V ∉ vars(B) (because A and B were standardized apart) and also V ∉ vars(l_i𝜃) (because we selected the unifier 𝜃 to satisfy vars(l_i𝜃) ∩ vars(A) = ∅), but then the set of vertices labelled by sets containing the variable V could not be disconnected because it is actually connected in T_A𝜃 and none of the labels in T_B𝜃 contains V. The same argument can be used to show that V ∉ vars(B). So the only remaining possibility is that V ∈ vars(l_i𝜃). However, this is not possible either. Since both T_A𝜃 and T_B𝜃 are tree decompositions, the set of vertices labelled by the sets containing the variable V forms a connected subgraph in both T_A𝜃 and T_B𝜃. Moreover, a vertex from T_A𝜃 and a vertex from T_B𝜃 which are both labelled by sets containing all variables from vars(l_i𝜃) are connected by an edge in T_C, therefore the set of vertices labelled by the sets containing the variable V must form a connected subgraph of T_C. Thus, we have arrived at a contradiction because there cannot be any variable with a disconnected subgraph of T_C associated with it.

We have verified that T_C is a tree decomposition of C with width at most max{k_A, k_B}. □
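The construction of T_C can be replayed on a small resolvent. In this sketch the decompositions of A𝜃 and B𝜃 are given by hand for a hypothetical pair of clauses A = q(X, Y) ∨ p(Y, Z) and B = ¬p(U, V) ∨ r(V, W), unified with 𝜃 = {Y/N1, U/N1, Z/N2, V/N2}; the data layout and helper names are assumptions, not part of the paper.

```python
from itertools import combinations

def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def variables(clause):
    return {t for lit in clause for t in lit[1:] if is_var(t)}

def is_tree_decomposition(clause, bags, edges):
    """Check the three conditions for bags (vertex -> label set) over a tree."""
    adj = {n: set() for n in bags}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def connected(nodes):
        nodes = set(nodes)
        if not nodes:
            return False
        seen, stack = set(), [next(iter(nodes))]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n] & nodes)
        return seen == nodes

    if len(edges) != len(bags) - 1 or not connected(bags):
        return False
    for lit in clause:
        for u, v in combinations(sorted(variables([lit])), 2):
            if not any({u, v} <= bag for bag in bags.values()):
                return False
    return all(connected(n for n in bags if v in bags[n])
               for v in variables(clause))

def width(bags):
    return max(len(bag) for bag in bags.values()) - 1

def join(bags_a, edges_a, bags_b, edges_b, node_a, node_b, keep):
    """Connect two decompositions by one edge and drop variables outside keep."""
    bags = {("A", n): bag & keep for n, bag in bags_a.items()}
    bags.update({("B", n): bag & keep for n, bag in bags_b.items()})
    edges = [(("A", u), ("A", v)) for u, v in edges_a]
    edges += [(("B", u), ("B", v)) for u, v in edges_b]
    edges.append((("A", node_a), ("B", node_b)))
    return bags, edges

# Decompositions of A.theta = q(X, N1) v p(N1, N2) and B.theta = ~p(N1, N2) v r(N2, W).
bags_a, edges_a = {0: {"X", "N1"}, 1: {"N1", "N2"}}, [(0, 1)]
bags_b, edges_b = {0: {"N1", "N2"}, 1: {"N2", "W"}}, [(0, 1)]

resolvent = frozenset({("q", "X", "N1"), ("r", "N2", "W")})
bags_c, edges_c = join(bags_a, edges_a, bags_b, edges_b, 1, 0, variables(resolvent))

print(is_tree_decomposition(resolvent, bags_c, edges_c))         # True
print(width(bags_c) <= max(width(bags_a), width(bags_b)))        # True
```

The new edge joins the two vertices whose labels contain the variables of the resolved literal, exactly the vertices V_A and V_B whose existence is guaranteed by Lemma 5.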

Lemma 8

Let X_k be the set of all function-free Horn clauses with treewidth at most k. Then for any clause C derivable by SLD resolution from a clausal theory ℋ ∈ 2^{X_k} it holds C ∈ X_k.

Proof

Since this lemma considers only SLD resolution, we can consider just the case of binary resolvents (we do not need to take factors into account). The lemma then follows immediately from Lemma 7 because any clause derived by applying the binary resolution rule to two clauses must always have treewidth bounded by the maximum of the treewidths of the clauses from which it was derived. □

Proposition 14

Let X_k be the set of all function-free Horn clauses with treewidth at most k and let ℒ = 2^{X_k} be the set of theories consisting of function-free Horn clauses with treewidth at most k. Then any two clauses which are x-equivalent w.r.t. X_k are also safely equivalent w.r.t. ℒ.

Proof

Follows directly from Lemmas 4 and 8. □

About this article

Cite this article

Kuželka, O., Szabóová, A. & Železný, F. A method for reduction of examples in relational learning. J Intell Inf Syst 42, 255–281 (2014). https://doi.org/10.1007/s10844-013-0294-z

Received: 08 May 2013
Revised: 09 September 2013
Accepted: 21 November 2013
Published: 17 December 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10844-013-0294-z
