Skip to main content
Log in

A method for reduction of examples in relational learning

  • Published:
Journal of Intelligent Information Systems Aims and scope Submit manuscript

Abstract

Feature selection methods often improve the performance of attribute-value learning. We explore whether also in relational learning, examples in the form of clauses can be reduced in size to speed up learning without affecting the learned hypothesis. To this end, we introduce the notion of safe reduction: a safely reduced example cannot be distinguished from the original example under the given hypothesis language bias. Next, we consider the particular, rather permissive bias of bounded treewidth clauses. We show that under this hypothesis bias, examples of arbitrary treewidth can be reduced efficiently. We evaluate our approach on four data sets with the popular system Aleph and the state-of-the-art relational learner nFOIL. On all four data sets we make learning faster in the case of nFOIL, achieving an order-of-magnitude speed up on one of the data sets, and more accurate in the case of Aleph.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig.ย 1
Fig.ย 2
Fig.ย 3
Fig.ย 4
Fig.ย 5

Similar content being viewed by others

Notes

  1. In this paper we follow the conventions of Atserias et al. (2007). In other works, what we call k-consistency is known as strong k+1-consistency (Rossi et al. 2006).

  2. This is not always the case. For example, we have a decision procedure for x-subsumption w.r.t. the set of all clauses โ€“ the ordinary ๐œƒ-subsumption which is decidable but NP-hard.

  3. Note again the terminology used in this paper following Atserias et al. (2007). In CSP-literature, it is often common to call 2-consistency what we call 1-consistency.

  4. We are applying ๐œƒ also to e because e need not be ground.

  5. A first-order interpretation consists of several components, one them is the domain of discourse, and another is a function ฯ• which maps constants to elements of the domain of discourse. We assume w.l.o.g that there is a constant c d for every element d of the domain of discourse.

References

  • Appice, A., Ceci,M., Rawles, S., Flach, P.A. (2004). Redundant feature elimination for multi-class problems. In ICML (vol. 69).

  • Atserias, A., Bulatov, A., Dalmau, V. (2007). On the power of k-consistency. In Proceedings of ICALP-2007 (pp. 266โ€“271).

  • Beeri, C., Fagin, R., Maier, D., Yannakakis, M. (1983). On the desirability of acyclic database schemes. Journal of ACM, 30(3), 479โ€“513.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • Bodlaender, H.L., & Mohring, R.H. (1993). The pathwidth and treewidth of cographs. SIAM Journal of Discrete Methematics, 6, 238โ€“255.

    MathSciNetย  Google Scholarย 

  • Courcelle, B. (1990). The monadic second-order logic of graphs. i. recognizable sets of finite graphs. Information and Computation, 85(1), 12โ€“75.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • De Raedt, L. (1997).) Logical settings for concept-learning. Artificial Intelligence, 95(1), 187โ€“201.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • De Raedt, L. (2008). Logical and relational learning. New York: Springer.

  • Dechter, R. (2003). Constraint processing. San Francisco: Morgan Kaufmann.

  • Erickson, J. (2009). CS 598: Computational topology, course notes, University of Illinois at Urbana-Champaign. http://compgeom.cs.uiuc.edu/~jeffe/teaching/comptop/.

  • Fagin, R. (1983). Degrees of acyclicity for hypergraphs and relational database schemes. Journal of the ACM, 30(3), 514โ€“550.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • Feder, T., & Vardi, M.Y. (1998). The computational structure of monotone monadic snp and constraint satisfaction: a study through datalog and group theory. SIAM Journal on Computing, 28(1), 57โ€“104.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • Freuder, E.C. (1990). Complexity of k-tree structured constraint satisfaction problems. In Proceedings of the eighth national conference on artificial intelligence (vol. 1, pp. 4โ€“9). AAAIโ€™90: AAAI Press.

  • Hastie, T., Tibshirani, R., Friedman, J. (2001). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.

  • Helma, C., King, R.D., Kramer, S., Srinivasan, A. (2001). The predictive toxicology challenge 2000โ€“2001. Bioinformatics, 17(1), 107โ€“108.

    Articleย  Google Scholarย 

  • Krogel, M.A., Rawles, S., ลฝeleznรฝ, F., Flach, P., Lavrac, N., Wrobel, S. (2003). Comparative evaluation of approaches to propositionalization. In ILP. Springer.

  • Kuลพelka, O., & ลฝeleznรฝ, F. (2009). Block-wise construction of acyclic relational features with monotone irreducibility and relevancy properties. In ICML 2009: the 26th International Conference on Machine Learning.

  • Kuลพelka, O., ลฝeleznรฝ, F. (2011a). Block-wise construction of tree-like relational features with monotone reducibility and redundancy. Machine Learning, 83, 163โ€“192.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • Kuลพelka, O., ลฝeleznรฝ, F. (2011b). Seeing the world through homomorphism: An experimental study on reducibility of examples. In ILPโ€™10: Inductive logic programming (pp. 138โ€“145).

  • Kuลพelka, O., Szabรณovรก, A., ลฝeleznรฝ, F. (2013a). Bounded least general generalization. In ILPโ€™12: inductive logic programming.

  • Kuลพelka, O., Szabรณovรก, A., ลฝeleznรฝ, F. (2013b). Reducing examples in relational learning with bounded-treewidth hypotheses. In New frontiers in mining complex patterns (pp. 17โ€“32).

  • Landwehr, N., Kersting, K., Raedt, L.D. (2007). Integrating naรฏve bayes and FOIL. Journal of Machine Learning Research, 8, 481โ€“507.

    MATHย  Google Scholarย 

  • Lavraฤ, N., Gamberger, D., Jovanoski, V. (1999). A study of relevance for learning in deductive databases. Journal of Logic Programming, 40(2/3), 215โ€“249.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • Liu, H.,Motoda, H., Setiono, R., Zhao, Z. (2010). Feature selection: an ever evolving frontier in data mining. Journal of Machine Learning Research - Proceedings Track, 10, 4โ€“13.

    Google Scholarย 

  • Mackworth, A. (1977). Consistency in networks of relations. Artificial Intelligence, 8(1), 99โ€“118.

    Articleย  MATHย  MathSciNetย  Google Scholarย 

  • Maloberti, J., & Sebag, M. (2004). Fast theta-subsumption with constraint satisfaction algorithms. Machine Learning, 55(2), 137โ€“174.

    Articleย  MATHย  Google Scholarย 

  • Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, Special Issue on Inductive Logic Programming, 13(3โ€“4), 245โ€“286.

    Articleย  Google Scholarย 

  • Nassif, H., Al-Ali, H., Khuri, S., Keirouz, W., Page, D. (2009). An inductive logic programming approach to validate hexose biochemical knowledge. In: Proceedings of the 19th international conference on ILP (pp. 149โ€“165). Leuven.

  • Nienhuys-Cheng, S.H., de Wolf, R., (eds.) (1997). Foundations of inductive logic programming. Lecture Notes in Computer Science (vol. 1228). Springer.

  • Plotkin, G. (1970). A note on inductive generalization. Edinburgh: Edinburgh University Press.

  • Rossi, F., van Beek, P., Walsh T., (Eds.) (2006). Handbook of constraint programming. New York: Elsevier.

  • ลฝakovรก, M., ลฝeleznรฝ, F., Garcia-Sedano, J., Tissot, C.M., Lavraฤ, N., Kล™emen, P., Molina, J. (2007). Relational data mining applied to virtual engineering of product designs. In ILP06, LNAI (vol. 4455, pp. 439โ€“453). Springer.

Download references

Acknowledgments

This work was supported by the Czech Grant Agency through project 103/11/2170 Transferring ILP techniques to SRL. The authors would like to thank the anonymous reviewers of NFMCPโ€™12 and of the JIIS special issue for helpful remarks.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ondล™ej Kuลพelka.

Appendix A: Propositions and Proofs

Appendix A: Propositions and Proofs

Proposition 9

Let X be a set of clauses. If โ‰ผ X is x-subsumption w.r.t. X and โ—ƒ X is an x-presubsumption w.r.t. X then (A โ—ƒ X B) โ‡’ (A โ‰ผ X B) for any two clauses A, B (not necessarily from X).

Proof

We need to show that if A โ—ƒ x B then (C โ‰ผ ๐œƒ A) โ‡’ (C โ‰ผ ๐œƒ B) for all clauses C โˆˆ X. First, if A โ—ƒ x B and C โ‹  ๐œƒ A then the proposition holds trivially. Second, C โ‰ผ ๐œƒ A means that there is a substitution ๐œ— such that C๐œ— โŠ† A. This implies C๐œ— โ—ƒ X B using the condition 1 from definition of x-presubsumption. Now, we can use the second condition which gives us C โ‰ผ ๐œƒ B (note that C โˆˆ X and C๐œ— โ—ƒ X B). โ–ก

Proposition 10

Let k โˆˆ N and let โ—ƒ k be a relation on clauses defined as follows: A โ—ƒ k B if and only if the k-consistency algorithm run on the CSP-encoding of the ๐œƒ-subsumption problem A โ‰ผ ๐œƒ B returns true. The relation โ—ƒ k is an x-presubsumption w.r.t. the set X k of all clauses with treewidth at most k.

Proof

We need to verify that โ—ƒ k satisfies the conditions stated in Definition 9.

  1. 1.

    If A โ—ƒ k B and C โŠ† A then C โ—ƒ k B. This holds because if the k-consistency algorithm returns true for a problem then it must also return true for any of its subproblems (recall the discussion in Section 2.4). It is easy to check that if C โŠ† A are clauses then the CSP problem encoding the ๐œƒ-subsumption problem C โ‰ผ ๐œƒ B is a subproblem of the CSP encoding of the ๐œƒ-subsumption problem A โ‰ผ ๐œƒ B. Therefore this condition holds.

  2. 2.

    If A โˆˆ X, ๐œ— is a substitution and A๐œ— โ—ƒ x B then A โ‰ผ ๐œƒ B. If A๐œ— โ—ƒ k B then the k-consistency algorithm applied on the CSP encoding ๐’ซ๐œ— = (๐’ฑ๐œ—, ๐’Ÿ๐œ—, ๐’ž๐œ—) of the problem A๐œ— โ‰ผ ๐œƒ B finishes with a non-empty set of partial solutions H ๐œ— . When applied on the CSP encoding P = (๐’ฑ, ๐’Ÿ, ๐’ž)s of the problem A โ‰ผ ๐œƒ B, the k-consistency algorithm finishes with a set of partial solutions H. What we need to show is that this set H is non-empty. We will do this by showing that H has a non-empty subset H โˆ— which can be constructed as follows. For each partial solution ฯ† โˆˆ H ๐œ— , we take all sets of variables V โŠ† ๐’ฑ such that |V| โ‰ค k + 1 and V๐œ— โŠ† Supp(ฯ†) and, for each such set v, we add to H โˆ— a new partial solution ฯ† โˆ— such that Supp(ฯ† โˆ—) = V and ฯ† โˆ—(v) = ฯ†(v๐œ—) for all v โˆˆ Supp(ฯ† โˆ—). Clearly, any such ฯ† โˆ—is a valid partial solution of ๐’ซ because otherwise ฯ† could not have been a valid partial solution of ๐’ซ๐œ—. Moreover, for every ฯ† โˆ— โˆˆ H โˆ—with |Supp(ฯ† โˆ—) | โ‰ค k and every variable v โˆˆ ๐’ฑ, there exists a partial solution ฯˆ โˆ— โˆˆ H โˆ— such that ฯ† โˆ— โŠ† ฯˆ โˆ— and v โˆˆ Supp(ฯˆ โˆ—). This can be shown by contradiction as follows. Let us suppose that there is a partial solution ฯ† โˆ— โˆˆ H โˆ— such that |Supp(ฯ† โˆ—) | โ‰ค k and a variable v โˆˆ ๐’ฑ such that there is no ฯˆ โˆ— โˆˆ H โˆ— which would satisfy ฯ† โˆ— โŠ† ฯˆ โˆ—and v โˆˆ Supp(ฯˆ โˆ—). But then the respective solution ฯ† โˆˆ H ๐œ— from which ฯ† โˆ— was constructed should have been removed by the k-consistency algorithm because there could not have been any partial solution ฯˆ โˆˆ H ๐œ— such that ฯ† โŠ† ฯˆ and v๐œ— โˆˆ Supp(ฯˆ) which is a contradiction. For every partial solution ฯ† โˆ— โˆˆ H โˆ—, the set H โˆ— must also contain all partial โ€™sub-solutionsโ€™ (i.e. solutions ฯ†โ€ฒโˆ— such that ฯ†โ€ฒโˆ— โŠ† ฯ† โˆ—). As a consequence of the above, the k-consistency algorithm running on the CSP encoding of the problem A โ‰ผ ๐œƒ B cannot remove from H any partial solution contained in the set H โˆ— and, thus, H โˆ— โŠ† H and the k-consistency algorithm must return the value true. Since A โˆˆ X k , it must also hold A โ‰ผ ๐œƒ B.

  3. 3.

    If A โ‰ผ ๐œƒ B then A โ—ƒ k B. This is a property of k-consistency.

โ–ก

Proposition 11

Let us have a set X and a polynomial-time decision procedure for checking โ—ƒ X which is an x-presubsumption w.r.t. the set X. Then, given a clause A on input, the literal-elimination algorithm finishes in polynomial time and outputs a clause ร‚ satisfying the following conditions:

  1. 1.

    ร‚ โ‰ผ ๐œƒ A and A โ‰ผ X ร‚ where โ‰ผ X is an x-subsumption w.r.t. the set X.

  2. 2.

    |ร‚| โ‰ค |ร‚ ๐œƒ | where ร‚ ๐œƒ is a ๐œƒ-reduction of a subset of Aโ€™s literals with maximum length.

Proof

We start by proving ร‚ โ‰ผ ๐œƒ A and A โ‰ผ X ร‚. This can be shown as follows. First, A โ‰ผ X Aโ€ฒ holds in any step of the algorithm which follows from (Aโ€ฒ โ—ƒ X Aโ€ฒโˆ–{L}) โ‡’ (Aโ€ฒ โ‰ผ X Aโ€ฒโˆ–{L}) โ€“ recall that Aโ€ฒ is replaced by Aโ€ฒโˆ–{L} in the literal elimination algorithm if and only if Aโ€ฒ โ—ƒ X Aโ€ฒโˆ–{L} โ€“ and from transitivity of x-subsumption. Consequently we also have A โ‰ผ X ร‚ because ร‚ = Aโ€ฒ in the last step of the algorithm. Second, ร‚ โ‰ผ ๐œƒ A because ร‚ โŠ† A. Now, we prove the second part of the proposition. What remains to be shown is that the resulting clause ร‚ will not be bigger than ร‚ ๐œƒ . Since ร‚ โŠ† A, it suffices to show that ร‚ cannot be ๐œƒ-reducible. Let us assume, for contradiction, that it is ๐œƒ-reducible. If ร‚ was ๐œƒ-reducible, there would have to be a literal L โˆˆ ร‚ such that ร‚ โ‰ผ๐œƒ ร‚โˆ–{L}. The relation โ—ƒ X satisfies (A โ‰ผ ๐œƒ B) โ‡’ (A โ—ƒ X B) therefore it would also have to hold Aโ€ฒ โ—ƒ X Aโ€ฒโˆ–{L}. However, then L should have been removed by the literal-elimination algorithm which is a contradiction with ร‚ being output of it. The fact that the literal-elimination algorithm finishes in polynomial time follows from the fact that, for a given clause A, it calls the polynomial-time procedure for checking the relation โ—ƒ X at most |A|2 times (the other operations of the literal-elimination algorithm can be performed in polynomial time as well). โ–ก

We will need the following simple lemma in the proof of Proposition 6.

Lemma 1

Let A be a clause. If A is ๐œƒ-reducible then A๐œ— โ‰ˆ ๐œƒ A where ๐œ— = {V/t} for some V โˆˆ vars(A) and t โˆˆ terms(A) , V โ‰  t and |vars(A๐œ—)| < |vars(A)| .

Proof

Let A be ๐œƒ-reducible and let ร‚ ๐œƒ โŠ† A be ๐œƒ-reduction of A. There must be a substitution ๐œƒ โˆ— such that A๐œƒ โˆ— = ร‚ ๐œƒ and |A๐œƒ โˆ— | < |A| and therefore also |vars(A๐œƒ โˆ—) | < |vars(A) | (because if |vars(A๐œƒ โˆ—) | = |vars(A) | then A and A๐œƒ โˆ— would be isomorphic and they would have to contain the same number of literals). Let V โˆ— be a variable contained in vars(A) but not contained in vars(A๐œƒ โˆ—) (there must be some such variable because |vars(A๐œƒ โˆ—) | < |vars(A) | ) and let ๐œ— = {V โˆ—/t} โŠ† ๐œƒ โˆ—. There are two cases depending on whether T is a variable or a constant. Let us start with the case when T is a variable which we denote as W = t for clarity. Now, there must be a variable S โ‰  V โˆ— such that {S/W} โŠ† ๐œƒ which can be shown as follows. If there is no such variable (i.e. if the only variable which is mapped on W is V โˆ—) then we can construct a new substitution ๐œƒ โˆ— โˆ— = ๐œƒ โˆ— ๐œƒ โˆ— which has the following two โ€™interestingโ€™ properties (โ€™interestingโ€™ because they will allow us to arrive at a contradiction): (i) A๐œƒ โˆ— โˆ— โ‰ˆ ๐œƒ A and (ii) |vars(A๐œƒ โˆ— โˆ—) | < |vars(A๐œƒ โˆ—) | . The first property holds because A โ‰ผ ๐œƒ A โˆ— โŠ† ร‚ ๐œƒ โŠ† A implies A โ‰ผ ๐œƒ A๐œƒ โˆ—๐œƒโˆ— โŠ† ร‚ ๐œƒ . The second property can be shown as follows. Clearly, none of the variables not in vars(A๐œƒ โˆ—) can reappear in vars(A๐œƒ โˆ— โˆ—). Moreover, the variable W cannot be contained in vars(A๐œƒ โˆ— โˆ—) because V โˆ— ๐œƒ โˆ— โˆ— โ‰  W and there was no other variable mapped to V โˆ—or W by ๐œƒ โˆ—. Therefore |vars(A๐œƒ โˆ— โˆ—) | < |vars(A๐œƒ โˆ—) | , i.e. the second property must be true as well. However, then we have a contradiction because we assumed that ร‚ ๐œƒ was ๐œƒ-reduction but ร‚ ๐œƒ cannot be a ๐œƒ-reduction because A โ‰ˆ ๐œƒ A๐œƒ โˆ— โˆ—and |A๐œƒ โˆ— โˆ— | < |A๐œƒ โˆ—| (which follows from |vars (A๐œƒ โˆ—โˆ—)| < |vars(A๐œƒ โˆ—)| = |vars(ร‚ ๐œƒ )| and A๐œƒ โˆ—โˆ— โŠ† ร‚ ๐œƒ ). Thus, there must be a variable S โ‰  V โˆ— such that {S/W} โŠ† ๐œƒ. It follows that if we set ๐œ— = {V โˆ—/S} it must hold A๐œƒ โˆ— = A๐œ—๐œƒ โˆ—. Therefore A๐œ— โ‰ผ ๐œƒ A๐œƒ โˆ— โ‰ˆ ๐œƒ A. Now, if T is not a variable but a constant, we can simply set ๐œ— = {V โˆ—/t} and it must hold A๐œƒ โˆ— = A๐œ—๐œƒ โˆ— and therefore also A๐œ— โ‰ผ ๐œƒ A โˆ— โ‰ˆ ๐œƒ A. Since also trivially A โ‰ผ ๐œƒ A๐œ—, we have A๐œ— โ‰ˆ ๐œƒ A and |vars(A๐œ—) | < |vars(A) | which finishes the proof of this Lemma. โ–ก

Proposition 12

Let us have a set X and a polynomial-time decision procedure for checking โ—ƒ X which is an x-presubsumption w.r.t. the set X. Then, given a clause A on input, the literal substitution algorithm finishes in polynomial time and outputs a clause ร‚ satisfying the following conditions:

  1. 1.

    ร‚ โ‰ผ X A and A โ‰ผ ๐œƒ ร‚ where โ‰ผ X is the x-subsumption w.r.t. the set x.

  2. 2.

    |terms(ร‚)| โ‰ค |terms(ร‚ ๐œƒ )| where ร‚ ๐œƒ is a ๐œƒ-reduction of a clause A๐œ— with maximum cardinality of the set (ร‚ ๐œƒ ) where ๐œ— is some substitution mapping variables to elements of the set terms(A).

Proof

We start by proving ร‚ โ‰ผ X A and A โ‰ผ ๐œƒ ร‚. This can be shown as follows. First, Aโ€ฒ โ‰ผ X A holds in any step of the algorithm which follows from (Aโ€ฒ๐œ— โ—ƒ X Aโ€ฒ) โ‡’ (Aโ€ฒ๐œ— โ‰ผ X Aโ€ฒ) โ€“ recall that Aโ€ฒis replaced by Aโ€ฒ๐œ— in the literal substitution algorithm if and only if A>โ€ฒ๐œ— โ—ƒ X Aโ€ฒ โ€“ and from transitivity of x-subsumption. Consequently we also have ร‚ โ‰ผ X A because ร‚ = Aโ€ฒ in the last step of the algorithm. Second, A โ‰ผ ๐œƒ ร‚ because ร‚ = A๐œƒโ€ฒ for some substitution ๐œƒโ€ฒ( ๐œƒโ€ฒis the substitution composed of the substitutions ๐œ— applied on Aโ€ฒin the literal substitution algorithm). Now, we prove the second part of the proposition. What remains to be shown is that the resulting clause ร‚ will not have more terms than ร‚ ๐œƒ . Since ร‚ = A๐œƒโ€ฒ, it suffices to show that ร‚ cannot be ๐œƒ-reducible. Let us assume, for contradiction, that it is ๐œƒ-reducible. If ร‚ was ๐œƒ-reducible, then there would have to be a substitution ๐œ— = {V/t} such that A๐œ— โ‰ˆ ๐œƒ A where ๐œ— = {V/t} for some v โˆˆ vars(A), t โˆˆ terms(A), V โ‰  t (this follows from Lemma 1). The relation โ—ƒ X satisfies (A โ‰ผ ๐œƒ B) โ‡’ (A โ—ƒ X B) therefore it would also have to hold Aโ€ฒ๐œ— โ—ƒ X Aโ€ฒ. However, then ๐œ— should have been applied on Aโ€ฒ by the literal substitution algorithm which is a contradiction with ร‚ being output of it. The fact that the literal-substitution algorithm finishes in polynomial time follows from the fact that, for a given clause A, it calls the polynomial-time procedure for checking the relation โ—ƒ X at most |A| 3times (the other operations of the literal-substitution algorithm can be performed in polynomial time as well). โ–ก

Lemma 2

(Plotkin 1970) Let A and B be clauses. If A โ‰ผ ๐œƒ B then A โŠง B.

Proposition 13

Let โ„’ be a hypothesis language and let e be a clause. Let eอ‚ be a clause obtained from e by variabilizing the constants which are not contained in the hypothesis language. Then (โ„‹ โŠง e) โ‡” (โ„‹ โŠง eอ‚) for any โ„‹ โˆˆ โ„’. If รช is a ๐œƒ-reduction of eอ‚ and |รช| < |e| then รช is also a safe reduction of e.

Proof

We will start by showing validity of the implication (H โŠง e) โ‡’ (H โŠง eอ‚ ). For contradiction, let us assume that H โŠง e and that there is a model M of the clausal theory H such that M โŠง e and M โŠฎ eอ‚ . Then there must be a substitution ๐œƒ grounding all variables in eอ‚ such thatFootnote 4 M โŠง e๐œƒ and M โŠฎ eอ‚ ๐œƒ. Now, we will construct another model Mโ€ฒof H in which e will not be satisfied. We take each constant C in e that has been replaced by a variable v in eอ‚ and update the assignment ฯ• of the constants C to objects from the domain of discourse in the modelFootnote 5 so that ฯ•(c) = ฯ•(V๐œƒ). Clearly, we can do this for every constant C since every constant in e has been replaced by exactly one variable. Now, we see that Mโ€ฒ โŠฎ e. However, we are not done yet as it might happen that the new model with the modified ฯ• would no longer be a model of H. However, this is clearly not the case since none of the constants C appears in H and therefore the change of ฯ• has no effect whatsoever on whether or not H is true in Mโ€ฒ. So, we have arrived at a contradiction. We have a model Mโ€ฒsuch that Mโ€ฒ โŠง H and Mโ€ฒ โŠฎ e which contradicts the assumption H โŠง e. The implication (H โŠง e) โ‡ (H โŠง eอ‚ ) follows directly from Lemma 2. We have eอ‚ โ‰ผ ๐œƒ e therefore also eอ‚ โŠง e and finally eอ‚ โ‰ผ ๐œƒ e. (ii) In order to show (H โŠง e) โ‡’ (H โŠง รช), it suffices to notice that H โŠง e and e โ‰ผ ๐œƒ รช imply H โŠง รช. The implication (H โŠง e) โ‡’ (H โŠง รช) may be shown similarly as follows: H โŠง รช and รช โ‰ผ ๐œƒ e imply H โŠง e. โ–ก

The next lemmas are used to prove Proposition 8. In Lemmas 3 and 4, we formulate rather general conditions under which x-equivalence w.r.t. a set X implies safe equivalence w.r.t. to the set 2X. Then, in the subsequent lemmas, we show that these conditions are satisfied by the set of bounded-treewidth Horn clauses.

Lemma 3

Let X be a set of clauses and let โ„’ โŠ† 2X be a hypothesis language. Let A and B be clauses. Let A โ‰ˆ X B w.r.t. the set X and let the following be true for any โ„‹ โˆˆ โ„’ and any clause C: if โ„‹ โŠง C and C is not a tautology then there is a clause D โˆˆ X such that โ„‹ โŠง D and D โ‰ผ ๐œƒ C. Then for any โ„‹ โˆˆ โ„’, it holds (โ„‹ โŠง A) โ‡” (โ„‹ โŠง B).

Proof

First, we need to consider the case when A and B are both tautologies. If both A and B are tautologies then (โ„‹ โŠง A) โ‡” (โ„‹ โŠง B) naturally holds for any โ„‹. Now, we can consider the case when at most one of the clauses A and B is a tautology. Let us assume w.l.o.g. that if one of the clauses is a tautology then it is the clause B. If โ„‹ โŠง A then there is a clause D โˆˆ X such that โ„‹ โŠง D and D โ‰ผ ๐œƒ A (by the assumptions of the proposition). Since D โˆˆ X, D โ‰ผ ๐œƒ A and A โ‰ผ X B, we have D โ‰ผ ๐œƒ B (from the definition of x-subsumption) and finally also โ„‹ โŠง D โŠง B and so โ„‹ โŠง B. The other implication can be shown in a completely similar fashion if A is not a tautology. โ–ก

Lemma 4

Let X be a set of Horn clauses such that any clause which can be derived from a clausal theory โ„‹ โˆˆ 2X using SLD resolution is contained in X. If e and รช are two Horn clauses such that e โ‰ˆ X รช then for any โ„‹ โˆˆ 2X: (โ„‹ โŠง e) โ‡” (โ„‹ โŠง รช).

Proof

We will use the subsumption theorem for Horn clauses and Lemma 3. We will show that the conditions of this lemma imply conditions of Lemma 3. If A is a non-tautological clause and โ„‹ โŠง A then by the subsumption theorem there must be a clause C derivable from โ„‹ using resolution (SLD resolution, respectively) such that C โ‰ผ ๐œƒ A. Therefore for any non-tautological clause A, if โ„‹ โŠง A where โ„‹ โˆˆ 2Xthen there must be a clause C โˆˆ X such that โ„‹ โŠง C (because resolution is sound) and C โ‰ผ ๐œƒ A. Now, since e โ‰ˆ X รช , we may finish the proof using Lemma 3 which gives us (โ„‹ โŠง e) โ‡” (โ„‹ โŠง รช ) for any โ„‹ โˆˆ 2X. โ–ก

Lemma 5

(Clique containment, lemma Bodlaender and Mohring 1993) Let A be a clause and T A be its tree decomposition. For any l โˆˆ A, there is a vertex in T A labelled by a set of variables v such that vars(l) โŠ† V.

Proof

The proof can be found in Bodlaender and Mohring (1993). More precisely, the paper (Bodlaender and Mohring 1993) contains a lemma which states that if C is a clique in a graph G = (V, E) then any tree decomposition of G contains a vertex labelled by a set of vertices L such that C โŠ† L. Our statement of this lemma in terms of clauses and tree decompositions of clauses then follows immediately from this result which can be shown by noticing that the decomposition of the clause A can be easily converted to a tree decomposition of Aโ€™s Gaifman graph G A where, for any l โˆˆ A, var(l) corresponds to a clique in G A . โ–ก

Lemma 6

Let A be a function-free clause and T A be its tree decomposition. Let l โˆ— โˆˆ A be a literal and let ๐œƒ be a substitution not affecting any variable in the set vars(A)โˆ–vars(l โˆ—), mapping variables to variables or terms (i.e. not to function symbols) and never mapping any variables to elements of the set vars(A). Then a tree decomposition of A๐œƒ can be obtained by applying the substitution ๐œƒ on the variables contained in the labels of the tree decomposition T A and removing constants from these sets if necessary โ€“ we denote the new labelled tree by T A ๐œƒ. As a consequence, the treewidth of A๐œƒ is never greater than the treewidth of A.

Proof

If we apply the substitution ๐œƒ on the labels of the tree decomposition T A then none of the label-sets associated to the vertices of T A increases in size (this is in part due to the fact that we do not consider function symbols). Therefore if we are able to show that T A ๐œƒ is a tree decomposition of A๐œƒ then we will automatically get also the result that the treewidth of A๐œƒ is not greater than the treewidth of A. So, let us show that T A ๐œƒ is indeed a tree decomposition of A๐œƒ.

  1. (i)

    Claim: For every variableV โˆˆ vars(A๐œƒ) there is a vertex of T A ๐œƒ labelled by a set containingV . This is obvious.

  2. (ii)

    Claim: For every pair of variablesu, V which both appear in a literall โˆˆ A๐œƒ, there is a vertex of T A ๐œƒ labelled by a set containing both U and V . There must be a literal lโ€ฒ โˆˆ A containing two variables Uโ€ฒand Vโ€ฒsuch that Uโ€ฒ๐œƒ = U and Vโ€ฒ๐œƒ = V (because U and V are both contained in a literal). Therefore there must be a vertex T of T A labelled by a set S t containing both Uโ€ฒand Vโ€ฒ. After applying the substitution ๐œƒ on the set S t , we get a set by which some vertex contained in T A ๐œƒ is labelled and it contains both U and V .

  3. (iii)

    Claim: For everyV โˆˆ vars(A๐œƒ), the set of vertices of T A ๐œƒ labelled by sets containingV forms a connected subgraph of T A ๐œƒ. Let us assume (for contradiction) that there is a variable V โˆˆ vars(A๐œƒ) such that the set of vertices of T A ๐œƒ labelled by sets containing V forms a disconnected graph. It follows that there must be two variables Uโ€ฒ, Vโ€ฒ( Uโ€ฒ โ‰  Vโ€ฒ) such that Uโ€ฒ๐œƒ = Vโ€ฒ๐œƒ = V and the sets of vertices S Uโ€ฒ and S Vโ€ฒ of the tree decomposition T A corresponding to the variables Uโ€ฒand Vโ€ฒ, respectively, must be disjoint (because the set of vertices with labels containing a given variable must form a connected subgraph in any tree decomposition). However, the variables Uโ€ฒand Vโ€ฒ must appear in the literal l โˆ— because the substitution ๐œƒ affects only variables contained in l โˆ— and maps no variables of A to elements of the set vars(A) (since at least one of the variables must have been affected by the substitution and since it is equal to the other variable, the other variable must have been affected by the substitution as well). Thus, since both Uโ€ฒ and Vโ€ฒ must be contained in l โˆ— there must be a vertex in the tree decomposition T A labelled by a set which contains both Uโ€ฒ and Vโ€ฒ. The sets of vertices of T A labelled by sets containing the variables Uโ€ฒ and Vโ€ฒ, respectively, therefore cannot be disjoint which is a contradiction.

We have thus shown that T A ๐œƒ is a tree decomposition of A๐œƒ. โ–ก

Lemma 7

Let A = l 1 โˆจ l 2 โˆจ โ‹ฏ โˆจ l m and B = m 1 โˆจ m 2 โˆจ โ‹ฏ โˆจ m n be two standardized-apart function-free clauses. Let ๐œƒ be a most general unifier of the literals l i , ยฌ m j not affecting any variable in the set (vars(A) โˆช vars(B))โˆ–(vars(l i ) โˆช vars(m j )) such that vars(l i ๐œƒ) โˆฉ vars(A) = vars(l i ๐œƒ) โˆฉ vars(B) = โˆ…. Next, let

$$C = (l_{1} \vee \dots \vee l_{i-1} \vee l_{i+1} \vee \dots \vee l_{m} \vee m_{1} \vee \dots \vee m_{j-1} \vee m_{j+1} \vee \dots \vee m_n)\theta $$

be a binary resolvent of A and B. Then for the treewidth k C of C, it holds k C โ‰ค max{k A , k B } where k A is the treewidth of A and k B is the treewidth of B.

Proof

Using Lemma 6, we get that A๐œƒ has a tree decomposition T A๐œƒ of width at most k A and B๐œƒ has a tree decomposition T B๐œƒ of width at most k B . We will now show how to construct a tree decomposition of width at most max{k A , k B } for the clause C. Let V A ( V B , respectively) be a vertex from T A๐œƒ ( T B๐œƒ , respectively) which is labelled by a set of variables ๐’ฑ A ( ๐’ฑ B , respectively) such that vars(l i ๐œƒ) โŠ† ๐’ฑ A ( vars(l i ๐œƒ) โŠ† ๐’ฑ B ) โ€“ such vertices must exist by Lemma 5. We construct the new tree decomposition T C by connecting T A๐œƒ and T B๐œƒ by a new edge between the vertices V A and V B (we may remove the variables not contained in the clause C from the labels of the vertices of T C ). Clearly, T C has width at most max{k A , k B }. We need to show that it is indeed a tree decomposition of C. The first two conditions from Definition 7 are trivially satisfied which follows from the fact that T A๐œƒ and T B๐œƒ are tree decompositions of the two clauses A๐œƒ and B๐œƒ and from C โŠ† A๐œƒ โˆช B๐œƒ. It remains to show validity of the third condition (connectedness).Let us assume (for contradiction) that there is a variable V โˆˆ vars(C) such that the vertices labelled by sets containing the variable V form a disconnected subgraph of T C . The variable V cannot be contained in vars(A) or vars(B). If v โˆˆ vars(A) then V โˆ‰ vars(B) (because A and B were standardized apart) and also V โˆ‰ vars(l i ๐œƒ) (because we selected the unifier ๐œƒ to satisfy vars(l i ๐œƒ) โˆฉ vars(A) = โˆ…) but then the set of vertices labelled by sets containing the variable V could not be disconnected because it is actually connected in T A๐œƒ and none of the labels in T B๐œƒ contains V . The same argument can be used to show that V โˆ‰ vars(B). So the only remaining possibility is that V โˆˆ vars(l i ๐œƒ). However, this is not possible either. Since both T A๐œƒ and T B๐œƒ are tree decompositions, the set of vertices labelled by the sets containing the variable V forms a connected subgraph in both T A๐œƒ and T B๐œƒ . Moreover, a vertex from T A๐œƒ and a vertex from T B๐œƒ which are both labelled by sets containing all variables from vars(l๐œƒ) are connected by an edge in T C therefore the set of vertices labelled by the sets containing the variable V must form a connected subgraph of T C . Thus, we have arrived at a contradiction because there cannot be any variable with a disconnected subgraph of T C associated to it.

We have verified that T C is a tree decomposition of C with width at most max{k A , k B }. โ–ก

Lemma 8

Let X k be the set of all function-free Horn clauses with treewidth at most k. Then for any clause C derivable by SLD resolution from a clausal theory \(\mathcal {H} \in 2^{X_{k}}\) it holds C โˆˆ X k .

Proof

Since this lemma considers only SLD resolution, we can consider just the case of binary resolvents (we do not need to take factors into account). The proposition then follows immediately from Lemma 7 because any clause derived by applying the binary resolution rule on two clauses must always have treewidth bounded by the treewidth of the clauses from which it was derived. โ–ก

Proposition 14

Let X k be the set of all function-free Horn clauses with treewidth at most k and let \(\mathcal {L} = 2^{X_{k}}\) be the set of theories consisting of function-free Horn clauses with treewidth at most k. Then any two clauses which are x-equivalent w.r.t. X k are also safely equivalent w.r.t. โ„’.

Proof

Follows directly from Lemmas 4 and 8. โ–ก

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kuลพelka, O., Szabรณovรก, A. & ลฝeleznรฝ, F. A method for reduction of examples in relational learning. J Intell Inf Syst 42, 255โ€“281 (2014). https://doi.org/10.1007/s10844-013-0294-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10844-013-0294-z

Keywords

Navigation