# Polynomial Time Inductive Inference of Ordered Tree Patterns with Internal Structured Variables from Positive Data

## Abstract

Tree-structured data such as HTML/XML files are represented by rooted trees with ordered children and edge labels. To represent a tree-structured pattern in such data, we propose an ordered tree pattern, called a term tree, which is a rooted tree pattern with ordered children and internal structured variables. A term tree generalizes the standard tree patterns that represent first-order terms in formal logic. For a set of edge labels $\Lambda$ and a term tree $t$, the term tree language of $t$, denoted by $L_{\Lambda}(t)$, is the set of all labeled trees obtained from $t$ by substituting arbitrary labeled trees for all variables in $t$. In this paper, we propose polynomial-time algorithms for the following two problems for two fundamental classes of term trees. The membership problem is, given a term tree $t$ and a tree $T$, to decide whether $L_{\Lambda}(t)$ contains $T$. The minimal language problem is, given a set of labeled trees $S$, to find a term tree $t$ such that $L_{\Lambda}(t)$ is minimal among all term tree languages that contain all trees in $S$. By using these two algorithms, we then show that the two classes of term trees are polynomial-time inductively inferable from positive data.
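
The language and the two problems described above can be restated symbolically. This is only a sketch of one natural reading: the substitution symbol $\theta$, the notation $t\theta$ for the tree obtained by applying $\theta$ to $t$, the use of $\cong$ for equality of trees up to isomorphism, and the name $\mathcal{C}$ for the class of term trees under consideration are notational assumptions and do not appear in the abstract.

$$
\begin{aligned}
L_{\Lambda}(t) &= \{\, T \mid T \cong t\theta \text{ for some substitution } \theta \text{ replacing every variable of } t \text{ by a tree labeled over } \Lambda \,\} \\
\text{Membership:} &\quad \text{given } t \text{ and } T, \text{ decide whether } T \in L_{\Lambda}(t) \\
\text{Minimal language:} &\quad \text{given } S, \text{ find } t \in \mathcal{C} \text{ with } S \subseteq L_{\Lambda}(t) \text{ such that no } t' \in \mathcal{C} \text{ satisfies } S \subseteq L_{\Lambda}(t') \subsetneq L_{\Lambda}(t)
\end{aligned}
$$

Under this reading, minimality is taken with respect to set inclusion among the languages of the class, so a minimal term tree for a given $S$ need not be unique.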

