Polynomial Time Inductive Inference of Ordered Tree Patterns with Internal Structured Variables from Positive Data
Tree structured data such as HTML/XML files are represented by rooted trees with ordered children and edge labels. As a representation of a tree structured pattern in such tree structured data, we propose an ordered tree pattern, called a term tree, which is a rooted tree pattern consisting of ordered children and internal structured variables. A term tree is a generalization of standard tree patterns representing first order terms in formal logic. For a set of edge labels Λ and a term tree t, the term tree language of t, denoted by L_Λ(t), is the set of all labeled trees which are obtained from a term tree t by substituting arbitrary labeled trees for all variables in t. In this paper, we propose polynomial time algorithms for the following two problems for two fundamental classes of term trees. The membership problem is, given a term tree t and a tree T, to decide whether or not L_Λ(t) includes T. The minimal language problem is, given a set of labeled trees S, to find a term tree t such that L_Λ(t) is minimal among all term tree languages which contain all trees in S. Then, by using these two algorithms, we show that the two classes of term trees are polynomial time inductively inferable from positive data.
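To make the membership problem concrete, the following is a minimal sketch for a deliberately simplified special case, not the algorithm of the paper. It assumes a toy model in which an ordered edge-labeled tree is a list of `(label, subtree)` pairs, and a pattern may contain the hypothetical marker `VAR` in place of a child; each `VAR` is assumed to be a distinct variable that stands for exactly one labeled edge together with an arbitrary subtree below it. This corresponds roughly to the standard tree patterns that term trees generalize; the internal structured variables of the paper are more expressive and are not handled here. The names `VAR` and `matches` are illustrative only.

```python
# Toy model (an assumption for illustration, not the paper's definition):
#   tree    = list of (edge_label, subtree) pairs, in left-to-right child order
#   pattern = same shape, except a child may be VAR instead of a pair
VAR = None  # hypothetical marker for a variable occupying one child position


def matches(pattern, tree):
    """Return True if `tree` can be obtained from `pattern` by replacing
    every VAR with some single labeled edge and an arbitrary subtree."""
    # In this simplified model each variable fills exactly one child slot,
    # so the number of children must agree at every node.
    if len(pattern) != len(tree):
        return False
    for p_child, (t_label, t_sub) in zip(pattern, tree):
        if p_child is VAR:
            # A variable accepts any edge label and any subtree.
            continue
        p_label, p_sub = p_child
        if p_label != t_label or not matches(p_sub, t_sub):
            return False
    return True


# Pattern: an "html" edge whose node has a "head" child followed by a variable.
pattern = [("html", [("head", []), VAR])]

page_ok = [("html", [("head", []), ("body", [("p", [])])])]
page_short = [("html", [("body", [])])]
page_wrong_label = [("html", [("title", []), ("body", [])])]

print(matches(pattern, page_ok))
print(matches(pattern, page_short))
print(matches(pattern, page_wrong_label))
```

Because every variable in this toy model is distinct and fills a single child slot, one top-down pass suffices and the check runs in time linear in the size of the pattern. Variables that may occur at internal positions, as in the term trees defined above, need a more careful matching procedure.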