Abstract
As the amount of available XML data grows, the data mining community has been motivated to discover useful information in collections of XML documents. One of the most popular approaches is to extract frequent subtrees from a set of XML trees. In this paper, we propose a novel algorithm, EXiT-B, for efficiently extracting maximal frequent subtrees from a set of XML documents. The main contribution of our algorithm is that it requires no tree join operations during the phase of generating maximal frequent subtrees. Thus, the task of finding maximal frequent subtrees is significantly simplified compared to previous approaches.
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
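To make the notion of "maximal frequent" patterns in the abstract more concrete, the following is a minimal sketch only. It is not the EXiT-B algorithm, whose details are not given here. As a simplifying assumption it mines root-to-node label paths rather than general subtrees: a path counts as frequent if it occurs in at least `min_count` documents, and as maximal if it is not a proper prefix of another frequent path. The function names, the `min_count` parameter, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET


def label_paths(xml_text):
    """Return the set of root-to-node label paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def maximal_frequent_paths(docs, min_count):
    """Label paths found in at least min_count documents that are not
    a proper prefix of another frequent path (illustrative simplification)."""
    counts = {}
    for doc in docs:
        # Each document contributes at most once per path (document support).
        for path in label_paths(doc):
            counts[path] = counts.get(path, 0) + 1
    frequent = {p for p, c in counts.items() if c >= min_count}
    return sorted(
        p for p in frequent
        if not any(q != p and q[:len(p)] == p for q in frequent)
    )


# Hypothetical sample collection of three small XML documents.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<book><title/><price/></book>",
]
result = maximal_frequent_paths(docs, min_count=2)
print(result)
```

With `min_count=2`, the paths `book/title` and `book/author/name` are kept, while `book` and `book/author` are dropped because a longer frequent path extends them. General subtree mining must also handle branching patterns, which this sketch deliberately ignores.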
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paik, J., Won, D., Fotouhi, F., Kim, U.M. (2005). EXiT-B: A New Approach for Extracting Maximal Frequent Subtrees from XML Data. In: Gallagher, M., Hogan, J.P., Maire, F. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2005. IDEAL 2005. Lecture Notes in Computer Science, vol 3578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508069_1
DOI: https://doi.org/10.1007/11508069_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26972-4
Online ISBN: 978-3-540-31693-0
eBook Packages: Computer Science, Computer Science (R0)