EXiT-B: A New Approach for Extracting Maximal Frequent Subtrees from XML Data

Paik, Juryon; Won, Dongho; Fotouhi, Farshad; Kim, Ung Mo

doi:10.1007/11508069_1


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3578)

Abstract

As the amount of available XML data grows, the data mining community has been motivated to discover useful information in collections of XML documents. One of the most popular approaches is to extract frequent subtrees from a set of XML trees. In this paper, we propose a novel algorithm, EXiT-B, for efficiently extracting maximal frequent subtrees from a set of XML documents. The main contribution of our algorithm is that no tree join operation is needed during the phase of generating maximal frequent subtrees. Thus, the task of finding maximal frequent subtrees is significantly simplified compared with previous approaches.
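
To make the terms "frequent" and "maximal" concrete, the following minimal sketch may help. It is not the EXiT-B algorithm, whose details are not given in this abstract. It assumes a restricted pattern class (root-to-node label paths rather than general subtrees) and an absolute support threshold; the function names `root_paths` and `maximal_frequent_paths`, the parameter `min_count`, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_paths(xml_text):
    """Return the set of root-to-node label paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def maximal_frequent_paths(docs, min_count):
    """Paths found in at least min_count documents that no frequent path extends."""
    counts = Counter()
    for doc in docs:
        # A set per document, so each path is counted once per document.
        counts.update(root_paths(doc))
    frequent = {p for p, c in counts.items() if c >= min_count}
    # A path is maximal if no longer frequent path has it as a prefix.
    return {
        p for p in frequent
        if not any(len(q) > len(p) and q[:len(p)] == p for q in frequent)
    }


# Hypothetical sample data: three small XML documents.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/><email/></author></book>",
    "<book><title/><price/></book>",
]
result = maximal_frequent_paths(docs, min_count=2)
print(sorted(result))
```

With these three documents and `min_count=2`, the frequent paths are `book`, `book/title`, `book/author`, and `book/author/name`; only `book/title` and `book/author/name` are maximal, because the other two are prefixes of a longer frequent path. Reporting only maximal patterns keeps the result compact, since every other frequent pattern is contained in one of them.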

This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).



Author information

Authors and Affiliations

Department of Computer Engineering, Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, Gyeonggi-do, 440-746, Republic of Korea
Juryon Paik, Dongho Won & Ung Mo Kim
Wayne State University, Detroit, MI, USA
Farshad Fotouhi


Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, University of Queensland, 4072, Australia
Marcus Gallagher
POB 30031, Pensacola, FL 32503-1031
James P. Hogan
Faculty of Information Technology, Queensland University of Technology, Box 2434, Q 4001, Brisbane, Australia
Frederic Maire


About this paper

Cite this paper

Paik, J., Won, D., Fotouhi, F., Kim, U.M. (2005). EXiT-B: A New Approach for Extracting Maximal Frequent Subtrees from XML Data. In: Gallagher, M., Hogan, J.P., Maire, F. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2005. IDEAL 2005. Lecture Notes in Computer Science, vol 3578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508069_1


DOI: https://doi.org/10.1007/11508069_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26972-4
Online ISBN: 978-3-540-31693-0
eBook Packages: Computer Science, Computer Science (R0)
