EXiT-B: A New Approach for Extracting Maximal Frequent Subtrees from XML Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3578)

Abstract

As the amount of available XML data grows, the data mining community has been motivated to discover useful information in collections of XML documents. One of the most popular approaches is to extract frequent subtrees from a set of XML trees. In this paper, we propose a novel algorithm, EXiT-B, for efficiently extracting maximal frequent subtrees from a set of XML documents. The main contribution of our algorithm is that no tree join operation is needed during the phase that generates maximal frequent subtrees. Thus, the task of finding maximal frequent subtrees is significantly simplified compared to previous approaches.
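To make the terms "frequent" and "maximal" concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the EXiT-B algorithm, whose details are not given in this abstract. It considers only root-to-node tag paths (a special case of subtrees), treats a path as frequent when it occurs in at least `min_count` documents, and calls a frequent path maximal when no other frequent path extends it. The function names `root_paths` and `maximal_frequent_paths` and the parameter `min_count` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def root_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def maximal_frequent_paths(docs, min_count):
    """Illustrative only: paths occurring in >= min_count documents
    that are not a proper prefix of another such path."""
    counts = {}
    for doc in docs:
        # Each path is counted once per document, however often it repeats.
        for path in root_paths(doc):
            counts[path] = counts.get(path, 0) + 1

    frequent = {p for p, c in counts.items() if c >= min_count}

    # Keep a frequent path only if no longer frequent path starts with it.
    return sorted(
        p for p in frequent
        if not any(q != p and q[:len(p)] == p for q in frequent)
    )


docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/><email/></author></book>",
    "<book><title/><price/></book>",
]
print(maximal_frequent_paths(docs, min_count=2))
```

In this toy input, `book/author/email` and `book/price` each occur in only one document, so they are not frequent, while `book` and `book/author` are frequent but not maximal because longer frequent paths extend them.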

This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paik, J., Won, D., Fotouhi, F., Kim, U.M. (2005). EXiT-B: A New Approach for Extracting Maximal Frequent Subtrees from XML Data. In: Gallagher, M., Hogan, J.P., Maire, F. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2005. IDEAL 2005. Lecture Notes in Computer Science, vol 3578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508069_1

  • DOI: https://doi.org/10.1007/11508069_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26972-4

  • Online ISBN: 978-3-540-31693-0

  • eBook Packages: Computer Science, Computer Science (R0)