A Novel Method for Mining Frequent Subtrees from XML Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3177)

Abstract

In this paper, we focus on the problem of finding frequent subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, RSTMiner, that uses a special structure called the Me-tree to compute all rooted subtrees whose frequency in a collection of XML data trees is above a user-specified threshold. In this algorithm, the Me-tree serves as a merging tree that supplies scheme information for efficient pruning and mining of frequent subtrees. The keys to the algorithm are efficiently pruning candidates with the Me-tree structure and incrementally enumerating all rooted subtrees in canonical form using an extended rightmost expansion technique. Experimental results show that the RSTMiner algorithm is efficient and scalable.
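To make the enumeration step concrete, the following is a minimal sketch of plain rightmost expansion with frequency-based pruning over labeled ordered trees. It is not RSTMiner: the Me-tree and the paper's extended expansion are not reproduced here, and support is checked by brute-force matching. The function names, the nested `(label, children)` tree encoding, the preorder `(depth, label)` pattern encoding, and the choice of induced ordered subtree matching at any node are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

Node = Tuple[str, tuple]            # nested form: (label, (child, child, ...))
Seq = Tuple[Tuple[int, str], ...]   # pattern form: preorder list of (depth, label)


def to_nested(seq: Seq) -> Node:
    """Rebuild a nested tree from its preorder (depth, label) sequence."""
    def build(i: int):
        depth, label = seq[i]
        children = []
        j = i + 1
        while j < len(seq) and seq[j][0] > depth:
            child, j = build(j)
            children.append(child)
        return (label, tuple(children)), j
    return build(0)[0]


def matches_at(p: Node, d: Node) -> bool:
    """True if pattern p maps onto d with roots aligned, keeping sibling order."""
    if p[0] != d[0]:
        return False
    k = 0
    for pc in p[1]:
        # Greedy left-to-right matching suffices for ordered children.
        while k < len(d[1]) and not matches_at(pc, d[1][k]):
            k += 1
        if k == len(d[1]):
            return False
        k += 1
    return True


def occurs_in(p: Node, d: Node) -> bool:
    """True if p matches at d or at any descendant of d."""
    return matches_at(p, d) or any(occurs_in(p, c) for c in d[1])


def support(seq: Seq, forest: List[Node]) -> int:
    """Number of data trees that contain the pattern at least once."""
    p = to_nested(seq)
    return sum(1 for t in forest if occurs_in(p, t))


def rightmost_expansions(seq: Seq, labels: List[str]) -> Iterator[Seq]:
    """Attach one new node as the last child of a node on the rightmost path."""
    last_depth = seq[-1][0]
    for depth in range(1, last_depth + 2):
        for label in labels:
            yield seq + ((depth, label),)


def collect_labels(t: Node, acc: Set[str]) -> None:
    acc.add(t[0])
    for c in t[1]:
        collect_labels(c, acc)


def mine(forest: List[Node], min_support: int) -> Dict[Seq, int]:
    """Enumerate frequent ordered subtree patterns and their supports."""
    acc: Set[str] = set()
    for t in forest:
        collect_labels(t, acc)
    labels = sorted(acc)
    frequent: Dict[Seq, int] = {}
    stack: List[Seq] = [((0, label),) for label in labels]
    while stack:
        seq = stack.pop()
        s = support(seq, forest)
        if s < min_support:
            continue  # an infrequent pattern cannot have a frequent extension
        frequent[seq] = s
        stack.extend(rightmost_expansions(seq, labels))
    return frequent
```

Because every ordered tree is produced by exactly one sequence of rightmost expansions, no duplicate check is needed in this sketch. The pruning step relies on the fact that removing the last preorder node of a pattern yields a pattern that occurs wherever the original does.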

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, WS., Liu, DX., Zhang, JP. (2004). A Novel Method for Mining Frequent Subtrees from XML Data. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_44

  • DOI: https://doi.org/10.1007/978-3-540-28651-6_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22881-3

  • Online ISBN: 978-3-540-28651-6

  • eBook Packages: Springer Book Archive
