TreeMiner: An Efficient Algorithm for Mining Embedded Ordered Frequent Trees

  • Chapter
Advanced Methods for Knowledge Discovery from Complex Data

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Summary

Mining frequent trees is useful in domains such as bioinformatics, web mining, and the mining of semi-structured data. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called a scope-list. We contrast TreeMiner with a pattern-matching tree-mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern-matching approach by a factor of 4 to 20 and has good scale-up properties. We also present an application of tree mining to the analysis of real web logs for usage patterns.
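
The summary names the scope-list but does not define it. As an illustration only, the sketch below assumes that a node's scope is the interval from its own preorder position to the preorder position of its rightmost descendant, so that an ancestor-descendant test (the relation that embedded subtrees rely on) reduces to an interval-containment check. The nested-tuple tree representation and the function names `compute_scopes` and `is_ancestor` are hypothetical choices made for this example; this is not the chapter's implementation.

```python
from typing import List, Tuple

# Hypothetical representation: a tree is (label, [child, child, ...]),
# with children listed in left-to-right order.
Tree = Tuple[str, list]
Scope = Tuple[int, int]


def compute_scopes(tree: Tree) -> List[Tuple[str, Scope]]:
    """Return (label, (lo, hi)) for every node, in preorder.

    Assumption: lo is the node's preorder position and hi is the
    preorder position of its rightmost descendant (hi == lo for a leaf).
    """
    out: List = []

    def visit(node: Tree) -> None:
        label, children = node
        pos = len(out)          # preorder position of this node
        out.append(None)        # reserve the slot before visiting children
        for child in children:
            visit(child)
        out[pos] = (label, (pos, len(out) - 1))

    visit(tree)
    return out


def is_ancestor(a: Scope, b: Scope) -> bool:
    """True if the node with scope a is a proper ancestor of the node with scope b."""
    return a[0] < b[0] and b[1] <= a[1]


# Example tree: A has children B and E; B has children C and D.
example = ("A", [("B", [("C", []), ("D", [])]), ("E", [])])
scopes = dict(compute_scopes(example))
```

For this example tree the scopes are A (0, 4), B (1, 3), C (2, 2), D (3, 3) and E (4, 4). Under the stated assumption, `is_ancestor(scopes["A"], scopes["C"])` holds even though A is not the parent of C, which is the kind of ancestor-descendant link an embedded subtree may use, while `is_ancestor(scopes["B"], scopes["E"])` does not hold.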

Copyright information

© 2005 Dr Sanghamitra Bandyopadhyay

About this chapter

Cite this chapter

Zaki, M.J. (2005). TreeMiner: An Efficient Algorithm for Mining Embedded Ordered Frequent Trees. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_5

  • DOI: https://doi.org/10.1007/1-84628-284-5_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-989-0

  • Online ISBN: 978-1-84628-284-3

  • eBook Packages: Computer Science, Computer Science (R0)
