TreeMiner: An Efficient Algorithm for Mining Embedded Ordered Frequent Trees

Zaki, Mohammed J.

doi:10.1007/1-84628-284-5_5

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Summary

Mining frequent trees is useful in domains such as bioinformatics, web mining, and the mining of semi-structured data. We formulate the problem of mining embedded subtrees in a forest of rooted, labeled, ordered trees. We present TreeMiner, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called a scope-list. We contrast TreeMiner with a pattern-matching tree-mining algorithm, PatternMatcher, and conduct detailed experiments to test the performance and scalability of both methods. We find that TreeMiner outperforms the pattern-matching approach by a factor of 4 to 20 and has good scale-up properties. We also present an application of tree mining to the analysis of real web logs for usage patterns.
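The summary names the scope-list but does not define it. The Python sketch below is a minimal illustration under one stated assumption: a node's scope is the interval [l, u], where l is the node's own preorder position and u is the preorder position of its rightmost descendant. The function names (`compute_scopes`, `is_descendant`, `is_left_of`) and the nested-tuple tree encoding are hypothetical, chosen for illustration only; this is not the chapter's implementation.

```python
# Illustrative sketch only: assumes scope = [l, u] with l the node's preorder
# position and u the preorder position of its rightmost descendant.
from typing import List, Tuple

Node = Tuple[str, list]    # (label, list of child nodes), children in order
Scope = Tuple[int, int]    # (l, u)


def compute_scopes(tree: Node) -> List[Tuple[str, Scope]]:
    """Return (label, scope) pairs for every node, listed in preorder."""
    out: list = []

    def visit(node: Node) -> None:
        label, children = node
        pos = len(out)         # preorder position of this node
        out.append(None)       # reserve the slot before visiting children
        for child in children:
            visit(child)
        # After the subtree is visited, the last slot filled is the
        # rightmost descendant (or the node itself for a leaf).
        out[pos] = (label, (pos, len(out) - 1))

    visit(tree)
    return out


def is_descendant(scope_x: Scope, scope_y: Scope) -> bool:
    """True if y lies strictly inside the subtree rooted at x."""
    lx, ux = scope_x
    ly, _ = scope_y
    return lx < ly <= ux


def is_left_of(scope_x: Scope, scope_y: Scope) -> bool:
    """True if x's whole subtree precedes y in preorder (no overlap)."""
    _, ux = scope_x
    ly, _ = scope_y
    return ux < ly


# Hypothetical example tree: A has children B and D; B has child C.
example: Node = ("A", [("B", [("C", [])]), ("D", [])])
scopes = compute_scopes(example)
```

In this example, C is a descendant of A without being its child, which is the kind of ancestor-descendant relationship that an embedded subtree allows. Under the assumed definition, both the ancestor test and the left-to-right ordering test reduce to integer comparisons on scopes, with no need to traverse the tree again.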

About this chapter

Cite this chapter

Zaki, M.J. (2005). TreeMiner: An Efficient Algorithm for Mining Embedded Ordered Frequent Trees. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_5

DOI: https://doi.org/10.1007/1-84628-284-5_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-989-0
Online ISBN: 978-1-84628-284-3
eBook Packages: Computer Science, Computer Science (R0)
