Summary
Mining frequent trees is useful in domains such as bioinformatics, web mining, and the mining of semi-structured data. We formulate the problem of mining embedded subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called a scope-list. We contrast TreeMiner with PatternMatcher, a tree-mining algorithm based on pattern matching. We conduct detailed experiments to test the performance and scalability of both methods. We find that TreeMiner outperforms the pattern-matching approach by a factor of 4 to 20 and scales well. We also present an application of tree mining to the analysis of real web logs for usage patterns.
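To make the problem statement concrete, the following Python sketch counts the support of one candidate pattern in a small forest by brute force. It is an illustration only, not TreeMiner, PatternMatcher, or the scope-list join. It assumes the usual reading of "embedded": a pattern edge may match any ancestor-descendant pair in the data tree, and the left-to-right order of siblings must be preserved. The tree encoding (nested `(label, children)` tuples), the `(start, end)` preorder intervals, and the names `flatten`, `contains`, and `support` are hypothetical choices made for this example, and support is taken here to be the number of trees that contain at least one occurrence.

```python
# Brute-force support counting for embedded ordered subtrees.
# Illustrative sketch only; all names and the encoding are assumptions.

def flatten(tree):
    """Return the nodes of `tree` in preorder as (label, start, end).

    `start` is the node's preorder index and `end` is the preorder index
    of its last descendant, so a node x is a proper ancestor of y exactly
    when x.start < y.start <= x.end.
    """
    nodes = []

    def visit(node):
        label, children = node
        idx = len(nodes)
        nodes.append(None)  # reserve the slot, fill it after the subtree
        for child in children:
            visit(child)
        nodes[idx] = (label, idx, len(nodes) - 1)

    visit(tree)
    return nodes


def _embed(patterns, nodes, lo, hi):
    """Try to embed the sibling patterns, left to right, into disjoint
    intervals that lie inside preorder positions lo..hi (inclusive)."""
    if not patterns:
        return True
    (label, children), rest = patterns[0], patterns[1:]
    for cand_label, start, end in nodes[lo:hi + 1]:
        if (cand_label == label
                and _embed(children, nodes, start + 1, end)  # descendants
                and _embed(rest, nodes, end + 1, hi)):       # later siblings
            return True
    return False


def contains(pattern, tree):
    """True if `pattern` occurs in `tree` as an embedded ordered subtree."""
    nodes = flatten(tree)
    return _embed([pattern], nodes, 0, len(nodes) - 1)


def support(pattern, forest):
    """Number of trees in `forest` that contain `pattern` at least once."""
    return sum(1 for tree in forest if contains(pattern, tree))


if __name__ == "__main__":
    forest = [
        ("A", [("B", [("C", [])]), ("D", [])]),  # A(B(C), D)
        ("A", [("D", []), ("C", [])]),           # A(D, C)
        ("B", [("C", [])]),                      # B(C)
    ]
    pattern = ("A", [("C", []), ("D", [])])      # A(C, D)
    print(support(pattern, forest))              # prints 1
```

In the first tree, C is a grandchild of A rather than a child, so the pattern A(C, D) matches only under the embedded (ancestor-descendant) reading. The second tree holds the same labels in the opposite sibling order and therefore does not match. This exhaustive search is exponential in the worst case and is meant only to pin down what is being counted.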
Copyright information
© 2005 Dr Sanghamitra Bandyopadhyay
About this chapter
Cite this chapter
Zaki, M.J. (2005). TreeMiner: An Efficient Algorithm for Mining Embedded Ordered Frequent Trees. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_5
DOI: https://doi.org/10.1007/1-84628-284-5_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-989-0
Online ISBN: 978-1-84628-284-3
eBook Packages: Computer Science, Computer Science (R0)