Tree Model Guided Framework

Hadzic, Fedja; Tan, Henry; Dillon, Tharam S.

doi:10.1007/978-3-642-17557-2_4

Tree Model Guided Framework

Fedja Hadzic,
Henry Tan &
Tharam S. Dillon

Chapter

793 Accesses

Part of the book series: Studies in Computational Intelligence ((SCI,volume 333))

Abstract

In this chapter, we describe the main characteristics of the Tree Model Guided (TMG) Framework for frequent subtree mining. This framework has good extendibility to all of the current problems for frequent subtree mining (Hadzic 2008; Tan 2008). An algorithm is considered as extendible in the sense that minimal effort is required to adjust the general framework so that different but related problems can be solved. Furthermore, the results presented in works such as (Tan et al. 2005; 2006a, 2008, Hadzic et al. 2007, 2010) indicate that it currently exhibits the best or comparable performance among the current state-of-the-art methods. The TMG framework is also conceptually simple to understand, especially with respect to the small adjustments required to address different sub-problems within the tree mining field. The remainder of the algorithm development issues are addressed in such a way as to accommodate the most efficient execution of the TMG candidate generation. Hence, as mentioned in the previous chapter, the important aspects that need to be taken into account in addition to the candidate enumeration strategy are: tree representation, representative data structures and their operational use, and the frequency counting of generated candidate subtrees. As mentioned in Chapter 3, in the tree mining field a string-like representation is the most popular representation because each item in the string can be accessed in O(1) time, it is space efficient and easy to manipulate. In our framework, we utilize the depth-first or pre-order string encoding as described in Chapter 3. The problem of candidate subtree enumeration is to efficiently extract a complete and non-redundant set of subtrees from a given document tree. We explain the TMG approach to candidate subtree enumeration in Section 4.2. As the name implies, the enumeration phase is guided by the tree model of the document in order to generate only valid candidate subtrees. This tree model corresponds to the underlying structure of the document and a subtree is considered valid by conforming to it.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized substructure discovery for semistructured data. Paper presented at the Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, Helsinki, Finland, August 19-23 (2002)
Google Scholar
Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago de Chile, Chile, Septemebr 12-15, pp. 487-499 (1994)
Google Scholar
Chi, Y., Muntz, R.R., Nijssen, S., Kok, J.N.: Frequent Subtree Mining - An Overview. Fundamenta Informaticae, Special Issue on Graph and Tree Mining 66(1-2), 161–198 (2005)
MATH MathSciNet Google Scholar
Hadzic, F., Tan, H., Dillon, T.S.: UNI3 - Efficient Algorithm for Mining Unordered Induced Subtrees using TMG Candidate Generation. In: Proceedings of IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Honolulu, Hawaii, USA, April 1-5, pp. 568–575. IEEE, Los Alamitos (2007)
Chapter Google Scholar
Hadzic, F., Tan, H., Dillon, T.S.: U3 – mining unordered embedded subtrees using TMG candidate generation. In: Proceedings of the IEEE / WIC / ACM International Conference on Web Intelligence, Sydney, Australia, December 9-12, pp. 285–292 (2008)
Google Scholar
Hadzic, F.: Advances in knowledge learning methodologies and their applications. Curtin University of Technology, Perth (2008)
Google Scholar
Hadzic, F., Tan, H., Dillon, T.S.: Tree Model Guided Algorithm for Mining Unordered Embedded Subtrees. Web Intelligence and Agent Systems: An International Journal (WIAS) 8(4) (2010)
Google Scholar
Inokuchi, A., Washio, T., Motoda, H.: Complete Mining of Frequent Patterns from Graphs: Mining Graph Data. Machine Learning 50(3), 321–354 (2003)
Article MATH Google Scholar
Kuramochi, M., Karypic, G.: Frequent Subgraph Discovery. Paper Presented at the Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), San Jose, California, USA, November 29 - December 2 (2001)
Google Scholar
Pei, J., Han, J., and Lakshmanan, L.V.S, Mining frequent itemsets with convertible constraints. Paper presented at the Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, April 2-6 (2001)
Google Scholar
Tan, H., Hadzic, F., Feng, L., Chang, E.: MB3-Miner: mining eMBedded subTREEs using tree model guided candidate generation. In: Proceedings of the 1st International Workshop on Mining Complex Data in conjunction with ICDM 2005, Houston, Texas, USA, November 27-30, pp. 103–110 (2005)
Google Scholar
Tan, H., Dillon, T.S., Hadzic, F., Chang, E., Feng, L.: IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 450–461. Springer, Heidelberg (2006a)
Chapter Google Scholar
Tan, H., Dillon, T.S., Hadzic, F., Chang, E.: SEQUEST: Mining Frequent Subsequences using DMA Strips. Paper presented at the Proceeding of the 7th International Conference on Data Mining and Information Engineering, Prague, Czech Republic, July 11-13 (2006b)
Google Scholar
Tan, H.: Tree Model Guided (TMG) enumeration as the basis for mining frequent patterns from XML documents. University of Technology Sydney, Sydney (2008)
Google Scholar
Tan, H., Hadzic, F., Dillon, T.S., Feng, L., Chang, E.: Tree Model Guided Candidate Generation for Mining Frequent Subtrees from XML. ACM Transactions on Knowledge Discovery from Data 2(2) (2008)
Google Scholar
Tatikonda, S., Parthasarathy, S., Kurc, T.: TRIPS and TIDES: new algorithms for tree mining. Paper presented at the Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM), Arlington, Virginia, USA, November 6-11 (2006)
Google Scholar
Wang, C., Hong, M., Pei, J., Zhou, H., Wang, W., Shi, B.: Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 441–451. Springer, Heidelberg (2004)
Chapter Google Scholar
Yang, L.H., Lee, M.L., Hsu, W.: Efficient Mining of XML Query Patterns for Caching. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), Berlin, Germany, September 9-12, pp. 69–80 (2003)
Google Scholar
Zaki, M.J.: Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering 17(8), 1021–1035 (2005)
Article Google Scholar

Download references

Authors

Fedja Hadzic
View author publications
You can also search for this author in PubMed Google Scholar
Henry Tan
View author publications
You can also search for this author in PubMed Google Scholar
Tharam S. Dillon
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Hadzic, F., Tan, H., Dillon, T.S. (2011). Tree Model Guided Framework. In: Mining of Data with Complex Structures. Studies in Computational Intelligence, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17557-2_4

Download citation

DOI: https://doi.org/10.1007/978-3-642-17557-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17556-5
Online ISBN: 978-3-642-17557-2
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

Abstract

Buying options