Tree Model Guided Framework

Part of the book series: Studies in Computational Intelligence (SCI, volume 333)

Abstract

In this chapter, we describe the main characteristics of the Tree Model Guided (TMG) framework for frequent subtree mining. The framework extends readily to all of the current frequent subtree mining problems (Hadzic 2008; Tan 2008). We call an algorithm extensible when only minimal effort is required to adjust the general framework so that different but related problems can be solved. Furthermore, the results reported in (Tan et al. 2005, 2006a, 2008; Hadzic et al. 2007, 2010) indicate that its performance is the best among, or comparable to, the current state-of-the-art methods. The TMG framework is also conceptually simple, especially with respect to the small adjustments required to address different sub-problems within the tree mining field.

The remaining algorithm development issues are addressed so as to accommodate the most efficient execution of TMG candidate generation. Hence, as mentioned in the previous chapter, the aspects that need to be taken into account in addition to the candidate enumeration strategy are: tree representation, representative data structures and their operational use, and the frequency counting of generated candidate subtrees. As mentioned in Chapter 3, a string-like representation is the most popular in the tree mining field because each item in the string can be accessed in O(1) time, and because it is space efficient and easy to manipulate. In our framework, we use the depth-first (pre-order) string encoding described in Chapter 3.

The problem of candidate subtree enumeration is to efficiently extract a complete and non-redundant set of subtrees from a given document tree. We explain the TMG approach to candidate subtree enumeration in Section 4.2. As the name implies, the enumeration phase is guided by the tree model of the document so that only valid candidate subtrees are generated. This tree model corresponds to the underlying structure of the document, and a subtree is considered valid if it conforms to that model.
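To make the depth-first (pre-order) string encoding concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a tree is given as a nested `(label, children)` pair and that a `/` symbol marks each backtrack from a child to its parent; the exact notation is the one defined in Chapter 3 and may differ. The names `preorder_encode`, `preorder_decode` and `BACKTRACK` are hypothetical.

```python
from typing import List, Tuple

# A tree node is (label, children); this shape is an illustrative assumption.
Tree = Tuple[str, List["Tree"]]

BACKTRACK = "/"  # assumed marker for "move up one level"


def preorder_encode(tree: Tree) -> List[str]:
    """Return the depth-first (pre-order) string encoding of `tree`."""
    label, children = tree
    encoding = [label]
    for child in children:
        encoding.extend(preorder_encode(child))
        encoding.append(BACKTRACK)  # backtrack from the child to this node
    return encoding


def preorder_decode(encoding: List[str]) -> Tree:
    """Rebuild the tree from its pre-order string encoding."""
    root: Tree = (encoding[0], [])
    path = [root]  # nodes from the root down to the current node
    for symbol in encoding[1:]:
        if symbol == BACKTRACK:
            path.pop()  # move back up to the parent
        else:
            node: Tree = (symbol, [])
            path[-1][1].append(node)  # attach as the next child
            path.append(node)
    return root


# Root 'a' with children 'b' and 'c'; 'b' has a single child 'd'.
example: Tree = ("a", [("b", [("d", [])]), ("c", [])])
encoded = preorder_encode(example)
print(" ".join(encoded))  # a b d / / c /
```

Because the encoding is a flat sequence, any position can be read by index in constant time, and the decoder shows that the sequence alone is enough to recover the tree structure.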

References

  1. Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized substructure discovery for semistructured data. Paper presented at the Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, Helsinki, Finland, August 19-23 (2002)

  2. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago de Chile, Chile, September 12-15, pp. 487–499 (1994)

  3. Chi, Y., Muntz, R.R., Nijssen, S., Kok, J.N.: Frequent Subtree Mining - An Overview. Fundamenta Informaticae, Special Issue on Graph and Tree Mining 66(1-2), 161–198 (2005)

  4. Hadzic, F., Tan, H., Dillon, T.S.: UNI3 - Efficient Algorithm for Mining Unordered Induced Subtrees using TMG Candidate Generation. In: Proceedings of IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Honolulu, Hawaii, USA, April 1-5, pp. 568–575. IEEE, Los Alamitos (2007)

  5. Hadzic, F., Tan, H., Dillon, T.S.: U3 – mining unordered embedded subtrees using TMG candidate generation. In: Proceedings of the IEEE / WIC / ACM International Conference on Web Intelligence, Sydney, Australia, December 9-12, pp. 285–292 (2008)

  6. Hadzic, F.: Advances in knowledge learning methodologies and their applications. Curtin University of Technology, Perth (2008)

  7. Hadzic, F., Tan, H., Dillon, T.S.: Tree Model Guided Algorithm for Mining Unordered Embedded Subtrees. Web Intelligence and Agent Systems: An International Journal (WIAS) 8(4) (2010)

  8. Inokuchi, A., Washio, T., Motoda, H.: Complete Mining of Frequent Patterns from Graphs: Mining Graph Data. Machine Learning 50(3), 321–354 (2003)

  9. Kuramochi, M., Karypis, G.: Frequent Subgraph Discovery. Paper presented at the Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), San Jose, California, USA, November 29 - December 2 (2001)

  10. Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent itemsets with convertible constraints. Paper presented at the Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, April 2-6 (2001)

  11. Tan, H., Hadzic, F., Feng, L., Chang, E.: MB3-Miner: mining eMBedded subTREEs using tree model guided candidate generation. In: Proceedings of the 1st International Workshop on Mining Complex Data in conjunction with ICDM 2005, Houston, Texas, USA, November 27-30, pp. 103–110 (2005)

  12. Tan, H., Dillon, T.S., Hadzic, F., Chang, E., Feng, L.: IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 450–461. Springer, Heidelberg (2006a)

  13. Tan, H., Dillon, T.S., Hadzic, F., Chang, E.: SEQUEST: Mining Frequent Subsequences using DMA Strips. Paper presented at the Proceedings of the 7th International Conference on Data Mining and Information Engineering, Prague, Czech Republic, July 11-13 (2006b)

  14. Tan, H.: Tree Model Guided (TMG) enumeration as the basis for mining frequent patterns from XML documents. University of Technology Sydney, Sydney (2008)

  15. Tan, H., Hadzic, F., Dillon, T.S., Feng, L., Chang, E.: Tree Model Guided Candidate Generation for Mining Frequent Subtrees from XML. ACM Transactions on Knowledge Discovery from Data 2(2) (2008)

  16. Tatikonda, S., Parthasarathy, S., Kurc, T.: TRIPS and TIDES: new algorithms for tree mining. Paper presented at the Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM), Arlington, Virginia, USA, November 6-11 (2006)

  17. Wang, C., Hong, M., Pei, J., Zhou, H., Wang, W., Shi, B.: Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 441–451. Springer, Heidelberg (2004)

  18. Yang, L.H., Lee, M.L., Hsu, W.: Efficient Mining of XML Query Patterns for Caching. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), Berlin, Germany, September 9-12, pp. 69–80 (2003)

  19. Zaki, M.J.: Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering 17(8), 1021–1035 (2005)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hadzic, F., Tan, H., Dillon, T.S. (2011). Tree Model Guided Framework. In: Mining of Data with Complex Structures. Studies in Computational Intelligence, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17557-2_4

  • DOI: https://doi.org/10.1007/978-3-642-17557-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17556-5

  • Online ISBN: 978-3-642-17557-2

  • eBook Packages: Engineering, Engineering (R0)
