Data Mining and Knowledge Discovery, Volume 30, Issue 5, pp 1249–1272

Mining rooted ordered trees under subtree homeomorphism



Mining frequent tree patterns has many applications in areas such as XML data, bioinformatics and the World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data structure, called occ, which stores only information about the rightmost paths of occurrences and can therefore encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data structure, which let us count the occurrences of a tree pattern from the occurrences of its proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement over existing algorithms is significant.
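To make the matching operator concrete, the following is a minimal brute-force sketch of occurrence counting under subtree homeomorphism for rooted ordered trees. It is not TPMiner and does not use the occ data structure or its join operations; it only illustrates what is being counted. The tree encoding (nested `(label, children)` tuples), the function names `flatten` and `count_embedded_occurrences`, and the exact ordering condition (pattern siblings must map, left to right, to nodes with disjoint subtrees) are assumptions made for this illustration.

```python
def flatten(tree):
    """Number the nodes of `tree` in preorder.

    Returns (labels, end), where labels[i] is the label of node i and
    end[i] is the preorder index of the last descendant of node i, so the
    proper descendants of i are exactly the indices i+1 .. end[i].
    """
    labels, end = [], []

    def visit(node):
        label, children = node
        i = len(labels)
        labels.append(label)
        end.append(i)
        for child in children:
            visit(child)
        end[i] = len(labels) - 1

    visit(tree)
    return labels, end


def count_embedded_occurrences(pattern, tree):
    """Count embedded occurrences of `pattern` in `tree` (illustrative only)."""
    labels, end = flatten(tree)

    def count_at(p, i):
        # Number of ways to embed the pattern subtree p with its root on node i.
        p_label, p_children = p
        if p_label != labels[i]:
            return 0

        def place(j, lo):
            # Ways to map pattern children j, j+1, ... onto descendants of i
            # whose preorder index is at least lo.
            if j == len(p_children):
                return 1
            total = 0
            for d in range(lo, end[i] + 1):
                c = count_at(p_children[j], d)
                if c:
                    # The next sibling must start after the subtree of d.
                    total += c * place(j + 1, end[d] + 1)
            return total

        return place(0, i + 1)

    return sum(count_at(pattern, i) for i in range(len(labels)))


# Data tree: a has children b and c; b has one child c.
tree = ("a", [("b", [("c", [])]), ("c", [])])

# The edge a-c matches both the child c and the grandchild c of a.
print(count_embedded_occurrences(("a", [("c", [])]), tree))              # 2
# Sibling order matters: b before c occurs once, c before b never.
print(count_embedded_occurrences(("a", [("b", []), ("c", [])]), tree))   # 1
print(count_embedded_occurrences(("a", [("c", []), ("b", [])]), tree))   # 0
```

The first query shows the difference from induced matching: the pattern edge a-c is also matched by the path a, b, c in the data tree. This sketch re-explores the data tree for every pattern and is meant only to pin down the definition.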


Keywords: XML data; Rooted ordered trees; Frequent tree patterns; Subtree homeomorphism; Embedded subtrees



We are grateful to Professor Mohammed Javeed Zaki for providing the VTreeMiner code, the CSLOGS datasets and the TreeGenerator program, to Dr Henry Tan for providing the MB3Miner code, to Dr Fedja Hadzic for providing the Prions dataset and to Professor Jun-Hong Cui for providing the NASA dataset. Finally, we would like to thank Dr Morteza Haghir Chehreghani for his discussion and suggestions.



Copyright information

© The Author(s) 2015

Authors and Affiliations

  • Mostafa Haghir Chehreghani (1)
  • Maurice Bruynooghe (1)

  1. Department of Computer Science, KU Leuven, Leuven, Belgium
