Efficiently Mining Homomorphic Patterns from Large Data Trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9642)

Abstract

Finding interesting tree patterns hidden in large datasets is a central topic in data mining with many practical applications. Unfortunately, previous contributions have focused almost exclusively on mining induced patterns from a set of small trees. The problem of mining homomorphic patterns from a large data tree has been neglected, mainly because of the challenging unbounded redundancy that homomorphic tree patterns can display. However, mining homomorphic patterns allows the discovery of large patterns that cannot be extracted when mining induced or embedded patterns. Large patterns better characterize big trees, which are important for many modern applications, particularly given the explosion of big data.

In this paper, we address the problem of mining frequent homomorphic tree patterns from a single large tree. We propose a novel approach that extracts non-redundant maximal homomorphic patterns. Our approach employs an incremental frequency computation method that avoids the costly enumeration of all pattern matchings required by previous approaches. Matching information of already computed patterns is materialized as bitmaps, a technique that minimizes not only memory consumption but also CPU time. We conduct detailed experiments to test the performance and scalability of our approach. The experimental evaluation shows that our approach mines larger patterns and extracts maximal homomorphic patterns from real datasets, outperforming state-of-the-art embedded tree mining algorithms applied to a large data tree.
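To make the bitmap idea concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes that pattern edges denote ancestor-descendant relationships, that data nodes are numbered so that every parent has a smaller id than its children, and that the support of a pattern is the number of data nodes to which its root can be mapped. The names `build_index`, `occurrences`, and `support` are hypothetical. Each occurrence set is stored as a Python integer used as a bitmap, and bitmaps of already computed subpatterns are cached so that a larger pattern reuses them instead of enumerating matchings again.

```python
def build_index(labels, parent):
    """Build per-node ancestor bitmaps and per-label bitmaps.

    Assumes parent[v] < v for every non-root node v (e.g. preorder ids).
    """
    anc = [0] * len(labels)
    label_bits = {}
    for v, label in enumerate(labels):
        p = parent[v]
        if p >= 0:
            # Ancestors of v = ancestors of its parent, plus the parent itself.
            anc[v] = anc[p] | (1 << p)
        label_bits[label] = label_bits.get(label, 0) | (1 << v)
    return anc, label_bits


def ancestors_of_any(bits, anc):
    """Bitmap of nodes that have a proper descendant in `bits`."""
    above = 0
    v = 0
    while bits:
        if bits & 1:
            above |= anc[v]
        bits >>= 1
        v += 1
    return above


def occurrences(pattern, anc, label_bits, cache):
    """Bitmap of data nodes that the pattern root can be mapped to.

    A pattern is a nested tuple (label, (child_pattern, ...)).
    Results are cached per subpattern and reused by larger patterns.
    """
    if pattern in cache:
        return cache[pattern]
    label, children = pattern
    bits = label_bits.get(label, 0)
    for child in children:
        child_bits = occurrences(child, anc, label_bits, cache)
        # Keep only nodes with some descendant matching the child subpattern.
        bits &= ancestors_of_any(child_bits, anc)
    cache[pattern] = bits
    return bits


def support(pattern, anc, label_bits, cache):
    """Number of distinct data nodes matching the pattern root."""
    return bin(occurrences(pattern, anc, label_bits, cache)).count("1")


# Toy data tree: node 0 (a) has children 1 (b) and 3 (a);
# node 1 has child 2 (c); node 3 has children 4 (c) and 5 (b).
labels = ["a", "b", "c", "a", "c", "b"]
parent = [-1, 0, 1, 0, 3, 3]
anc, label_bits = build_index(labels, parent)
cache = {}

leaf_b = ("b", ())
leaf_c = ("c", ())
a_b = ("a", (leaf_b,))                 # a with a descendant b
a_b_c = ("a", (("b", (leaf_c,)),))     # a // b // c
a_b_b = ("a", (leaf_b, leaf_b))        # a with two b branches (redundant)

print(support(a_b, anc, label_bits, cache))
print(support(a_b_c, anc, label_bits, cache))
```

On this toy tree the pattern `a_b` matches at nodes 0 and 3, while `a_b_c` matches only at node 0. The pattern `a_b_b` yields the same bitmap as `a_b`, because a homomorphism may map both `b` branches to the same data node; this is a small instance of the redundancy that homomorphic patterns can exhibit.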

X. Wu: The research was supported by the NSF of China under Grant Nos. 61202035, 61272110, and 61232002.


Notes

  1. https://web.njit.edu/~dth/HomomorphicTreePattternMining.pdf

  2. http://www.cis.upenn.edu/~treebank

  3. http://monetdb.cwi.nl/xml/


Author information


Correspondence to Dimitri Theodoratos.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wu, X., Theodoratos, D., Peng, Z. (2016). Efficiently Mining Homomorphic Patterns from Large Data Trees. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_12

  • DOI: https://doi.org/10.1007/978-3-319-32025-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32024-3

  • Online ISBN: 978-3-319-32025-0

  • eBook Packages: Computer Science, Computer Science (R0)
