Efficiently Mining Homomorphic Patterns from Large Data Trees

Wu, Xiaoying; Theodoratos, Dimitri; Peng, Zhiyong

doi:10.1007/978-3-319-32025-0_12

Conference paper
First Online: 25 March 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9642)

Abstract

Finding interesting tree patterns hidden in large datasets is a central topic in data mining with many practical applications. Unfortunately, previous contributions have focused almost exclusively on mining induced patterns from a set of small trees. The problem of mining homomorphic patterns from a large data tree has been neglected. This is mainly due to the challenging unbounded redundancy that homomorphic tree patterns can display. However, mining homomorphic patterns allows for discovering large patterns which cannot be extracted when mining induced or embedded patterns. Large patterns better characterize big trees, which are important for many modern applications, in particular with the explosion of big data.

In this paper, we address the problem of mining frequent homomorphic tree patterns from a single large tree. We propose a novel approach that extracts non-redundant maximal homomorphic patterns. Our approach employs an incremental frequency computation method that avoids the costly enumeration of all pattern matchings required by previous approaches. Matching information of already computed patterns is materialized as bitmaps, a technique that minimizes not only the memory consumption but also the CPU time. We conduct detailed experiments to test the performance and scalability of our approach. The experimental evaluation shows that our approach mines larger patterns and extracts maximal homomorphic patterns from real datasets, outperforming state-of-the-art embedded tree mining algorithms applied to a large data tree.
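The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of storing pattern matches as bitmaps and reusing them, not the authors' method. It assumes that pattern edges map to parent-child edges in the data tree, that the support of a pattern is the number of data nodes its root can be mapped to, and that a Python integer serves as the bitmap. The toy tree and the names `LABELS`, `PARENT`, `match_bitmap`, and `support` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy data tree: node i has label LABELS[i] and parent PARENT[i]
# (-1 marks the root). Node 0 is "a", with two "b" children (nodes 1 and 3).
LABELS = ["a", "b", "c", "b", "c", "d"]
PARENT = [-1, 0, 1, 0, 3, 3]


def label_bitmap(label):
    """Bitmap with bit i set when data node i carries the given label."""
    bits = 0
    for i, node_label in enumerate(LABELS):
        if node_label == label:
            bits |= 1 << i
    return bits


def parents_of(bits):
    """Bitmap of the parents of all data nodes whose bit is set in `bits`."""
    out = 0
    i = 0
    while bits:
        if bits & 1 and PARENT[i] >= 0:
            out |= 1 << PARENT[i]
        bits >>= 1
        i += 1
    return out


@lru_cache(maxsize=None)
def match_bitmap(pattern):
    """Data nodes to which the root of `pattern` can be mapped.

    A pattern is a nested tuple (label, (child_pattern, ...)). Results are
    cached, so the bitmap of a sub-pattern that was already computed is
    reused instead of being recomputed.
    """
    label, children = pattern
    bits = label_bitmap(label)
    for child in children:
        # A data node survives only if one of its children matches `child`.
        bits &= parents_of(match_bitmap(child))
    return bits


def support(pattern):
    """Number of distinct data nodes matched by the pattern root."""
    return bin(match_bitmap(pattern)).count("1")
```

Because a homomorphism need not be injective, a pattern such as `("b", (("c", ()), ("c", ())))` has the same support here as `("b", (("c", ()),))`: both `c` pattern nodes may map to the same data node. This is the kind of redundancy the abstract refers to.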

X. Wu—The research was supported by the NSF of China under Grant No. 61202035, 61272110, and 61232002.

Author information

Authors and Affiliations

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Xiaoying Wu & Zhiyong Peng
New Jersey Institute of Technology, Newark, USA
Dimitri Theodoratos

Corresponding author

Correspondence to Dimitri Theodoratos.

Editor information

Editors and Affiliations

Georgia Institute of Technology, Atlanta, Georgia, USA
Shamkant B. Navathe
University of Texas at Dallas, Richardson, Texas, USA
Weili Wu
University of Minnesota, Minneapolis, Minnesota, USA
Shashi Shekhar
Renmin University, Beijing, China
Xiaoyong Du
Fudan University, Shanghai, China
X. Sean Wang
Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
Hui Xiong

About this paper

Cite this paper

Wu, X., Theodoratos, D., Peng, Z. (2016). Efficiently Mining Homomorphic Patterns from Large Data Trees. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_12

DOI: https://doi.org/10.1007/978-3-319-32025-0_12
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32024-3
Online ISBN: 978-3-319-32025-0
eBook Packages: Computer Science, Computer Science (R0)