Efficient Discovery of Embedded Patterns from Large Attributed Trees

  • Xiaoying Wu
  • Dimitri Theodoratos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10828)

Abstract

Discovering informative patterns deeply hidden in large tree datasets is an important research area with many practical applications. Many modern applications and systems represent, export and exchange data in the form of trees whose nodes are associated with attributes. In this paper, we address the problem of mining frequent embedded attributed patterns from large attributed data trees. Attributed pattern mining requires combining tree mining and itemset mining, which results in a larger pattern search space than addressing each problem separately. We first design an interleaved pattern mining approach that extends the equivalence-class-based tree pattern enumeration technique with attribute-set enumeration. Further, we propose a novel layered approach that discovers all frequent attributed patterns in stages. This approach seamlessly integrates an itemset mining technique with a recent unordered embedded tree pattern mining algorithm to greatly reduce the pattern search space. Our extensive experimental results on real and synthetic large-tree datasets show that the layered approach outperforms, in most cases by orders of magnitude, both the interleaved mining method and the attribute-as-node embedded tree pattern mining method, and that it has good scale-up properties.
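To make the notion of an embedded attributed pattern concrete, the following minimal Python sketch checks whether a small pattern occurs in an attributed tree. It is an illustration under stated assumptions, not the interleaved or layered algorithm of the paper: it assumes that each node carries only a set of attributes, that a pattern edge maps to an ancestor-descendant path, that a pattern node's attributes must be a subset of its image's attributes, and that sibling pattern nodes map to nodes with no ancestor-descendant relationship between them. The names Node, embeds_at and root_occurrences are hypothetical, and counting root occurrences is only one possible support measure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node carrying an attribute set (hypothetical representation)."""
    attrs: frozenset
    children: list = field(default_factory=list)


def descendants(d):
    """Yield every proper descendant of d."""
    for c in d.children:
        yield c
        yield from descendants(c)


def is_ancestor(a, b):
    """True if a is a proper ancestor of b (identity comparison)."""
    return any(x is b for x in descendants(a))


def embeds_at(p, d):
    """True if the pattern subtree rooted at p embeds with p mapped to d."""
    if not p.attrs <= d.attrs:          # itemset part: attribute containment
        return False
    candidates = list(descendants(d))   # tree part: ancestor-descendant edges

    def assign(i, chosen):
        # Backtracking: map the i-th child of p to a descendant of d that is
        # unrelated (not equal, not ancestor, not descendant) to earlier picks.
        if i == len(p.children):
            return True
        for c in candidates:
            if any(c is x or is_ancestor(c, x) or is_ancestor(x, c)
                   for x in chosen):
                continue
            if embeds_at(p.children[i], c) and assign(i + 1, chosen + [c]):
                return True
        return False

    return assign(0, [])


def root_occurrences(pattern, root):
    """Count data nodes to which the pattern root can be mapped."""
    nodes = [root, *descendants(root)]
    return sum(1 for d in nodes if embeds_at(pattern, d))


def N(attrs, *children):
    """Shorthand constructor: N("ab", child1, child2)."""
    return Node(frozenset(attrs), list(children))


# Data tree:  {a} -> [ {b,c} -> {d,e} ,  {a} -> {x} -> {d} ]
data = N("a", N("bc", N("de")), N("a", N("x", N("d"))))

p_path = N("a", N("d"))               # {a} with descendant {d}
p_fork = N("a", N("b"), N("d"))       # {a} with unrelated {b} and {d}
p_none = N("a", N("c"), N("e"))       # {c} and {e} only occur on one path

print(root_occurrences(p_path, data))  # 2
print(root_occurrences(p_fork, data))  # 1
print(root_occurrences(p_none, data))  # 0
```

The last pattern has no occurrence because the only nodes containing c and e lie on the same root-to-leaf path, so they cannot serve as images of two sibling pattern nodes. The sketch also shows why the search space grows: every candidate tree shape must be combined with every candidate attribute subset on each node.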

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Computer School, Wuhan University, Wuhan, China
  2. New Jersey Institute of Technology, Newark, USA
