A Structure Preserving Flat Data Format Representation for Tree-Structured Data

  • Fedja Hadzic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)

Abstract

Mining of semi-structured data such as XML is a popular research topic because of its many useful applications. Initial work focused mainly on the values associated with tags, while most recent developments focus on discovering association rules among tree-structured data objects so that structural information is preserved. Other data mining techniques have seen limited use in tree-structured data analysis, as they were mainly designed to process flat data formats and have no means of capturing the structural properties of data objects. This paper proposes a novel structure-preserving way of representing tree-structured document instances as records in a standard flat data structure, which makes a wider range of data analysis techniques applicable. Experiments using synthetic and real-world data demonstrate the effectiveness of the proposed approach.
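The abstract does not spell out the representation itself, so the following is only a minimal sketch of the general idea of turning tree-structured documents into flat records without discarding structure. It is not the paper's method. The positional-path column naming and the function names `flatten` and `to_table` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(xml_text):
    """Map one XML document to a {positional_path: text} record.

    Assumption: a node's position is encoded as a path of tag names with
    sibling indices, e.g. "a[0]/b[1]". This is an illustrative choice only.
    """
    record = {}

    def visit(elem, path):
        # Record the node itself so its position survives even without text.
        record[path] = (elem.text or "").strip()
        seen = Counter()
        for child in elem:
            # Index same-tag siblings so their order is kept in the column name.
            child_path = f"{path}/{child.tag}[{seen[child.tag]}]"
            seen[child.tag] += 1
            visit(child, child_path)

    root = ET.fromstring(xml_text)
    visit(root, f"{root.tag}[0]")
    return record


def to_table(documents):
    """Build a flat table: one row per document, one column per node position."""
    records = [flatten(doc) for doc in documents]
    # Union of all columns, in first-seen order.
    columns = list(dict.fromkeys(col for rec in records for col in rec))
    # None marks a node position that is absent from a given document.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


docs = [
    "<a><b>1</b><c><d>x</d></c></a>",
    "<a><b>2</b><b>3</b></a>",
]
columns, rows = to_table(docs)
```

Because each column name carries the node's position in the tree, a flat-data learner such as a decision tree can, in principle, split on the presence or value of a node at a specific structural position, and a selected set of columns can be mapped back to a subtree.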

Keywords

XML mining; tree mining; decision tree learning from XML data

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fedja Hadzic, DEBII, Curtin University, Perth, Australia
