A Structure Preserving Flat Data Format Representation for Tree-Structured Data

  • Conference paper
New Frontiers in Applied Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7104)

Abstract

Mining of semi-structured data such as XML is a popular research topic owing to its many useful applications. Initial work focused mainly on the values associated with tags, while most recent developments focus on discovering association rules among tree-structured data objects so that the structural information is preserved. Other data mining techniques have seen limited use in tree-structured data analysis because they were designed mainly to process flat data formats, with no need to capture the structural properties of data objects. This paper proposes a novel, structure-preserving way of representing tree-structured document instances as records in a standard flat data structure, so that a wider range of data analysis techniques can be applied. Experiments using synthetic and real-world data demonstrate the effectiveness of the proposed approach.
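To make the idea of a "structure-preserving flat representation" concrete, the sketch below shows one simple way to turn tree instances into rows of a flat table. It is a hypothetical illustration and not the method proposed in the paper. The assumed encoding works as follows. Each node is keyed by its full root-to-node label path, and sibling positions are disambiguated with an occurrence index (e.g. `order/item[2]`). The table schema is the union of these keys over all instances. Each instance then becomes a binary presence vector over that schema. The names `node_paths`, `build_schema` and `to_flat_table`, and the `(label, children)` tuple representation, are all assumptions made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical tree representation: (label, [child trees]).
Tree = Tuple[str, list]


def _collect(tree: Tree, path: str, out: List[str]) -> None:
    """Append the path key of every node in pre-order."""
    label, children = tree
    out.append(path)
    counts = {}  # occurrence index per child label, so repeated siblings stay distinct
    for child in children:
        child_label = child[0]
        counts[child_label] = counts.get(child_label, 0) + 1
        _collect(child, f"{path}/{child_label}[{counts[child_label]}]", out)


def node_paths(tree: Tree) -> List[str]:
    """Pre-order list of root-to-node path keys for one tree instance."""
    out: List[str] = []
    _collect(tree, tree[0], out)
    return out


def build_schema(trees: List[Tree]) -> List[str]:
    """Union of path keys over all instances, in order of first appearance."""
    schema: List[str] = []
    seen: Set[str] = set()
    for tree in trees:
        for key in node_paths(tree):
            if key not in seen:
                seen.add(key)
                schema.append(key)
    return schema


def to_flat_table(trees: List[Tree]) -> Tuple[List[str], List[List[int]]]:
    """Encode each tree as a 0/1 presence row over the shared schema.

    Because every column name is a full path, ancestor-descendant and
    sibling-position information is retained in the flat layout.
    """
    schema = build_schema(trees)
    rows = []
    for tree in trees:
        present = set(node_paths(tree))
        rows.append([1 if col in present else 0 for col in schema])
    return schema, rows


if __name__ == "__main__":
    doc1 = ("order", [("item", [("price", [])]), ("item", [])])
    doc2 = ("order", [("item", []), ("customer", [])])
    columns, table = to_flat_table([doc1, doc2])
    print(columns)
    for row in table:
        print(row)
```

With these two toy documents, the schema is `order`, `order/item[1]`, `order/item[1]/price[1]`, `order/item[2]`, `order/customer[1]`. The rows are `[1, 1, 1, 1, 0]` and `[1, 1, 0, 0, 1]`. A table of this kind can then be passed to any flat-data learner, such as a decision tree or an itemset miner. A practical design would also need to handle node values and schemas that grow large, and this sketch does not attempt either.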

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hadzic, F. (2012). A Structure Preserving Flat Data Format Representation for Tree-Structured Data. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_19

  • DOI: https://doi.org/10.1007/978-3-642-28320-8_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28319-2

  • Online ISBN: 978-3-642-28320-8

  • eBook Packages: Computer Science, Computer Science (R0)
