
A New XML Clustering for Structural Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3288)

Abstract

XML is becoming increasingly important in data exchange and information management. The starting point for retrieving information and integrating documents efficiently is to cluster documents that have similar structures. In this paper, we therefore propose a new XML document clustering method based on structural similarity. Our approach first extracts the representative structures of XML documents by sequential pattern mining. We then cluster XML documents of similar structure using a clustering algorithm for transactional data, treating each XML document as a transaction and the frequent structures of the documents as the items of that transaction. We also apply our technique to XML retrieval. Our experiments show the efficiency and good performance of the proposed clustering method.
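The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: root-to-node tag paths counted by document support stand in for sequential pattern mining, and a greedy single-pass grouping by Jaccard similarity stands in for a transactional clustering algorithm. All function names, the sample documents, and the `min_support` and `threshold` parameters are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, ())
    return paths


def frequent_structures(docs_paths, min_support):
    """Keep paths found in at least min_support documents.

    Assumption: a simplified stand-in for sequential pattern mining.
    """
    counts = Counter(p for paths in docs_paths for p in paths)
    return {p for p, c in counts.items() if c >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two item sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(transactions, threshold):
    """Greedy single-pass clustering of item sets.

    Assumption: a simplified stand-in for a transactional clustering
    algorithm; each transaction joins the first similar-enough cluster.
    """
    clusters = []  # each entry: {"items": set of items, "members": doc indices}
    for i, t in enumerate(transactions):
        for c in clusters:
            if jaccard(t, c["items"]) >= threshold:
                c["members"].append(i)
                c["items"] |= t
                break
        else:
            clusters.append({"items": set(t), "members": [i]})
    return [c["members"] for c in clusters]


def cluster_documents(docs, min_support, threshold):
    """Treat each document as a transaction whose items are frequent structures."""
    docs_paths = [tag_paths(d) for d in docs]
    frequent = frequent_structures(docs_paths, min_support)
    transactions = [paths & frequent for paths in docs_paths]
    return cluster(transactions, threshold)


# Hypothetical sample documents, used only to exercise the sketch.
DOCS = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<article><title/><journal/></article>",
    "<article><title/><journal/><pages/></article>",
]

print(cluster_documents(DOCS, min_support=2, threshold=0.5))
# → [[0, 1], [2, 3]]
```

In this sketch the infrequent paths (`book/year`, `article/pages`) are discarded before clustering, so documents are compared only on the structures they share with other documents in the collection.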

This work was supported by the University IT Research Center Project and ETRI in Korea.




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hwang, J.H., Ryu, K.H. (2004). A New XML Clustering for Structural Retrieval. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, TW. (eds) Conceptual Modeling – ER 2004. ER 2004. Lecture Notes in Computer Science, vol 3288. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30464-7_30


  • DOI: https://doi.org/10.1007/978-3-540-30464-7_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23723-5

  • Online ISBN: 978-3-540-30464-7

  • eBook Packages: Springer Book Archive
