A New XML Clustering for Structural Retrieval

Hwang, Jeong Hee; Ryu, Keun Ho

doi:10.1007/978-3-540-30464-7_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3288)

Abstract

XML is becoming increasingly important in data exchange and information management. The starting point for retrieving information and integrating documents efficiently is to cluster documents that have similar structure. In this paper, we therefore propose a new XML document clustering method based on structural similarity. Our approach first extracts the representative structures of XML documents by sequential pattern mining. We then cluster XML documents of similar structure using a clustering algorithm for transactional data, treating each XML document as a transaction and the frequent structures of the documents as the items of that transaction. We also apply our technique to XML retrieval. Our experiments show the efficiency and good performance of the proposed clustering method.
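
The two-stage idea in the abstract (documents become transactions whose items are frequent structures, and the transactions are then clustered) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a "structure" is simply a root-to-leaf tag path, replaces sequential pattern mining with plain document-frequency counting of those paths, and replaces the transactional clustering algorithm with a greedy Jaccard-similarity pass. All function names, the `min_support` and `threshold` parameters, and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths of one XML document."""
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def frequent_structures(docs_paths, min_support):
    """Keep the paths that occur in at least min_support documents.

    Assumption: this counting step is a simple stand-in for sequential
    pattern mining, not the mining procedure used in the paper.
    """
    counts = Counter(p for paths in docs_paths for p in paths)
    return {p for p, c in counts.items() if c >= min_support}


def to_transactions(docs_paths, frequent):
    """Represent each document as the set of frequent structures it contains."""
    return [frozenset(paths & frequent) for paths in docs_paths]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(transactions, threshold):
    """Put each transaction into the first cluster that is similar enough.

    Assumption: a greedy single pass, used here only for illustration in
    place of a dedicated clustering algorithm for transactional data.
    """
    clusters = []  # each entry: (set of items seen so far, member indices)
    for i, t in enumerate(transactions):
        for items, members in clusters:
            if jaccard(t, items) >= threshold:
                items.update(t)
                members.append(i)
                break
        else:
            clusters.append((set(t), [i]))
    return [members for _, members in clusters]


# Hypothetical sample documents: two book-like, two order-like.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<order><item><price/></item><customer/></order>",
    "<order><item><price/></item></order>",
]
docs_paths = [root_to_leaf_paths(d) for d in docs]
frequent = frequent_structures(docs_paths, min_support=2)
clusters = greedy_cluster(to_transactions(docs_paths, frequent), threshold=0.5)
print(clusters)  # -> [[0, 1], [2, 3]]
```

In this toy run the paths `book/year` and `order/customer` each appear in only one document, so they fall below the support threshold and do not influence the clustering.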

This work was supported by University IT Research Center Project and ETRI in Korea.

Author information

Authors and Affiliations

Database Laboratory, Chungbuk National University, Korea
Jeong Hee Hwang & Keun Ho Ryu

Editor information

Editors and Affiliations

Informatica e Automazione, Università Roma Tre, Via Vasca Navale 79, 00146, Roma, Italy
Paolo Atzeni
Computer Science Department, University of California, 3731 Boelter Hall, 90095, Los Angeles, CA, USA
Wesley Chu
Department of Computer Science, Tsinghua University, 100084, Beijing, P.R. China
Hongjun Lu
Department of Computer Science and Engineering, Fudan University, 200433, China
Shuigeng Zhou
School of Computing, National University of Singapore
Tok-Wang Ling

About this paper

Cite this paper

Hwang, J.H., Ryu, K.H. (2004). A New XML Clustering for Structural Retrieval. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, TW. (eds) Conceptual Modeling – ER 2004. ER 2004. Lecture Notes in Computer Science, vol 3288. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30464-7_30

DOI: https://doi.org/10.1007/978-3-540-30464-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23723-5
Online ISBN: 978-3-540-30464-7
eBook Packages: Springer Book Archive