KCAM: Concentrating on Structural Similarity for XML Fragments

Kong, Lingbo; Tang, Shiwei; Yang, Dongqing; Wang, Tengjiao; Gao, Jun

doi:10.1007/11775300_4

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4016)

Abstract

This paper proposes a new method, KCAM, for measuring the structural similarity of XML fragments that satisfy given keywords. The method is named after its key structure, the Keyword Common Ancestor Matrix. The KCAM of an XML fragment is a k × k upper triangular matrix, where k is the number of keywords. Each element a_{i,j} stores the level of the SLCA (Smallest Lowest Common Ancestor) node for the keywords k_i and k_j. The matrix distance between two KCAMs, denoted KDist(·,·), serves as an approximate measure of structural similarity. KCAM is independent of the label information in fragments, and it can effectively distinguish structural differences between XML fragments.
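The matrix construction described in the abstract can be illustrated with a minimal sketch. Everything beyond the abstract is an assumption made for illustration: each keyword is taken to match exactly one node in the fragment (so the SLCA of a pair is simply its lowest common ancestor), nodes are identified by Dewey-style paths with the root at level 1, the diagonal entry holds the level of the keyword node itself, and `kdist` uses a Euclidean (Frobenius-style) distance over the upper triangle. The paper's actual definition of KDist may differ, and the function names `kcam` and `kdist` are hypothetical.

```python
from math import sqrt


def common_prefix_len(a, b):
    """Length of the shared prefix of two Dewey-style paths."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def kcam(keyword_paths):
    """Build a k x k upper triangular matrix of common-ancestor levels.

    keyword_paths[i] is the Dewey-style path (a tuple) of the node that
    matches keyword k_i. With the root at level 1, the level of the lowest
    common ancestor of two nodes equals the length of their shared prefix.
    Entries below the diagonal are left at 0.
    """
    k = len(keyword_paths)
    matrix = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            matrix[i][j] = common_prefix_len(keyword_paths[i], keyword_paths[j])
    return matrix


def kdist(a, b):
    """Assumed distance: Euclidean norm of the upper-triangle difference."""
    if len(a) != len(b):
        raise ValueError("both matrices must be built for the same k keywords")
    k = len(a)
    return sqrt(
        sum((a[i][j] - b[i][j]) ** 2 for i in range(k) for j in range(i, k))
    )


# Two fragments matching the same three keywords with different shapes.
# Fragment A nests k_1 and k_2 under a shared child of the root.
fragment_a = [(1, 1, 1), (1, 1, 2), (1, 2)]
# Fragment B places all three keywords directly under the root.
fragment_b = [(1, 1), (1, 2), (1, 3)]

matrix_a = kcam(fragment_a)
matrix_b = kcam(fragment_b)
distance = kdist(matrix_a, matrix_b)
```

Because only ancestor levels enter the matrices, element labels never appear in the computation, which mirrors the abstract's point that KCAM is independent of label information.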

Supported by Project 2005AA4Z307 under the National High-tech Research and Development of China, Project 60503037 under the National Natural Science Foundation of China (NSFC), and Project 4062018 under the Beijing Natural Science Foundation (BNSF).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Peking University, Beijing, 100871, China
Lingbo Kong, Shiwei Tang, Dongqing Yang, Tengjiao Wang & Jun Gao
National Laboratory on Machine Perception, Peking University, Beijing, 100871, China
Shiwei Tang

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Department of Computing, Hong Kong Polytechnic University, Hong Kong
Hong Va Leong

Cite this paper

Kong, L., Tang, S., Yang, D., Wang, T., Gao, J. (2006). KCAM: Concentrating on Structural Similarity for XML Fragments. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_4

DOI: https://doi.org/10.1007/11775300_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)
