Abstract
This paper proposes a new method, KCAM, to measure the structural similarity of XML fragments that satisfy a given set of keywords. The name is derived directly from the key structure of the method, the Keyword Common Ancestor Matrix. For an XML fragment matching k keywords, its KCAM is a k × k upper triangular matrix. Each element a_{i,j} stores the level of the SLCA (Smallest Lowest Common Ancestor) node corresponding to the keywords k_i and k_j. The matrix distance between two KCAMs, denoted KDist(·,·), can be used as an approximate measure of structural similarity. KCAM is independent of the label information in the fragments, and it can effectively distinguish structural differences between XML fragments.
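The matrix construction described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that the abstract does not state:

- Each keyword matches exactly one node of the fragment, so the SLCA of a keyword pair reduces to the lowest common ancestor of two nodes.
- The fragment root is at level 0.
- Diagonal entries hold the level of the keyword's own node.
- KDist is taken to be the Frobenius norm of the difference between the two upper triangles.

The parent-map representation and the names `kcam`, `kdist`, `lca` and `level` are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical representation: a fragment is a parent map (node id -> parent id,
# root -> None). Each query keyword is assumed to match exactly one node.
Parents = Dict[str, Optional[str]]


def path_to_root(node: str, parents: Parents) -> List[str]:
    """Return the path [node, parent, ..., root]."""
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


def level(node: str, parents: Parents) -> int:
    """Depth of a node; the fragment root is assumed to be at level 0."""
    return len(path_to_root(node, parents)) - 1


def lca(a: str, b: str, parents: Parents) -> str:
    """Lowest common ancestor of two nodes (a node counts as its own ancestor)."""
    ancestors_of_a = set(path_to_root(a, parents))
    for n in path_to_root(b, parents):
        if n in ancestors_of_a:
            return n
    raise ValueError("nodes are not in the same tree")


def kcam(keywords: List[str], match: Dict[str, str], parents: Parents) -> List[List[int]]:
    """k x k upper triangular matrix: entry [i][j] (i <= j) is the level of the
    common ancestor of the nodes matching keywords[i] and keywords[j].
    Entries below the diagonal stay 0. Only structure is used, never labels."""
    k = len(keywords)
    m = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            anc = lca(match[keywords[i]], match[keywords[j]], parents)
            m[i][j] = level(anc, parents)
    return m


def kdist(m1: List[List[int]], m2: List[List[int]]) -> float:
    """Assumed KDist: Frobenius norm of the difference over the upper triangle."""
    k = len(m1)
    return math.sqrt(sum((m1[i][j] - m2[i][j]) ** 2
                         for i in range(k) for j in range(i, k)))


# Usage: two fragments matching the same three keywords.
keywords = ["xml", "keyword", "search"]
# Fragment A: "xml" and "keyword" share a parent below the root.
parents_a = {"r": None, "a": "r", "b": "r", "x1": "a", "x2": "a", "x3": "b"}
match_a = {"xml": "x1", "keyword": "x2", "search": "x3"}
# Fragment B: all three keyword nodes are direct children of the root.
parents_b = {"r": None, "y1": "r", "y2": "r", "y3": "r"}
match_b = {"xml": "y1", "keyword": "y2", "search": "y3"}

ma = kcam(keywords, match_a, parents_a)  # [[2, 1, 0], [0, 2, 0], [0, 0, 2]]
mb = kcam(keywords, match_b, parents_b)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(kdist(ma, mb))                     # 2.0
```

Only the parent map is consulted, and element labels never enter the matrix. This mirrors the abstract's claim that KCAM is independent of label information. Under the assumed norm, the nested fragment and the flat fragment are 2.0 apart.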
Supported by Project 2005AA4Z307 under the National High-tech Research and Development Program of China, Project 60503037 under the National Natural Science Foundation of China (NSFC), and Project 4062018 under the Beijing Natural Science Foundation (BNSF).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kong, L., Tang, S., Yang, D., Wang, T., Gao, J. (2006). KCAM: Concentrating on Structural Similarity for XML Fragments. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_4
DOI: https://doi.org/10.1007/11775300_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)