Abstract
Keyword search, borrowed from Information Retrieval (IR), is one of the most convenient ways for ordinary users to find information of interest. As XML data becomes more widespread, adapting keyword search to XML data is attracting growing attention. In this paper, we make a first attempt at a nesting mechanism for XML keyword search, which requires only a little nesting skill from the user. This approach has several benefits. First, it is convenient for ordinary users, because they need not know how the target XML data is organized. Second, the nesting pattern can be easily transformed into structural hints, which follow the same mechanism as the XML data model itself. Finally, since no label information is needed, we can retrieve XML fragments from different schemas. In addition, this paper proposes a new similarity measure for retrieved XML fragments, which may come from different schemas. Its core is the KCAM (Keyword Common Ancestor Matrix) structure, which stores the level information of the SLCA (Smallest Lowest Common Ancestor) node between two keywords. By mapping XML fragments into KCAMs, structural similarity can be computed using matrix distance. The KCAM distance works well with the nesting keyword method.
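The KCAM idea can be illustrated with a minimal sketch. The abstract does not give the exact construction, so everything below is an assumption made for illustration: the function names `keyword_paths`, `kcam`, and `kcam_distance` are hypothetical, each keyword is matched only to the first element whose text contains it, the root counts as level 1, a missing keyword contributes 0, and the Frobenius norm stands in for the unspecified "matrix distance".

```python
import math
import xml.etree.ElementTree as ET


def keyword_paths(root, keywords):
    """Map each keyword to the root-to-node path of the first element
    (in document order) whose own text contains it. Simplification:
    later occurrences of the same keyword are ignored."""
    paths = {}

    def visit(node, path):
        path = path + [node]
        text = (node.text or "").lower()
        for kw in keywords:
            if kw not in paths and kw.lower() in text:
                paths[kw] = path
        for child in node:
            visit(child, path)

    visit(root, [])
    return paths


def kcam(xml_text, keywords):
    """Build a keyword-by-keyword matrix whose entry (i, j) is the level of
    the deepest common ancestor of the nodes matched by keywords i and j.
    Assumption: root is level 1; an unmatched keyword yields 0."""
    root = ET.fromstring(xml_text)
    paths = keyword_paths(root, keywords)
    n = len(keywords)
    matrix = [[0] * n for _ in range(n)]
    for i, a in enumerate(keywords):
        for j, b in enumerate(keywords):
            if a in paths and b in paths:
                # Count the shared prefix of the two root-to-node paths.
                common = 0
                for x, y in zip(paths[a], paths[b]):
                    if x is not y:
                        break
                    common += 1
                matrix[i][j] = common
    return matrix


def kcam_distance(m1, m2):
    """Frobenius norm of the difference (an assumed choice of distance)."""
    return math.sqrt(
        sum((a - b) ** 2 for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    )


# Three fragments with different tag names; labels are never compared.
frag_a = "<book><title>XML search</title><author>Kong</author></book>"
frag_b = "<paper><info><name>XML search</name><writer>Kong</writer></info></paper>"
frag_c = "<entry><heading>XML search</heading><by>Kong</by></entry>"
keywords = ["XML", "Kong"]

dist_ac = kcam_distance(kcam(frag_a, keywords), kcam(frag_c, keywords))
dist_ab = kcam_distance(kcam(frag_a, keywords), kcam(frag_b, keywords))
```

Because only levels are stored, `frag_a` and `frag_c` map to the same matrix despite using different tags, while `frag_b`, which nests the keywords one level deeper, lies at a nonzero distance. This mirrors the abstract's point that fragments from different schemas can be compared without label information.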
Supported by Project 2005AA4Z307 under the National High-tech Research and Development of China, Project 60503037 under the National Natural Science Foundation of China (NSFC), and Project 4062018 under the Beijing Natural Science Foundation (BNSF).
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kong, L., Tang, S., Yang, D., Wang, T., Gao, J. (2006). No Tag, a Little Nesting, and Great XML Keyword Search. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_15
DOI: https://doi.org/10.1007/11880592_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)