No Tag, a Little Nesting, and Great XML Keyword Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Abstract

Keyword search, borrowed from Information Retrieval (IR), is among the most convenient ways for ordinary users to find information of interest. As XML data becomes more widespread, adapting keyword search to XML data is attracting growing attention. In this paper, we make a first attempt at a nesting mechanism for XML keyword search, which requires only a small amount of nesting in the query. This approach has several benefits. First, it is convenient for ordinary users, because they need no knowledge of how the target XML data is organized. Second, the nesting pattern can easily be transformed into structural hints, which follow the same mechanism as the XML data model. Finally, since no label information is needed, we can retrieve XML fragments from different schemas. In addition, this paper proposes a new similarity measure for retrieved XML fragments, which may come from different schemas. Its kernel is the KCAM (Keyword Common Ancestor Matrix) structure, which stores the level information of the SLCA (Smallest Lowest Common Ancestor) node between two keywords. By mapping XML fragments into KCAMs, structural similarity can be computed using matrix distance. The KCAM distance works well with the nesting keyword method.
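
The KCAM idea can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact matrix definition or the distance function. Here, entry (i, j) is assumed to hold the level (root = 1) of the deepest common ancestor of the first elements whose text contains keywords i and j, keyword matching is a case-insensitive substring test, and the Frobenius norm of the difference stands in for the matrix distance. The names `kcam`, `kcam_distance`, and the sample fragments are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def keyword_paths(root, keywords):
    """Map each keyword to the root-to-node path of the first element
    (in document order) whose own text contains it. Tags are ignored."""
    paths = {}

    def walk(node, path):
        here = path + [node]
        text = (node.text or "").lower()
        for kw in keywords:
            if kw not in paths and kw.lower() in text:
                paths[kw] = here
        for child in node:
            walk(child, here)

    walk(root, [])
    return paths


def kcam(xml_text, keywords):
    """Build an assumed KCAM: entry (i, j) is the level of the deepest
    common ancestor of keywords i and j (0 if either keyword is absent)."""
    paths = keyword_paths(ET.fromstring(xml_text), keywords)
    n = len(keywords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pi, pj = paths.get(keywords[i]), paths.get(keywords[j])
            if pi is None or pj is None:
                continue
            level = 0
            for a, b in zip(pi, pj):  # length of the shared path prefix
                if a is not b:
                    break
                level += 1
            matrix[i][j] = level
    return matrix


def kcam_distance(m1, m2):
    """Frobenius norm of the difference (an assumed choice of distance)."""
    return math.sqrt(sum((x - y) ** 2
                         for r1, r2 in zip(m1, m2)
                         for x, y in zip(r1, r2)))


KEYWORDS = ["xml", "keyword", "search"]
# Same shape, different tag names (hypothetical fragments).
FRAG_A = "<a><b><c>xml</c><d>keyword</d></b><e>search</e></a>"
FRAG_B = "<x><y><z>xml</z><w>keyword</w></y><v>search</v></x>"
# Flat shape: all keywords directly under the root.
FRAG_C = "<r><p>xml</p><q>keyword</q><s>search</s></r>"

if __name__ == "__main__":
    print(kcam(FRAG_A, KEYWORDS))
    print(kcam_distance(kcam(FRAG_A, KEYWORDS), kcam(FRAG_B, KEYWORDS)))
    print(kcam_distance(kcam(FRAG_A, KEYWORDS), kcam(FRAG_C, KEYWORDS)))
```

Because tag names never enter the matrix, `FRAG_A` and `FRAG_B` are at distance zero in this sketch even though they use different schemas, while the flat `FRAG_C` is at a positive distance from both.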

Supported by Project 2005AA4Z307 under the National High-tech Research and Development Program of China, Project 60503037 under the National Natural Science Foundation of China (NSFC), and Project 4062018 under the Beijing Natural Science Foundation (BNSF).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kong, L., Tang, S., Yang, D., Wang, T., Gao, J. (2006). No Tag, a Little Nesting, and Great XML Keyword Search. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_15

  • DOI: https://doi.org/10.1007/11880592_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45780-0

  • Online ISBN: 978-3-540-46237-8

  • eBook Packages: Computer Science, Computer Science (R0)
