Analyzing the Properties of XML Fragments Decomposed from the INEX Document Collection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3493)

Abstract

Current keyword-based XML fragment retrieval systems return XML fragments of various granularities as retrieval results. Because the number of such fragments is huge, index construction time and query processing time suffer unless the system can reliably extract only the fragments that qualify as answers. In this paper, we propose a method for determining which XML fragments are appropriate for keyword-based XML fragment retrieval, which should help improve the overall performance of XML fragment retrieval systems. The proposed method analyzes statistical information about XML fragments using a technique from the study of the dynamics of terminology in quantitative linguistics. Our keyword-based XML fragment retrieval system runs on a relational database system, and we also briefly explain its implementation.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hatano, K., Kinutani, H., Amagasa, T., Mori, Y., Yoshikawa, M., Uemura, S. (2005). Analyzing the Properties of XML Fragments Decomposed from the INEX Document Collection. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_14

  • DOI: https://doi.org/10.1007/11424550_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26166-7

  • Online ISBN: 978-3-540-32053-1

  • eBook Packages: Computer Science, Computer Science (R0)
