A Framework for the Relational Implementation of Tree Algebra to Retrieve Structured Document Fragments

  • Conference paper
Web Information Systems – WISE 2004 (WISE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3306)

Abstract

Naive users typically query documents with keywords. The problem of choosing the retrieval unit when keyword queries are posed against a structured document consisting of several logical components has been studied in the past. We developed a new query model based on tree algebra that resolves this problem. However, one important issue that any such theoretical model has to deal with is the difficulty of implementing it equally effectively. In this paper, we give an overview of our query model and explore how it can be implemented using existing relational database technology. Tree nodes representing the logical components of a structured document are indexed with their pre-order and post-order ranks and stored as a relation. We then show how the basic algebraic operation defined in our query model can be transformed into a simple SQL query against this relation. We also discuss various query optimization issues at the implementation level of the model.
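The indexing idea in the abstract can be illustrated with a small sketch. It assumes a toy document tree, a hypothetical `node(label, pre, post)` relation, and SQLite as the relational engine; none of these names come from the paper, and the queries shown are only the generic ancestor/descendant tests that pre-order and post-order ranks make possible, not the paper's algebraic operation. The test relies on a standard property of these ranks: node `d` is a descendant of node `a` exactly when `pre(a) < pre(d)` and `post(d) < post(a)`.

```python
import sqlite3

# Hypothetical document tree as (label, children); not taken from the paper.
DOC = ("article", [
    ("section1", [("para1", []), ("para2", [])]),
    ("section2", [("para3", [])]),
])


def index_tree(tree):
    """Return (label, pre, post) rows from one depth-first traversal."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        label, children = node
        pre = counters["pre"]          # rank assigned on entering the node
        counters["pre"] += 1
        for child in children:
            visit(child)
        rows.append((label, pre, counters["post"]))  # rank assigned on leaving
        counters["post"] += 1

    visit(tree)
    return rows


def build_db(tree):
    """Store the indexed nodes in an in-memory relation (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (label TEXT PRIMARY KEY, pre INTEGER, post INTEGER)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", index_tree(tree))
    return conn


def descendants(conn, label):
    """Labels of all descendants of `label`, in document order."""
    sql = """
        SELECT d.label
        FROM node AS a JOIN node AS d
          ON d.pre > a.pre AND d.post < a.post
        WHERE a.label = ?
        ORDER BY d.pre
    """
    return [row[0] for row in conn.execute(sql, (label,))]


def ancestors(conn, label):
    """Labels of all ancestors of `label`, from the root downwards."""
    sql = """
        SELECT a.label
        FROM node AS a JOIN node AS d
          ON a.pre < d.pre AND a.post > d.post
        WHERE d.label = ?
        ORDER BY a.pre
    """
    return [row[0] for row in conn.execute(sql, (label,))]


if __name__ == "__main__":
    conn = build_db(DOC)
    print(descendants(conn, "section1"))
    print(ancestors(conn, "para3"))
```

Because both structural tests reduce to range comparisons on two integer columns, they can be answered with ordinary joins and indexes, which is what makes a plain relational store a plausible host for tree operations of this kind.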

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pradhan, S. (2004). A Framework for the Relational Implementation of Tree Algebra to Retrieve Structured Document Fragments. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_23

  • DOI: https://doi.org/10.1007/978-3-540-30480-7_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23894-2

  • Online ISBN: 978-3-540-30480-7

  • eBook Packages: Springer Book Archive
