Retrieving Relevant Portions from Structured Digital Documents

Pradhan, Sujeet; Tanaka, Katsumi

doi:10.1007/978-3-540-30075-5_32

Sujeet Pradhan & Katsumi Tanaka

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Retrieving relevant portions from structured documents consisting of logical components has been a challenging task in both the database and the information retrieval (IR) worlds, since an answer to a query may be split across multiple components. In this paper, we propose a query mechanism that applies database-style query evaluation in response to IR-style keyword-based queries in order to retrieve relevant answers from a logically structured document. We first define appropriate semantics for keyword-based queries and then propose an algebra capable of computing every relevant portion of a document that can be considered an answer to a set of arbitrary keywords. The ordering and structural relationships among the components are preserved in the answer. We also introduce several practically useful filters that save users from having to deal with an overwhelming number of answers.
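The abstract does not spell out the algebra's operators, so the following is only an illustrative sketch under stated assumptions, not the authors' definitions. It assumes a document is a tree of logical components numbered in document order, a "fragment" is a connected set of components, joining two fragments yields the smallest connected fragment containing both, and a keyword query is answered by joining pairwise across the per-keyword fragment sets. The toy document `DOC`, the functions `join`, `pairwise_join` and `query`, and the `max_size` filter are all hypothetical names introduced for illustration.

```python
from itertools import product

# Hypothetical toy document: node id -> (parent id, text). Ids follow document
# (preorder) order, so sorting ids reproduces the original component order.
DOC = {
    0: (None, "paper"),
    1: (0, "section intro keyword search"),
    2: (1, "para structured documents"),
    3: (0, "section algebra"),
    4: (3, "para fragment join"),
    5: (3, "para keyword filters"),
}

Fragment = frozenset  # assumption: a fragment is a connected set of node ids


def path_to_root(n):
    """Return the ancestors of n, from n itself up to the root."""
    path = []
    while n is not None:
        path.append(n)
        n = DOC[n][0]
    return path


def join(f, g):
    """Smallest connected fragment containing both f and g (assumed semantics)."""
    paths = [path_to_root(n) for n in set(f) | set(g)]
    common = set(paths[0]).intersection(*paths[1:])
    # The deepest common ancestor is the one with the longest path to the root.
    lca = max(common, key=lambda n: len(path_to_root(n)))
    result = set()
    for p in paths:
        for n in p:  # walk upward from each node until the common ancestor
            result.add(n)
            if n == lca:
                break
    return Fragment(result)


def keyword_fragments(keyword):
    """Single-node fragments whose text contains the keyword as a word."""
    return {Fragment([n]) for n, (_, text) in DOC.items() if keyword in text.split()}


def pairwise_join(F, G):
    """Join every fragment of F with every fragment of G."""
    return {join(f, g) for f, g in product(F, G)}


def query(keywords, max_size=None):
    """Answer a keyword query; max_size is a hypothetical size filter."""
    answers = keyword_fragments(keywords[0])
    for k in keywords[1:]:
        answers = pairwise_join(answers, keyword_fragments(k))
    if max_size is not None:
        answers = {f for f in answers if len(f) <= max_size}
    # Each answer lists its components in document order.
    return sorted(sorted(f) for f in answers)
```

With this toy data, `query(["keyword", "fragment"])` returns `[[0, 1, 3, 4], [3, 4, 5]]`: one answer spans two sections through the root, the other stays inside the "algebra" section. Adding `max_size=3` keeps only `[[3, 4, 5]]`, which shows how a filter can prune large, loosely related answers.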



Author information

Authors and Affiliations

Kurashiki University of Science and the Arts, Kurashiki, Japan
Sujeet Pradhan
Kyoto University, Kyoto, Japan
Katsumi Tanaka


Editor information

Editors and Affiliations

University of Zaragoza, Ciudad Universitaria, Plaza San Francisco, 50009, Zaragoza
Fernando Galindo
Seikei University, Japan
Makoto Takizawa
Institute of Informatics in Business and Government, University of Linz, Altenbergerstr. 69, 4040, Linz, Austria
Roland Traunmüller


About this paper

Cite this paper

Pradhan, S., Tanaka, K. (2004). Retrieving Relevant Portions from Structured Digital Documents. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_32


DOI: https://doi.org/10.1007/978-3-540-30075-5_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive
