DocBase – The INEX Evaluation Experience

  • Conference paper
Advances in XML Information Retrieval (INEX 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3493)

Abstract

Can a system designed primarily for database-style storage and retrieval be used for information retrieval tasks? This was one of the questions that led us to participate in the INEX 2004 initiative. DocBase, a prototype database system initially developed for SGML and later adapted to work with XML, was used to answer the queries. DocBase uses DSQL, an adaptation of SQL that provides a mechanism for querying XML using existing database and indexing technologies. The INEX evaluation experience was encouraging: although it showed the limitations of database query languages for classic information retrieval tasks, it also demonstrated that several interesting results can be obtained by using database query languages for information retrieval, especially for queries involving both content and structure. Our results demonstrate the adaptability and scalability of a database system for processing IR queries.

© 2005 Springer-Verlag Berlin Heidelberg

Cite this paper

Mohan, S., Sengupta, A. (2005). DocBase – The INEX Evaluation Experience. In: Fuhr, N., Lalmas, M., Malik, S., Szlávik, Z. (eds) Advances in XML Information Retrieval. INEX 2004. Lecture Notes in Computer Science, vol 3493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424550_21

  • DOI: https://doi.org/10.1007/11424550_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26166-7

  • Online ISBN: 978-3-540-32053-1

  • eBook Packages: Computer Science (R0)
