Distributed IR for Digital Libraries

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2769)

Abstract

This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval, using a logistic regression algorithm to estimate distributed collection relevance and fusion techniques to combine multiple sources of evidence. We discuss the harvesting method used and how it can be employed in building collection representatives using features of the Z39.50 protocol. The extracted collection representatives are ranked using a fusion of probabilistic retrieval methods. The effectiveness of our algorithm is compared to other distributed search methods using test collections developed for distributed search evaluation. We also describe how this system is currently being applied to operational systems in the U.K.
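The ranking step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the feature set, the coefficient values in `COEFFS`, the layout of the collection representatives, and the use of a simple score-summing (CombSUM-style) fusion are all assumptions chosen for illustration, since the abstract does not give these details. Each representative is assumed to be a set of term-frequency tables, one per harvested index.

```python
import math

# Hypothetical coefficients (intercept, term frequency, size, match count).
# The fitted values used in the paper are not stated in the abstract.
COEFFS = (-3.0, 1.5, -0.01, 0.5)


def logistic_relevance(query_terms, term_freqs, coeffs=COEFFS):
    """Estimate the probability that one collection index is relevant.

    term_freqs maps index terms to their frequency in a harvested
    collection representative. Returns 0.0 when no query term matches.
    """
    matched = [term_freqs[t] for t in query_terms if t in term_freqs]
    if not matched:
        return 0.0
    c0, c1, c2, c3 = coeffs
    mean_log_tf = sum(math.log(1 + f) for f in matched) / len(matched)
    size = math.sqrt(sum(term_freqs.values()))
    log_odds = c0 + c1 * mean_log_tf + c2 * size + c3 * len(matched)
    # The logistic function turns log-odds into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


def fuse(score_lists):
    """Sum each collection's scores across evidence sources, best first."""
    fused = {}
    for scores in score_lists:
        for name, score in scores.items():
            fused[name] = fused.get(name, 0.0) + score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


def rank_collections(query_terms, representatives):
    """Rank collections; representatives is {collection: {index: {term: freq}}}."""
    index_names = sorted({i for rep in representatives.values() for i in rep})
    score_lists = [
        {name: logistic_relevance(query_terms, rep.get(index, {}))
         for name, rep in representatives.items()}
        for index in index_names
    ]
    return fuse(score_lists)


if __name__ == "__main__":
    # Toy representatives with made-up frequencies.
    reps = {
        "libA": {"title": {"retrieval": 40, "library": 12},
                 "subject": {"retrieval": 25}},
        "libB": {"title": {"chemistry": 90},
                 "subject": {"library": 3}},
    }
    for name, score in rank_collections(["retrieval", "library"], reps):
        print(name, round(score, 3))
```

In this toy run, `libA` matches the query in both of its indexes and is ranked ahead of `libB`. Summing per-index probabilities is only one of several possible fusion rules; it is used here because it is simple to state and verify.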

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Larson, R.R. (2003). Distributed IR for Digital Libraries. In: Koch, T., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2003. Lecture Notes in Computer Science, vol 2769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45175-4_44

  • DOI: https://doi.org/10.1007/978-3-540-45175-4_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40726-3

  • Online ISBN: 978-3-540-45175-4

  • eBook Packages: Springer Book Archive
