Text Segmentation for Efficient Information Retrieval

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2002)

Abstract

Previous work in Information Retrieval (IR) shows that comparing the user’s query against pieces of text gives better results than using the whole document as the basic unit of comparison. IR systems of this kind are usually called Passage Retrieval (PR) systems. However, there is no general agreement about how those pieces of text (also known as passages) should be defined in order to obtain optimum performance. This paper proposes a PR system based on a novel selection of variable-size passages. It presents an evaluation that shows better results than a standard IR system and several well-known PR systems.
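
The general idea of passage retrieval can be illustrated with a minimal sketch. It is not the system proposed in the paper: the abstract does not describe how passages are selected, so the sketch assumes, purely for illustration, that a passage is a window of a fixed number of consecutive sentences (and therefore of variable size in words), and that similarity is a simple count of query-term occurrences. All function names (`split_sentences`, `passages`, `overlap_score`, `best_passage`, `rank_documents`) and the `size` parameter are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def passages(text, size=2):
    # Overlapping windows of `size` consecutive sentences.
    # Word counts differ per window, so passages are variable-size in words.
    sents = split_sentences(text)
    if len(sents) <= size:
        return [" ".join(sents)] if sents else []
    return [" ".join(sents[i:i + size]) for i in range(len(sents) - size + 1)]


def overlap_score(query, passage):
    # Toy similarity (assumption): occurrences of distinct query terms.
    counts = Counter(tokenize(passage))
    return sum(counts[term] for term in set(tokenize(query)))


def best_passage(query, text, size=2):
    # Return (score, passage) for the highest-scoring passage of one document.
    scored = [(overlap_score(query, p), p) for p in passages(text, size)]
    return max(scored, key=lambda pair: pair[0]) if scored else (0, "")


def rank_documents(query, docs, size=2):
    # Rank document ids by the score of their best passage, not the whole text.
    results = [(best_passage(query, text, size)[0], doc_id)
               for doc_id, text in docs.items()]
    return [doc_id for _, doc_id in sorted(results, key=lambda r: -r[0])]
```

Calling `rank_documents(query, docs)` with `docs` as a dictionary mapping document ids to text returns the ids ordered by their best passage score. A real system would replace the term-overlap count with a weighted similarity measure and tune the window size empirically.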

This paper has been partially supported by the Spanish Government (CICYT) project number TIC2000-0664-C02-02.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Llopis, F., Ferrández, A., Vicedo, J.L. (2002). Text Segmentation for Efficient Information Retrieval. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_39

  • DOI: https://doi.org/10.1007/3-540-45715-1_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43219-7

  • Online ISBN: 978-3-540-45715-2

  • eBook Packages: Springer Book Archive