Processing Overlaps in Structured Text Retrieval

  • Reference work entry

Synonyms

Controlling overlap; Removing overlap

Definition

In semi-structured text retrieval, overlap-processing techniques reduce the amount of overlapping, and therefore redundant, information returned to the user. Redundancy in result lists arises from the nested structure of semi-structured documents, where the same text fragment may appear in several marked-up elements (see Fig. 1). Consequently, when a retrieval system performs a focused search on this type of document and uses the marked-up elements as retrieval objects, result lists very often contain overlapping elements. In retrieval applications that assume the user does not want to see the same information twice, it may be necessary to reduce or completely remove this overlap and return a ranked list of non-overlapping elements. Thus, depending on the underlying user model and retrieval application, different overlap-processing techniques are used in order to decide, given a...
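To make the notion of overlap concrete, the following is a minimal sketch of one simple way to remove it, not necessarily a technique described in this entry. It assumes each result is a hypothetical `(doc_id, path, score)` triple, where `path` is an XPath-like location such as `/article[1]/sec[2]`, and that two elements overlap when they belong to the same document and one is an ancestor of (or identical to) the other. The function names and the greedy "keep the highest-scored element" policy are illustrative assumptions.

```python
from typing import List, Tuple

Result = Tuple[str, str, float]  # (doc_id, element path, retrieval score); hypothetical format


def parse_path(path: str) -> Tuple[str, ...]:
    """Split '/article[1]/sec[2]' into ('article[1]', 'sec[2]')."""
    return tuple(step for step in path.split("/") if step)


def overlaps(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """Two elements overlap if one path is a prefix of the other,
    i.e. one element is an ancestor of (or the same as) the other."""
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


def remove_overlap(ranked: List[Result]) -> List[Result]:
    """Greedy filter (an assumed policy): visit results by descending score
    and keep an element only if it does not overlap any element already
    kept from the same document."""
    kept: List[Result] = []
    for doc_id, path, score in sorted(ranked, key=lambda r: -r[2]):
        steps = parse_path(path)
        clash = any(
            doc_id == kept_doc and overlaps(steps, parse_path(kept_path))
            for kept_doc, kept_path, _ in kept
        )
        if not clash:
            kept.append((doc_id, path, score))
    return kept


results = [
    ("d1", "/article[1]/sec[2]", 0.9),
    ("d1", "/article[1]/sec[2]/p[1]", 0.8),  # descendant of sec[2]
    ("d1", "/article[1]", 0.7),              # ancestor of sec[2]
    ("d1", "/article[1]/sec[3]", 0.6),       # sibling, no overlap
    ("d2", "/article[1]", 0.5),              # different document
]
filtered = remove_overlap(results)
# filtered keeps sec[2] and sec[3] from d1, plus the d2 article
```

Under a different user model, a system might instead prefer the enclosing ancestor or only down-weight overlapping elements rather than discard them; the sketch shows the strictest case, where overlap is removed completely.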

Recommended Reading

  1. Clarke CLA. Controlling overlap in content-oriented XML retrieval. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2005. p. 314–21.

  2. Geva S. GPX – gardens point XML IR at INEX 2005. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2006. p. 240–53.

  3. Kazai G, Lalmas M, de Vries AP. The overlap problem in content-oriented XML retrieval evaluation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2004. p. 72–9.

  4. Mass Y, Mandelbrod M. Using the INEX environment as a test bed for various user models for XML retrieval. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2006. p. 187–95.

  5. Mihajlović V, Ramírez G, Westerveld T, Hiemstra D, Blok HE, de Vries AP. TIJAH scratches INEX 2005: vague element selection, image search, overlap and relevance feedback. 2006. p. 72–87.

  6. Sauvagnat K, Hlaoua L, Boughanem M. XFIRM at INEX 2005: ad-hoc and relevance feedback tracks. In: Proceedings of the 4th International Workshop of the Initiative for the Evaluation of XML Retrieval; 2006. p. 88–103.

Author information

Correspondence to Georgina Ramírez.

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Ramírez, G. (2018). Processing Overlaps in Structured Text Retrieval. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_279
