
Processing Overlaps in Structured Text Retrieval

  • Living reference work entry
Encyclopedia of Database Systems

Synonyms

Controlling overlap; Removing overlap

Definition

In semi-structured text retrieval, overlap processing techniques reduce the amount of overlapping, and therefore redundant, information returned to the user. Redundant information appears in result lists because of the nested structure of semi-structured documents, in which the same text fragment may be contained in several marked-up elements (see Fig. 1). Consequently, when a retrieval system performs a focused search over this type of document and uses the marked-up elements as retrieval objects, the result list very often contains overlapping elements. In retrieval applications that assume the user does not want to see the same information twice, it may be necessary to reduce or completely remove this overlap and return a ranked list of non-overlapping elements. Thus, depending on the underlying user model and retrieval application, different overlap processing techniques are used in order to decide, given a...
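To make the notion of overlap concrete, the following minimal Python sketch shows one simple way to remove it completely: walk the ranked list from the top and keep an element only if it is neither an ancestor nor a descendant of an element already kept. This is an illustration under stated assumptions, not the method of any particular system discussed in this entry. The function names `overlaps` and `remove_overlap`, the XPath-like element paths, and the scores are all hypothetical, and every path is assumed to come from the same document.

```python
from typing import List, Tuple


def overlaps(a: str, b: str) -> bool:
    """Return True if the two elements are identical or one contains the other.

    Elements are identified by XPath-like paths (an assumption made for this
    sketch). One element contains another when its path is a step-wise prefix
    of the other's path.
    """
    a_steps = a.strip("/").split("/")
    b_steps = b.strip("/").split("/")
    shorter = min(len(a_steps), len(b_steps))
    return a_steps[:shorter] == b_steps[:shorter]


def remove_overlap(ranked: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the highest-scored element on each path and drop every
    lower-ranked ancestor or descendant of an element already kept."""
    kept: List[Tuple[str, float]] = []
    # Process elements from highest to lowest score.
    for path, score in sorted(ranked, key=lambda r: r[1], reverse=True):
        if not any(overlaps(path, kept_path) for kept_path, _ in kept):
            kept.append((path, score))
    return kept


# Hypothetical result list: (element path, retrieval score).
results = [
    ("/article[1]/sec[2]", 0.9),
    ("/article[1]/sec[2]/p[1]", 0.8),   # descendant of sec[2]
    ("/article[1]", 0.7),               # ancestor of sec[2]
    ("/article[1]/sec[3]/p[2]", 0.6),   # lies on a different path
]

for path, score in remove_overlap(results):
    print(f"{score:.1f}  {path}")
```

In this sketch the paragraph and the whole article are dropped because they share text with the higher-ranked section, while the paragraph in another section survives. A different user model could instead prefer the ancestor or the descendant, or only reduce the scores of overlapping elements rather than removing them.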

Recommended Reading

  1. Clarke CLA. Controlling overlap in content-oriented XML retrieval. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. 2005. p. 314–21.

  2. Geva S. GPX – Gardens Point XML IR at INEX 2005. In: Proceedings of the 4th international workshop of the initiative for the evaluation of XML retrieval. 2006. p. 240–53.

  3. Kazai G, Lalmas M, de Vries AP. The overlap problem in content-oriented XML retrieval evaluation. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval. 2004. p. 72–9.

  4. Mass Y, Mandelbrod M. Using the INEX environment as a test bed for various user models for XML retrieval. In: Proceedings of the 4th international workshop of the initiative for the evaluation of XML retrieval. 2006. p. 187–95.

  5. Mihajlović V, Ramírez G, Westerveld T, Hiemstra D, Blok HE, de Vries AP. TIJAH scratches INEX 2005: vague element selection, image search, overlap and relevance feedback. 2006. p. 72–87.

  6. Sauvagnat K, Hlaoua L, Boughanem M. XFIRM at INEX 2005: ad-hoc and relevance feedback tracks. In: Proceedings of the 4th international workshop of the initiative for the evaluation of XML retrieval. 2006. p. 88–103.

Author information

Correspondence to Georgina Ramírez.

Copyright information

© 2016 Springer Science+Business Media LLC

About this entry

Cite this entry

Ramírez, G. (2016). Processing Overlaps in Structured Text Retrieval. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_279-2

  • DOI: https://doi.org/10.1007/978-1-4899-7993-3_279-2

  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-1-4899-7993-3

  • eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering
