
Experiments with N-Gram Prefixes on a Multinomial Language Model versus Lucene’s Off-the-Shelf Ranking Scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task)

  • Conference paper
Multilingual Information Access Evaluation I. Text Retrieval Experiments (CLEF 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6241))


Abstract

We describe our participation in the TEL@CLEF task of the CLEF 2009 ad-hoc track, where we measured the retrieval performance of LGTE, an index engine for geo-temporal collections that is based mostly on Lucene, together with extensions for query expansion and multinomial language modeling. We experimented with an N-gram stemming model to improve on last year's experiments, which consisted of combinations of query expansion, Lucene's off-the-shelf ranking scheme, and a ranking scheme based on multinomial language modeling. The N-gram stemming model was based on a linear combination of N-grams, with N between 2 and 5, using weight factors learned from last year's topics and assessments. The Rocchio ranking function was also adapted to implement this N-gram model. Results show that this stemming technique, combined with query expansion and multinomial language modeling, increases retrieval performance.
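The linear combination of N-gram prefixes described above can be illustrated with a minimal sketch. It is not the LGTE implementation: the function names, the per-N score (a plain relative frequency standing in for the Lucene or language-model ranking function), and the weight values are all assumptions made for illustration, whereas the paper reports weights learned from the previous year's topics and assessments.

```python
from collections import Counter


def prefix(term, n):
    """Return the first n characters of a term (the whole term if shorter)."""
    return term.lower()[:n]


def prefix_counts(tokens, n):
    """Count the length-n prefixes of all tokens in a document."""
    return Counter(prefix(t, n) for t in tokens)


def combined_score(query_tokens, doc_tokens, weights):
    """Linearly combine per-N prefix scores, one weight per prefix length N.

    The per-N score is a relative frequency of matching prefixes; this is a
    hypothetical stand-in for a real ranking function.
    """
    score = 0.0
    for n, weight in weights.items():
        counts = prefix_counts(doc_tokens, n)
        total = sum(counts.values())
        if total == 0:
            continue
        per_n = sum(counts[prefix(q, n)] / total for q in query_tokens)
        score += weight * per_n
    return score


# Hypothetical weights for N = 2..5 (not the learned values from the paper).
WEIGHTS = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}

query = ["retrieval"]
doc_a = ["retrieving", "documents"]  # shares the prefix "retri" with the query
doc_b = ["cooking", "recipes"]       # shares only the 2-character prefix "re"

score_a = combined_score(query, doc_a, WEIGHTS)
score_b = combined_score(query, doc_b, WEIGHTS)
```

In this toy example, doc_a matches the query at every prefix length, while doc_b matches only at N = 2, so giving longer prefixes larger weights ranks doc_a higher. This is the intuition behind using prefixes as a lightweight form of stemming.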




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Machado, J., Martins, B., Borbinha, J. (2010). Experiments with N-Gram Prefixes on a Multinomial Language Model versus Lucene’s Off-the-Shelf Ranking Scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task). In: Peters, C., et al. Multilingual Information Access Evaluation I. Text Retrieval Experiments. CLEF 2009. Lecture Notes in Computer Science, vol 6241. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15754-7_10


  • DOI: https://doi.org/10.1007/978-3-642-15754-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15753-0

  • Online ISBN: 978-3-642-15754-7

  • eBook Packages: Computer Science, Computer Science (R0)
