Abstract
This paper addresses the impact of document structure on the term weighting function in the context of focused Information Retrieval (IR). Our model considers structural information carried by tags: those that represent logical structure (title, section, paragraph, etc.) and those related to formatting (bold, italic, center, etc.). We account for the influence of each tag by estimating the probability that it distinguishes relevant terms. This weight is then integrated into the term weighting function. Experiments on a large collection during the INEX 2008 IR competition showed improvements for focused retrieval.
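As a rough illustration of the idea summarized above, the following Python sketch estimates a per-tag weight from labelled term occurrences and folds it into a base term score. It is not the paper's actual formulation: the function names, the Laplace smoothing, the ratio-to-prior weight, and the multiplicative combination are all assumptions made for this example.

```python
from collections import Counter


def estimate_tag_weights(occurrences, smoothing=1.0):
    """Estimate one weight per tag (hypothetical scheme, not the paper's formula).

    occurrences: iterable of (tag, is_relevant) pairs, one pair per term
    occurrence found inside that tag in a training collection.
    The weight is the smoothed P(relevant | tag) divided by the smoothed
    overall P(relevant), so a value above 1 marks a tag that tends to
    contain relevant terms.
    """
    total = Counter()
    relevant = Counter()
    for tag, is_relevant in occurrences:
        total[tag] += 1
        if is_relevant:
            relevant[tag] += 1

    n_all = sum(total.values())
    n_rel = sum(relevant.values())
    prior = (n_rel + smoothing) / (n_all + 2 * smoothing)

    weights = {}
    for tag in total:
        p_rel_given_tag = (relevant[tag] + smoothing) / (total[tag] + 2 * smoothing)
        weights[tag] = p_rel_given_tag / prior
    return weights


def weighted_term_score(base_score, tags, tag_weights):
    """Scale a base term score (e.g. a BM25-style weight) by the weights of
    the tags enclosing the term occurrence. Unseen tags count as neutral (1.0).
    The multiplicative combination is an assumption of this sketch."""
    score = base_score
    for tag in tags:
        score *= tag_weights.get(tag, 1.0)
    return score


# Toy training data: terms in titles are mostly relevant, terms in
# paragraphs mostly are not.
training = (
    [("title", True)] * 8 + [("title", False)] * 2
    + [("p", True)] * 10 + [("p", False)] * 80
)
tag_weights = estimate_tag_weights(training)
boosted = weighted_term_score(2.0, ["title"], tag_weights)
```

With this toy data the `title` tag receives a weight above 1 and the `p` tag a weight below 1, so a term occurrence inside a title is boosted relative to its base score.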
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Géry, M., Largeron, C., Thollard, F. (2009). UJM at INEX 2008: Pre-impacting of Tags Weights. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_5
DOI: https://doi.org/10.1007/978-3-642-03761-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science, Computer Science (R0)