Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary

Kamps, Jaap

doi:10.1007/978-3-540-24752-4_21


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2997)

Abstract

Classification terms are commonly available in online text collections and digital libraries, for example manually assigned keywords or key-phrases from a controlled vocabulary in scientific collections. Our goal is to explore the use of this additional classification information for improving retrieval effectiveness. Earlier research explored the effect of adding classification terms to user queries, which led to little or no improvement. We explore a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to the documents. Since we do not want to rely on the availability of special dictionaries or thesauri, we compute the meaning of controlled vocabulary terms based on their occurrence in the collection. Our reranking strategy significantly improves retrieval effectiveness in domain-specific collections. Experimental evaluation is done on the German GIRT and French Amaryllis collections, using the test suite of the Cross-Language Evaluation Forum (CLEF).
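
As an illustration only, the reranking idea described in the abstract can be sketched in Python as follows. The abstract does not specify the similarity measure, the feedback depth, or how scores are combined, so everything below is an assumption: each controlled vocabulary term is represented by the set of documents it is assigned to, term similarity is the Jaccard coefficient of those sets, the terms of the top_k initially retrieved documents act as feedback, and the final score is a linear interpolation controlled by the hypothetical weight parameter. The names term_occurrences, jaccard, and rerank are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def term_occurrences(doc_terms):
    """Map each controlled vocabulary term to the set of documents it is assigned to."""
    occ = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            occ[term].add(doc_id)
    return occ


def jaccard(a, b):
    """Jaccard coefficient of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(initial, doc_terms, top_k=10, weight=0.5):
    """Rerank an initial result list using controlled vocabulary terms.

    initial   -- list of (doc_id, score), sorted by descending score
    doc_terms -- dict mapping doc_id to its assigned controlled vocabulary terms
    top_k     -- number of top documents whose terms serve as feedback (assumption)
    weight    -- interpolation weight of the original score (assumption)
    """
    if not initial:
        return []
    occ = term_occurrences(doc_terms)

    # Collect the controlled vocabulary terms of the top-ranked documents.
    feedback = set()
    for doc_id, _ in initial[:top_k]:
        feedback.update(doc_terms.get(doc_id, ()))

    def cv_score(doc_id):
        # Average, over the document's terms, of the best match with any feedback term.
        terms = doc_terms.get(doc_id, ())
        if not terms or not feedback:
            return 0.0
        total = sum(max(jaccard(occ[t], occ[f]) for f in feedback) for t in terms)
        return total / len(terms)

    # Normalise original scores to [0, 1] before interpolating.
    max_score = max(score for _, score in initial) or 1.0
    combined = [
        (doc_id, weight * (score / max_score) + (1 - weight) * cv_score(doc_id))
        for doc_id, score in initial
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for this example.
doc_terms = {
    "d1": ["migration", "labour market"],
    "d2": ["migration"],
    "d3": ["education"],
    "d4": ["labour market", "migration"],
}
initial = [("d1", 3.0), ("d3", 2.5), ("d2", 2.0), ("d4", 1.0)]
reranked = rerank(initial, doc_terms, top_k=1)
print([doc_id for doc_id, _ in reranked])
```

In this toy run, documents that share controlled vocabulary terms with the top-ranked document move ahead of the document that does not, while the original ranking still breaks ties among them. A collection-derived term similarity of this kind needs no external dictionary or thesaurus, which matches the constraint stated in the abstract, but the specific formula here is a stand-in.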



Author information

Authors and Affiliations

Language & Inference Technology Group ILLC, University of Amsterdam
Jaap Kamps


Editor information

Editors and Affiliations

School of Computing and Technology, David Goldman Informatics Centre, University of Sunderland, St. Peter’s Campus, SR6 0DD, Sunderland, UK
Sharon McDonald
School of Computing and Technology, University of Sunderland, St. Peter’s Campus, St. Peter’s Way, SR6 0DD, Sunderland, United Kingdom
John Tait


About this paper

Cite this paper

Kamps, J. (2004). Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_21


DOI: https://doi.org/10.1007/978-3-540-24752-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive
