Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2997)

Abstract

Classification terms are commonly available in online text collections and digital libraries, for example as manually assigned keywords or key-phrases from a controlled vocabulary in scientific collections. Our goal is to explore the use of this additional classification information for improving retrieval effectiveness. Earlier research explored the effect of adding classification terms to user queries, which led to little or no improvement. We explore a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to the documents. Since we do not want to rely on the availability of special dictionaries or thesauri, we compute the meaning of controlled vocabulary terms based on their occurrence in the collection. Our reranking strategy significantly improves retrieval effectiveness in domain-specific collections. The experimental evaluation uses the German GIRT and French Amaryllis collections and the test suite of the Cross-Language Evaluation Forum (CLEF).
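The abstract does not spell out the reranking procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's actual method. It assumes that each controlled vocabulary term is represented by the documents it is assigned to, that the top-ranked documents act as pseudo-feedback, and that the initial retrieval score (assumed to lie in [0, 1]) is linearly combined with a cosine similarity. The function names, the `top_k` and `weight` parameters, and the sample data are all hypothetical.

```python
from collections import Counter
from math import sqrt


def term_vectors(doc_terms):
    """Represent each controlled-vocabulary term by the documents it is assigned to.

    Assumption: occurrence in the collection is the only source of term meaning.
    """
    vectors = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            vectors.setdefault(term, Counter())[doc_id] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def doc_vector(terms, vectors):
    """Sum the vectors of all controlled-vocabulary terms assigned to one document."""
    total = Counter()
    for term in terms:
        total.update(vectors.get(term, {}))
    return total


def rerank(ranking, doc_terms, top_k=2, weight=0.5):
    """Rerank (doc_id, score) pairs using the terms of the top_k documents as feedback.

    Hypothetical scoring: (1 - weight) * initial score + weight * cosine similarity.
    """
    vectors = term_vectors(doc_terms)
    feedback = Counter()
    for doc_id, _ in ranking[:top_k]:
        feedback.update(doc_vector(doc_terms.get(doc_id, ()), vectors))
    rescored = []
    for doc_id, score in ranking:
        sim = cosine(doc_vector(doc_terms.get(doc_id, ()), vectors), feedback)
        rescored.append((doc_id, (1 - weight) * score + weight * sim))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Made-up toy data: keywords assigned to four documents and an initial ranking.
doc_terms = {
    "d1": {"migration", "labour market"},
    "d2": {"migration", "integration"},
    "d3": {"soil chemistry"},
    "d4": {"integration", "labour market"},
}
ranking = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.6)]
reranked = rerank(ranking, doc_terms)
print([doc_id for doc_id, _ in reranked])
```

In this toy example, document d4 shares controlled vocabulary terms with the two top-ranked documents and therefore moves ahead of d3, which has no terms in common with them. How the actual paper weights terms, selects feedback documents, and combines scores is not stated in the abstract.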

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kamps, J. (2004). Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_21

  • DOI: https://doi.org/10.1007/978-3-540-24752-4_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21382-6

  • Online ISBN: 978-3-540-24752-4

  • eBook Packages: Springer Book Archive
