Abstract
Classification terms are commonly available in online text collections and digital libraries, for example manually assigned keywords or key-phrases from a controlled vocabulary in scientific collections. Our goal is to explore the use of this additional classification information for improving retrieval effectiveness. Earlier research explored the effect of adding classification terms to user queries, with little or no improvement. We explore a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to them. Because we do not want to rely on the availability of special dictionaries or thesauri, we compute the meaning of controlled vocabulary terms from their occurrence in the collection. Our reranking strategy significantly improves retrieval effectiveness in domain-specific collections. We evaluate it experimentally on the German GIRT and French Amaryllis collections, using the test suite of the Cross-Language Evaluation Forum (CLEF).
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamps, J. (2004). Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_21
DOI: https://doi.org/10.1007/978-3-540-24752-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive