Assessing the Impact of Thesaurus-Based Expansion Techniques in QA-Centric IR

  • Luís Sarmento
  • Jorge Teixeira
  • Eugénio Oliveira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5706)


We study the impact of using thesaurus-based query expansion methods at the Information Retrieval (IR) stage of a Question Answering (QA) system. We focus on expanding queries for questions regarding actions and events, where verbs play a central role. Two thesauri are used: the OpenOffice thesaurus and an automatically generated verb thesaurus. The performance of the thesaurus-based methods is compared against that obtained by (i) performing no expansion and (ii) applying a simple query generalization method. Results show that thesaurus-based approaches help improve recall at retrieval while keeping satisfactory precision. However, we confirm that the positive impact on final QA performance comes mostly from the increase in recall, which can also be obtained with simpler methods. Nevertheless, because of its better relative precision, thesaurus-based expansion is effective in selectively reducing the number of irrelevant text passages retrieved, thus reducing the computational load in the answer extraction stage.
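The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy thesaurus entries, the four passages, the relevance judgments, and in particular the reading of "query generalization" as simply dropping the verb from the query. None of it is taken from the paper's actual system or data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy verb thesaurus (illustrative entries only).
TOY_VERB_THESAURUS: Dict[str, List[str]] = {
    "invent": ["create", "devise"],
}

# Hypothetical passage collection; indices 0 and 1 are assumed relevant.
PASSAGES: List[str] = [
    "bell did invent the telephone",
    "bell did create the telephone",
    "the telephone rang twice",
    "bell sold the telephone",
]
RELEVANT: Set[int] = {0, 1}


def expand_query(terms: List[str], thesaurus: Dict[str, List[str]]) -> List[List[str]]:
    """Turn each term into an OR-group: the term followed by its synonyms."""
    return [[t] + [s for s in thesaurus.get(t, []) if s != t] for t in terms]


def retrieve(passages: List[str], groups: List[List[str]]) -> Set[int]:
    """Return indices of passages containing one alternative from every OR-group."""
    hits = set()
    for i, passage in enumerate(passages):
        tokens = set(passage.lower().split())
        if all(any(alt in tokens for alt in group) for group in groups):
            hits.add(i)
    return hits


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Compute precision and recall of a retrieved set against relevant indices."""
    correct = len(retrieved & relevant)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    query = ["bell", "invent", "telephone"]
    runs = {
        "no expansion": expand_query(query, {}),
        "thesaurus expansion": expand_query(query, TOY_VERB_THESAURUS),
        # Assumed stand-in for "generalization": drop the verb entirely.
        "generalization": expand_query(["bell", "telephone"], {}),
    }
    for name, groups in runs.items():
        retrieved = retrieve(PASSAGES, groups)
        print(name, sorted(retrieved), precision_recall(retrieved, RELEVANT))
```

On this toy data, both the expanded and the generalized query reach the same recall, but the generalized query also retrieves an irrelevant passage, which is the kind of trade-off the abstract describes.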


Query Expansion, Question Answering, Statistical Machine Translation, Text Passage, Question Answering System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Luís Sarmento
    • 1
  • Jorge Teixeira
    • 1
  • Eugénio Oliveira
    • 1
  1. Laboratorio de Inteligência Artificial e Ciências de Computadores, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
