Journal of Intelligent Information Systems, Volume 34, Issue 2, pp 113–134

Answering questions with an n-gram based passage retrieval engine

  • Davide Buscaldi
  • Paolo Rosso
  • José Manuel Gómez-Soriano
  • Emilio Sanchis

Abstract

In this paper, we present a Question Answering system based on redundancy and a Passage Retrieval method that is specifically oriented to Question Answering. We suppose that in a large enough document collection the answer to a given question may appear in several different forms. Therefore, it is possible to find one or more sentences that contain the answer and that also include tokens from the original question. The Passage Retrieval engine is almost language-independent since it is based on n-gram structures. Question classification and answer extraction modules are based on shallow patterns.
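As a rough illustration of the idea that a good passage shares word sequences with the question, the following minimal sketch ranks passages by the question n-grams they contain. It is not the weighting model used by the authors' engine, which the abstract does not specify: the function names (`tokenize`, `ngram_overlap_score`, `rank_passages`), the regex tokenizer, and the choice to weight each matched n-gram by its length are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap_score(question, passage):
    """Score a passage by the question n-grams it contains.

    Hypothetical weighting: each matched n-gram counts n points, so longer
    shared sequences weigh more. The result is normalized to [0, 1], where
    1 means the whole question occurs verbatim in the passage.
    """
    q_tokens = tokenize(question)
    p_tokens = tokenize(passage)
    if not q_tokens:
        return 0.0
    matched = 0
    total = 0
    for n in range(1, len(q_tokens) + 1):
        q_grams = ngrams(q_tokens, n)
        p_grams = ngrams(p_tokens, n)
        total += n * len(q_grams)
        matched += n * len(q_grams & p_grams)
    return matched / total


def rank_passages(question, passages):
    """Return the passages sorted from most to least similar to the question."""
    return sorted(
        passages,
        key=lambda p: ngram_overlap_score(question, p),
        reverse=True,
    )


if __name__ == "__main__":
    question = "Who wrote Don Quixote"
    passages = [
        "Don Quixote is a Spanish novel.",
        "Miguel de Cervantes wrote Don Quixote.",
        "Cervantes was born in Alcalá de Henares.",
    ]
    for passage in rank_passages(question, passages):
        print(round(ngram_overlap_score(question, passage), 2), passage)
```

Because the score depends only on token sequences, a sketch like this needs no language-specific resources beyond a tokenizer, which is in the spirit of the near language independence claimed above.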

Keywords

Question answering · Information retrieval and extraction · Passage retrieval

Notes

Acknowledgements

We would like to thank the TIN2006-15265-C06-04 research project for partially supporting this work.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Davide Buscaldi (1)
  • Paolo Rosso (1)
  • José Manuel Gómez-Soriano (2)
  • Emilio Sanchis (1)

  1. ELiRF Research Group, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
  2. GPLSI Research Group, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
