Adapting the JIRS Passage Retrieval System to the Arabic Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

The need for a Passage Retrieval (PR) system for Arabic texts stems from our research team's aim to build an Arabic Question Answering (QA) system. We chose to work on the PR system as a first step toward that aim because it is the core component, and its quality directly affects the performance of the QA system. The JAVA Information Retrieval System (JIRS) is a QA-oriented PR system that is multi-platform, open source, and free to use. JIRS uses an n-gram model and is language-independent. It keeps its language configuration files separate, which makes it easier to adapt to any language. In this paper, we report the challenges encountered when adapting JIRS to the Arabic language. In order to evaluate JIRS on Arabic, we had to develop an Arabic test-bed, using the multilingual CLEF QA test-bed as a guideline. We also report the results of our experiments, in which we retrieved Arabic passages with JIRS, first without any text preprocessing and then after light-stemming the documents of the test-bed. The preliminary results show that a first Arabic passage retrieval system can be obtained by adapting JIRS to text pre-processed with a light-stemmer.



Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Benajiba, Y., Rosso, P., Gómez Soriano, J.M. (2007). Adapting the JIRS Passage Retrieval System to the Arabic Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_47

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
