Cross Language Experiments at Persian@CLEF 2008

  • Abolfazl AleAhmad
  • Ehsan Kamalloo
  • Arash Zareh
  • Masoud Rahgozar
  • Farhad Oroumchian
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5706)

Abstract

In this study we discuss our cross-language text retrieval experiments in the Persian ad hoc track at CLEF 2008. Two teams from the University of Tehran took part in the cross-language text retrieval part of the track, using two different CLIR approaches: query translation and document translation. For query translation, we use a method named Combinatorial Translation Probability (CTP) calculation to estimate translation probabilities. For document translation, we use the Shiraz machine translation system to translate the documents into English. We then create a hybrid CLIR system by score-based merging of the results of the two retrieval systems. In addition, we investigate N-grams and a light stemmer in our monolingual experiments.
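
The score-based merging mentioned above can be illustrated with a small sketch. The abstract does not specify how scores are normalized or weighted, so everything here is an assumption made for illustration: the min-max normalization, the linear combination, the `weight_a` parameter, the function names, and the toy document IDs and scores. It is not the authors' actual procedure.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so two runs become comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are equal; treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def merge_runs(
    run_a: Dict[str, float],
    run_b: Dict[str, float],
    weight_a: float = 0.5,  # hypothetical interpolation weight
) -> List[Tuple[str, float]]:
    """Linearly combine two normalized runs into one ranked list."""
    norm_a = min_max_normalize(run_a)
    norm_b = min_max_normalize(run_b)
    merged = {}
    for doc_id in set(norm_a) | set(norm_b):
        # A document missing from one run contributes 0 from that run.
        merged[doc_id] = (
            weight_a * norm_a.get(doc_id, 0.0)
            + (1.0 - weight_a) * norm_b.get(doc_id, 0.0)
        )
    # Highest combined score first; ties broken by document ID.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Toy scores (invented): one run per CLIR approach, on different scales.
query_translation_run = {"d1": 12.0, "d2": 9.0, "d3": 6.0}
document_translation_run = {"d2": 0.8, "d3": 0.4, "d4": 0.2}

ranking = merge_runs(query_translation_run, document_translation_run)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

Normalizing before combining matters because the two systems may score on different scales; without it, the run with the larger raw scores would dominate the merged ranking.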

Keywords

Persian, English, cross language, Farsi, bilingual, text retrieval

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Abolfazl AleAhmad (1)
  • Ehsan Kamalloo (1)
  • Arash Zareh (1)
  • Masoud Rahgozar (1)
  • Farhad Oroumchian (2)

  1. School of Electrical and Computer Engineering, University of Tehran, Iran
  2. Faculty of Computer Science and Engineering, University of Wollongong in Dubai, United Arab Emirates
