
Improving Performance of English-Hindi CLIR System using Linguistic Tools and Techniques

  • Anurag Seetha
  • Sujoy Das
  • M. Kumar

Abstract

The World Wide Web is growing rapidly, and Web content in languages other than English is increasing faster than English content. Hindi is the most widely spoken language in India, and in the past few years Hindi content on the Web has also grown rapidly. In the era of globalization, information retrieval systems need to be multilingual or cross-lingual to ensure complete information exchange. We have designed and developed an English-Hindi Cross Language Information Retrieval (CLIR) system using a dictionary-based query translation method. Our previous experiments [5], using a TREC-style test collection created especially for this research, showed that this system reached a reasonable 64.80% of monolingual retrieval performance. This paper describes the results of English-Hindi CLIR experiments using specialized query formulation strategies such as stopword removal, stemming of query terms, and transliteration of out-of-vocabulary words. The results demonstrate that performance improved gradually as we applied NLP tools and techniques to short queries. Performance dropped to some extent when query expansion and query structuring were applied, and when long queries were used to obtain cross-language results. The best performance we obtained in these experiments was 82.91% of monolingual retrieval performance.
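The abstract names the main query formulation steps: dictionary-based translation, stopword removal, stemming of query terms, and transliteration of out-of-vocabulary words. The Python sketch below shows one plausible way to chain them. It is not the authors' system: the bilingual dictionary, stopword list, suffix-stripping stemmer, and whole-word transliteration table are hypothetical toy stand-ins, and the names `translate_query`, `naive_stem`, `placeholder_transliterate`, and `relative_performance` are invented for illustration. The last function shows how a figure such as 82.91% can be read, as CLIR effectiveness divided by monolingual effectiveness on the same queries; the abstract does not state which effectiveness measure underlies that ratio.

```python
import re
from typing import Callable, Dict, List, Mapping

# Hypothetical toy resources for illustration only; not the authors' lexicon.
BILINGUAL_DICT: Dict[str, List[str]] = {
    "water": ["पानी", "जल"],
    "pollution": ["प्रदूषण"],
    "river": ["नदी"],
    "election": ["चुनाव", "निर्वाचन"],
}
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is"}
# Whole-word lookup table standing in for a real transliteration model.
TRANSLIT_TABLE: Dict[str, str] = {"bhopal": "भोपाल"}
# Naive suffix list, checked in order; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def placeholder_transliterate(word: str) -> str:
    """Hypothetical placeholder: table lookup, else return the word unchanged."""
    return TRANSLIT_TABLE.get(word, word)


def translate_query(
    query: str,
    dictionary: Mapping[str, List[str]] = BILINGUAL_DICT,
    transliterate: Callable[[str], str] = placeholder_transliterate,
) -> List[List[str]]:
    """Return one group of Hindi translation equivalents per English query term."""
    groups: List[List[str]] = []
    for token in re.findall(r"[a-z]+", query.lower()):
        if token in STOPWORDS:
            continue  # stopword removal
        if token in dictionary:
            groups.append(list(dictionary[token]))  # direct dictionary hit
        elif naive_stem(token) in dictionary:
            groups.append(list(dictionary[naive_stem(token)]))  # hit after stemming
        else:
            groups.append([transliterate(token)])  # out-of-vocabulary word
    return groups


def relative_performance(clir_score: float, mono_score: float) -> float:
    """CLIR effectiveness as a percentage of monolingual effectiveness."""
    return 100.0 * clir_score / mono_score


if __name__ == "__main__":
    groups = translate_query("Pollution of the rivers in Bhopal")
    print(groups)  # one group per surviving English term
    # Unstructured form: all equivalents merged into a single bag of words.
    print(" ".join(term for group in groups for term in group))
```

Keeping each English term's equivalents in its own group preserves the query structure, so a retrieval engine could treat the equivalents of one term as synonyms; joining all groups into one bag of words gives the unstructured form.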

Keywords

Query Term, Query Expansion, Information Retrieval System, Query Formulation, Translation Equivalent


References

  [1] Ministry of Information Technology, New Delhi, India: TDIL Vision 2010. Vishwabharat@TDIL, Newsletter of Technology Development for Indian Languages (TDIL), January (2003)
  [3] The ISCII document IS 13194:1991. Bureau of Indian Standards (BIS) (1991)
  [4] The Unicode Standard, Version 4.0. http://www.unicode.org
  [5] Seetha, A., Das, S., Kumar, M.: Evaluation of the English-Hindi Cross Language Information Retrieval System Based on Dictionary Based Query Translation Method. In: Proceedings of the 10th International Conference on Information Technology (ICIT 2007). http://doi.ieeecomputersociety.org/10.1109/ICIT.2007.40 (2007)
  [6] Seetha, A., Das, S., Kumar, M.: Construction of Hindi test collection for CLIR research. In: Proceedings of the International Conference on Cognitive Systems (ICCS 2004), New Delhi, December 14–15 (2004)
  [7] Jansen, B., Spink, A., Saracevic, T.: Real life, real users, and real needs: A study and analysis of user queries on the Web. Information Processing and Management 36(2), 207–227 (2000)
  [8] Demner-Fushman, D., Oard, D.W.: The effect of bilingual term list size on dictionary-based cross-language information retrieval. In: 36th Annual Hawaii International Conference on System Sciences (HICSS'03), Track 4, Hawaii (2003)
  [9] Al-Onaizan, Y., Knight, K.: Machine Transliteration of Names in Arabic Text. In: Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
  [10] Knight, K., Graehl, J.: Machine Transliteration. Computational Linguistics 24(4), 599–612 (1998)
  [11] Stalls, B.G., Knight, K.: Translating Names and Technical Terms in Arabic Text. In: Proceedings of the COLING/ACL Workshop on Computational Approaches to Semitic Languages, pp. 34–41. ACL, Montreal (1998)
  [12] Qu, Y., Grefenstette, G., Evans, D.A.: Automatic Transliteration for Japanese-to-English Text Retrieval. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 353–360. ACM Press, New York (2003)
  [14] Adriani, M., van Rijsbergen, C.J.: Term Similarity Based Query Expansion for Cross Language Information Retrieval. In: Proceedings of Research and Advanced Technology for Digital Libraries, Third European Conference (ECDL'99), pp. 311–322. Springer-Verlag, Paris, September (1999)
  [15] Ballesteros, L., Croft, W.B.: Resolving Ambiguity for Cross-language Retrieval. In: Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (1998)
  [16] Kekäläinen, J., Järvelin, K.: The impact of query structure and query expansion on retrieval performance. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia (1998)
  [17] Kristensen, J.: Expanding end-users' query statements for free text searching with a search-aid thesaurus. Information Processing and Management 29(6), 733–744

Copyright information

© Indian Institute of Information Technology, India 2009

Authors and Affiliations

  • Anurag Seetha, Computer Science & Applications, MCRPSV, Bhopal, India
  • Sujoy Das, Department of MCA, MANIT, Bhopal, India
  • M. Kumar, Department of Computer Science & IT, SIRT, Bhopal, India
