Disambiguation Strategies for English-Hindi Cross Language Information Retrieval System

  • Sujoy Das
  • Anurag Seetha
  • M. Kumar
  • J. L. Rana
Conference paper


The volume of information on the WWW in languages other than English is increasing rapidly. Accessing information in a language other than one's native language requires Cross-Language Information Retrieval (CLIR). Approaches to CLIR fall into three categories: document translation, query translation, and interlingua matching. The dictionary-based query translation approach has been widely used by CLIR researchers. Translation ambiguity and target polysemy are the two major problems of dictionary-based CLIR. In this paper, we investigate part-of-speech and co-occurrence based disambiguation techniques for an English-Hindi CLIR system.
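To illustrate the general idea of co-occurrence based disambiguation named above, the following is a minimal sketch and not the method evaluated in the paper, which the abstract does not detail. It assumes a toy bilingual dictionary and toy target-language co-occurrence counts; the transliterated Hindi terms, the counts, and the names `DICTIONARY`, `COOCCURRENCE`, and `disambiguate` are all hypothetical. Each query term is looked up in the dictionary, and the combination of candidate translations whose members co-occur most often is kept.

```python
from itertools import product

# Hypothetical toy dictionary: English term -> candidate Hindi translations
# (transliterated for readability; not taken from the paper).
DICTIONARY = {
    "bank": ["baink", "kinara"],   # financial institution vs. river bank
    "loan": ["rin", "udhaar"],
}

# Hypothetical co-occurrence counts for unordered pairs of target-language terms.
COOCCURRENCE = {
    frozenset(["baink", "rin"]): 12,
    frozenset(["baink", "udhaar"]): 5,
    frozenset(["kinara", "rin"]): 0,
    frozenset(["kinara", "udhaar"]): 1,
}


def cooccurrence(a, b):
    """Return the assumed corpus co-occurrence count of two target terms."""
    return COOCCURRENCE.get(frozenset([a, b]), 0)


def disambiguate(query_terms):
    """Pick one translation per query term by maximising pairwise co-occurrence.

    Terms missing from the dictionary are passed through untranslated.
    """
    candidates = [DICTIONARY.get(term, [term]) for term in query_terms]
    best, best_score = None, -1
    # Exhaustively score every combination of candidate translations.
    for combo in product(*candidates):
        score = sum(
            cooccurrence(a, b)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if score > best_score:
            best, best_score = combo, score
    return list(best)


print(disambiguate(["bank", "loan"]))
```

The exhaustive search is only practical for short queries, since the number of combinations grows multiplicatively with the number of candidates per term; a real system would likely need a greedy or iterative selection instead.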


Information Retrieval, Machine Translation, Query Term, Stop Word, Query Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Indian Institute of Information Technology, India 2009

Authors and Affiliations

  • Sujoy Das (1)
  • Anurag Seetha (2)
  • M. Kumar (3)
  • J. L. Rana (4)

  1. Deptt. of MCA, MANIT, Bhopal, India
  2. Computer Sc. & Applications, MCRPSV, Bhopal, India
  3. Deptt. of Computer Sc. & IT, SIRT, Bhopal, India
  4. RITS, Bhopal, India
