Abstract
The quality of transliteration from a source language to a target language lays the groundwork for proper-name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is accomplished by two separate modules: transliteration and retrieval. Queries are first transliterated into the target language using one or multiple hypotheses, and retrieval is then carried out on the transliterated queries. Transliteration often results in 30-50% errors with the top-1 hypothesis, leading to significant performance degradation in CLIR. We therefore propose a unified transliteration retrieval model that incorporates the transliteration similarity measurement into the relevance scoring function. In addition, we present an efficient and robust method for measuring the similarity of a given proper name pair, using Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experimental data show significant results with the proposed integrated method on the NTCIR7 IR4QA task, demonstrating greater flexibility and acceptance in transliteration.
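To make the idea of folding a similarity measure into the relevance score concrete, the following is a minimal sketch, not the model from the paper. It assumes a query-likelihood scorer with Jelinek-Mercer smoothing, and it substitutes a simple character-overlap ratio for the HMM-alignment and SMT-based similarity, which the abstract does not specify. The names `name_similarity`, `soft_term_prob`, `relevance`, and the parameters `lam` and `eps` are hypothetical, and romanized strings stand in for a real source/target language pair.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1] based on character overlap.

    Assumption: this stands in for the HMM-alignment/SMT measure,
    which is not reproduced here.
    """
    return SequenceMatcher(None, a, b).ratio()


def soft_term_prob(term: str, counts: Counter, total: int) -> float:
    """Expected similarity between `term` and a word drawn from `counts`."""
    return sum(name_similarity(term, w) * c for w, c in counts.items()) / total


def relevance(query, doc, collection, lam=0.7, eps=1e-9):
    """Log-score of a non-empty `doc` for `query`.

    Instead of matching a single transliteration hypothesis exactly,
    every document word contributes in proportion to its similarity
    to the query name. `lam` mixes document and collection evidence.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 0.0
    for q in query:
        p_doc = soft_term_prob(q, doc_counts, len(doc))
        p_coll = soft_term_prob(q, coll_counts, len(collection))
        score += math.log(lam * p_doc + (1 - lam) * p_coll + eps)
    return score


# Toy data: the query spelling differs from the spelling in the document.
doc_a = ["gorbachev", "visited", "london"]
doc_b = ["thatcher", "visited", "london"]
collection = doc_a + doc_b
query = ["gorbachov"]

print(relevance(query, doc_a, collection))
print(relevance(query, doc_b, collection))
```

In this toy setting an exact-match retriever would give neither document credit for the name, whereas the soft score ranks the document containing the near-identical spelling higher. That is the kind of tolerance to transliteration errors the integrated approach is aiming for.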
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jan, EE., Lin, SH., Chen, B. (2010). Transliteration Retrieval Model for Cross Lingual Information Retrieval. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_17
DOI: https://doi.org/10.1007/978-3-642-17187-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science (R0)