Abstract
Automatic transliteration of foreign names is commonly treated as a scaled-down version of the machine translation (MT) problem, and it therefore follows IBM's conventional MT models under the source-channel framework. However, some parameters of this model, namely those dealing with zero-fertility words in the target sequence, can degrade transliteration effectiveness because the framework unavoidably estimates an inverted conditional probability. Instead of the source-channel model, this paper presents a direct probabilistic transliteration model that uses contextual features of phonemes together with an alignment scheme tailored to phoneme chunks. Experiments on English-Chinese transliteration show that it outperforms the source-channel model.
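As a rough illustration of the contrast drawn above, the two formulations can be written side by side. The symbols are assumptions introduced here and are not taken from the paper: $E$ stands for the English (source) phoneme sequence and $C$ for the Chinese (target) sequence. How the paper parameterizes the direct model with contextual features is not stated in the abstract, so only the general form is sketched.

```latex
% Source-channel (noisy-channel) decoding: Bayes' rule turns P(C | E)
% into a target language model P(C) times an inverted conditional P(E | C).
\hat{C}_{\text{channel}} = \arg\max_{C} P(C \mid E)
                         = \arg\max_{C} \frac{P(E \mid C)\, P(C)}{P(E)}
                         = \arg\max_{C} P(E \mid C)\, P(C)

% Direct model: the conditional P(C | E) is estimated as is,
% with no inversion step.
\hat{C}_{\text{direct}} = \arg\max_{C} P(C \mid E)
```

The denominator $P(E)$ drops out of the first line because it does not depend on $C$. The inverted term $P(E \mid C)$ is the "inverted conditional probability estimation" that the abstract identifies as the source of the problem.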
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, W., Wong, KF., Lam, W. (2005). Improving Transliteration with Precise Alignment of Phoneme Chunks and Using Contextual Features. In: Myaeng, S.H., Zhou, M., Wong, KF., Zhang, HJ. (eds) Information Retrieval Technology. AIRS 2004. Lecture Notes in Computer Science, vol 3411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31871-2_10
DOI: https://doi.org/10.1007/978-3-540-31871-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25065-4
Online ISBN: 978-3-540-31871-2
eBook Packages: Computer Science, Computer Science (R0)