Abstract
The transliteration of Arabic numerals is not a trivial problem. Arabic numerals occur frequently in scientific and informative texts and carry significant meaning. Because the reading of an Arabic numeral depends largely on its context, generating the correct pronunciation of numerals is one of the critical criteria in evaluating TTS systems. In this paper, (1) contextual, pattern, and arithmetic features are extracted from a transliterated corpus; (2) ambiguities of homographic classifiers are resolved based on the semantic relations in KorLex1.0 (Korean Lexico-Semantic Network); and (3) a classification model for accurate and efficient transliteration of Arabic numerals is proposed in order to improve Korean TTS systems. The proposed model yields 97.3% accuracy, which is 9.5% higher than that of a customized Korean TTS system.
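To illustrate why context matters, the following is a minimal sketch, not the model proposed in the paper. It relies on one well-known property of Korean: the same digit is read with a native Korean numeral before some classifiers (for example 시, "o'clock") and with a Sino-Korean numeral before others (for example 분, "minutes"). The tiny lexicons, the feature names, and the functions `extract_features` and `transliterate` are all hypothetical stand-ins chosen for illustration. The sketch covers only the digits 1 to 9 and returns `None` for anything it cannot decide, such as the homographic classifier 번.

```python
import re

# Toy lexicons: assumptions for illustration only, not taken from the paper.
SINO = {1: "일", 2: "이", 3: "삼", 4: "사", 5: "오",
        6: "육", 7: "칠", 8: "팔", 9: "구"}
NATIVE = {1: "한", 2: "두", 3: "세", 4: "네", 5: "다섯",
          6: "여섯", 7: "일곱", 8: "여덟", 9: "아홉"}

# Classifiers that select a native Korean or a Sino-Korean reading.
NATIVE_CLASSIFIERS = {"시", "개", "명", "살", "마리"}
SINO_CLASSIFIERS = {"분", "년", "월", "원", "층"}

# A run of digits followed by an optional run of Hangul syllables.
TOKEN = re.compile(r"(\d+)\s*([가-힣]*)")


def extract_features(text):
    """Return simple features for the first numeral in text, or None."""
    match = TOKEN.search(text)
    if match is None:
        return None
    digits, following = match.group(1), match.group(2)
    return {
        "value": int(digits),        # numeric value of the token
        "num_digits": len(digits),   # surface pattern of the token
        "following": following,      # right-hand context
    }


def find_classifier(following):
    """Longest-prefix match of the right context against both lexicons."""
    known = NATIVE_CLASSIFIERS | SINO_CLASSIFIERS
    for candidate in sorted(known, key=len, reverse=True):
        if following.startswith(candidate):
            return candidate
    return None


def transliterate(text):
    """Return the reading of a single-digit numeral, or None if undecided."""
    features = extract_features(text)
    if features is None or not 1 <= features["value"] <= 9:
        return None
    classifier = find_classifier(features["following"])
    if classifier in NATIVE_CLASSIFIERS:
        return NATIVE[features["value"]]
    if classifier in SINO_CLASSIFIERS:
        return SINO[features["value"]]
    return None  # unknown or ambiguous classifier, e.g. 번
```

A lookup table like this cannot decide cases where the classifier itself is ambiguous, which is the kind of case the abstract addresses with semantic relations and a learned classifier.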
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jung, Y., Yoon, A., Kwon, HC. (2006). Disambiguation Based on Wordnet for Transliteration of Arabic Numerals for Korean TTS. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_38
DOI: https://doi.org/10.1007/11671299_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)