Disambiguation Based on Wordnet for Transliteration of Arabic Numerals for Korean TTS

Jung, Youngim; Yoon, Aesun; Kwon, Hyuk-Chul

doi:10.1007/11671299_38

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Abstract

Transliterating Arabic numerals into their spoken form is not a trivial problem. Arabic numerals occur frequently in scientific and informative texts and carry significant meaning. Because the reading of an Arabic numeral depends largely on its context, generating accurate pronunciations of Arabic numerals is one of the critical criteria in evaluating TTS systems. In this paper, (1) contextual, pattern, and arithmetic features are extracted from a transliterated corpus; (2) ambiguities of homographic classifiers are resolved based on the semantic relations in KorLex1.0 (Korean Lexico-Semantic Network); and (3) a classification model for accurate and efficient transliteration of Arabic numerals is proposed in order to improve Korean TTS systems. The proposed model yields 97.3% accuracy, which is 9.5% higher than that of a customized Korean TTS system.
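
To make the context dependence concrete: Korean has both Sino-Korean and native Korean number readings, and the classifier (counter word) that follows a numeral largely determines which one is spoken. The following is a minimal sketch of that idea only, not the model proposed in the paper. The lookup table `COUNTER_SYSTEM`, the function names, the 1 to 99 range, and the Sino-Korean fallback for unknown classifiers are all illustrative assumptions; the sketch does not use KorLex and does not attempt to resolve homographic classifiers.

```python
# Illustrative sketch only: a hand-written lookup, not the paper's classifier.
SINO_DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
# Native Korean forms as used directly before a classifier (한, 두, 세, 네, ...).
NATIVE_UNITS = ["", "한", "두", "세", "네", "다섯", "여섯", "일곱", "여덟", "아홉"]
NATIVE_TENS = ["", "열", "스물", "서른", "마흔", "쉰", "예순", "일흔", "여든", "아흔"]

# Hypothetical mini-lexicon mapping a classifier to its usual reading system.
COUNTER_SYSTEM = {
    "시": "native",  # hours
    "개": "native",  # generic items
    "명": "native",  # people
    "분": "sino",    # minutes
    "원": "sino",    # won (currency)
    "년": "sino",    # years
}


def sino_reading(n: int) -> str:
    """Sino-Korean reading for 1..99 (e.g. 30 -> 삼십)."""
    tens, units = divmod(n, 10)
    out = ""
    if tens:
        # "일" is dropped before "십": 15 is 십오, not 일십오.
        out += ("" if tens == 1 else SINO_DIGITS[tens]) + "십"
    return out + SINO_DIGITS[units]


def native_reading(n: int) -> str:
    """Native Korean pre-classifier reading for 1..99 (e.g. 3 -> 세)."""
    if n == 20:
        return "스무"  # 스물 shortens to 스무 directly before a classifier
    tens, units = divmod(n, 10)
    return NATIVE_TENS[tens] + NATIVE_UNITS[units]


def read_numeral(n: int, counter: str) -> str:
    """Pick a reading system from the classifier and spell out the numeral."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    # Assumption: fall back to Sino-Korean when the classifier is unknown.
    system = COUNTER_SYSTEM.get(counter, "sino")
    reading = native_reading(n) if system == "native" else sino_reading(n)
    return f"{reading} {counter}"


if __name__ == "__main__":
    # "3시 30분" (3:30): hours take the native reading, minutes the Sino-Korean one.
    print(read_numeral(3, "시"), read_numeral(30, "분"))
```

A fixed table like this breaks down as soon as one written classifier form has several senses that call for different readings, which is the homograph ambiguity the abstract says is resolved with semantic relations from KorLex1.0.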

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Pusan National University, Jangjeon-dong Geumjeong-gu, 609-735, Busan, S. Korea
Youngim Jung & Hyuk-Chul Kwon
Department of French, Pusan National University, Jangjeon-dong Geumjeong-gu, 609-735, Busan, S. Korea
Aesun Yoon

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Jung, Y., Yoon, A., Kwon, HC. (2006). Disambiguation Based on Wordnet for Transliteration of Arabic Numerals for Korean TTS. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_38

DOI: https://doi.org/10.1007/11671299_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)
