Abstract
This paper presents a lemmatization technique and its use for the phonetic transcription of exceptional words. The lemmatizer is based on language morphology: it uses a lexicon of basic word forms and a set of inverse derivation rules to acquire the lemmatization rules needed to find word bases. The lemmatization algorithm and the modifications required for transcribing exceptional words are described. The main goal of the designed system is to reduce the computer memory needed to store the lexicon of exceptional words. The experimental results show that between 18.3% (English) and 98.4% (Finnish) of the full lexicon size can be saved. The described technique is therefore particularly advantageous for highly inflectional and agglutinative languages.
Support for this work was provided by the Ministry of Education of the Czech Republic, project No. MSM235200004.
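The lemmatization step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the suffix-pair rule format, and the names `invert_rules` and `lemmatize` are all assumptions made for illustration. The sketch only shows the general idea of inverting derivation rules into lemmatization rules and accepting a candidate base form when it appears in the lexicon.

```python
# Hypothetical toy data, not taken from the paper.
LEXICON = {"walk", "city", "make"}

# Assumed rule format: (suffix of the base form, suffix of the derived form).
DERIVATION_RULES = [("", "s"), ("y", "ies"), ("", "ed"), ("e", "ing")]


def invert_rules(derivation_rules):
    """Turn base->derived suffix rules into derived->base lemmatization rules."""
    return [(derived, base) for base, derived in derivation_rules]


def lemmatize(word, lemmatization_rules, lexicon):
    """Return every lexicon entry reachable from `word` by one inverted rule."""
    candidates = set()
    if word in lexicon:
        # The word is already a basic form.
        candidates.add(word)
    for derived_suffix, base_suffix in lemmatization_rules:
        if derived_suffix and word.endswith(derived_suffix):
            # Strip the derived suffix and attach the base suffix.
            stem = word[: len(word) - len(derived_suffix)]
            candidate = stem + base_suffix
            if candidate in lexicon:
                candidates.add(candidate)
    return sorted(candidates)


RULES = invert_rules(DERIVATION_RULES)
print(lemmatize("cities", RULES, LEXICON))
print(lemmatize("making", RULES, LEXICON))
```

Under this scheme a transcription lexicon would only need entries for base forms, with inflected forms mapped back to them by the rules; how the paper handles the transcription of the inflected forms themselves is not covered by this sketch.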
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kanis, J., Müller, L. (2004). Using the Lemmatization Technique for Phonetic Transcription in Text-to-Speech System. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_45
DOI: https://doi.org/10.1007/978-3-540-30120-2_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive