Using the Lemmatization Technique for Phonetic Transcription in Text-to-Speech System

  • Conference paper
Text, Speech and Dialogue (TSD 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Abstract

This paper deals with a lemmatization technique and its use for the phonetic transcription of exceptional words. The lemmatizer is based on language morphology and uses a lexicon of basic word forms and a set of inversion derivation rules to acquire lemmatization rules, which are essential for finding word bases. The lemmatization algorithm and the modifications needed to transcribe exceptional words are described. The main goal of the designed system is to reduce the computer memory needed to store the lexicon of exceptional words. The experimental results showed that it is possible to save from 18.3% (English) to 98.4% (Finnish) of the full lexicon size. Hence, the described technique is particularly advantageous for highly inflectional and agglutinative languages.
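The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the rule format, the toy lexicon, the function names, and the transcription string are all assumptions made for illustration. It shows only how suffix-based lemmatization rules plus a lexicon of base forms let an exception lexicon store one entry per lemma instead of one per inflected form.

```python
from typing import Optional

# Hypothetical lexicon of basic word forms (lemmas).
LEMMAS = {"walk", "talk", "jump"}

# Hypothetical lemmatization rules: (suffix to strip, suffix to append).
# The empty rule lets a base form map to itself.
RULES = [("ing", ""), ("ed", ""), ("s", ""), ("", "")]

# Hypothetical exception lexicon keyed by lemma only; the transcription
# string is an illustrative placeholder, not data from the paper.
EXCEPTIONS = {"walk": "w O: k"}


def lemmatize(word: str) -> Optional[str]:
    """Return the first lexicon lemma reachable by one rule, or None."""
    for strip, add in RULES:
        if word.endswith(strip):
            candidate = word[: len(word) - len(strip)] + add
            if candidate in LEMMAS:
                return candidate
    return None


def exceptional_base(word: str) -> Optional[str]:
    """Return the stored transcription of the word's lemma, if any."""
    lemma = lemmatize(word)
    return EXCEPTIONS.get(lemma) if lemma is not None else None


def lexicon_saving(full_forms: int, lemmas: int) -> float:
    """Fraction of entries saved by storing lemmas instead of all forms."""
    return 1.0 - lemmas / full_forms
```

In this toy setting the forms "walk", "walks", "walked", and "walking" all resolve to the single stored entry for "walk", so `lexicon_saving(4, 1)` gives the fraction of entries avoided. Deriving the transcription of the inflected form from the lemma's transcription, which the paper treats as a modification of the lemmatization algorithm, is not covered by this sketch.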

Support for this work was provided by the Ministry of Education of the Czech Republic, project No. MSM235200004.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kanis, J., Müller, L. (2004). Using the Lemmatization Technique for Phonetic Transcription in Text-to-Speech System. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_45

  • DOI: https://doi.org/10.1007/978-3-540-30120-2_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23049-6

  • Online ISBN: 978-3-540-30120-2

  • eBook Packages: Springer Book Archive
