Accuracy Analysis of Generalized Pronunciation Variant Selection in ASR Systems

  • Conference paper
Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5641)

Abstract

Automatic speech recognition (ASR) systems typically rely on a pronunciation dictionary to generate the expected phonetic content of the words in a recognized utterance. Pronunciation, however, can vary in many situations. Besides the cases where several pronunciation variants are specified manually in the dictionary, there are typically many other possible changes in pronunciation that depend on word context or speaking style, which is very typical for Czech, the language studied here. In this paper we study how accurately automatically predicted pronunciation variants are selected in Czech HMM-based ASR systems. We analyze the correctness of pronunciation variant selection in forced alignment of known utterances used as ASR training data. Using the proper pronunciation variant, more exact transcriptions of the utterances were created for further purposes, mainly for more accurate training of acoustic HMMs. Finally, because large-vocabulary continuous speech recognition (LVCSR) systems are the target and most important application, we tested the accuracy of LVCSR results obtained with different levels of automated pronunciation generation.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hanžl, V., Pollák, P. (2009). Accuracy Analysis of Generalized Pronunciation Variant Selection in ASR Systems. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science, vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_37

  • DOI: https://doi.org/10.1007/978-3-642-03320-9_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03319-3

  • Online ISBN: 978-3-642-03320-9

  • eBook Packages: Computer Science, Computer Science (R0)
