Pronunciation Modeling

Automatic Learning of Finite-state Automata

The Integration of Phonetic Knowledge in Speech Technology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 25)

Abstract

The high variability of word pronunciation in spontaneous speech is one reason for the poor performance of current speech recognition systems. Dictionaries that take this variability into account may make such systems more robust. A word pronunciation is a phoneme-like sequence that can appear in a real utterance and represents a possible acoustic production of the word.

In this paper, word pronunciations are modeled using stochastic finite-state automata. Such models allow grammatical inference methods to be applied and integrate easily with other knowledge sources. The training samples are obtained by aligning the phoneme-like decoding of each training utterance with the corresponding canonical transcription.
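As a rough illustration of what a stochastic finite-state pronunciation model can look like, the following minimal Python sketch estimates a probabilistic prefix-tree automaton from a handful of pronunciation samples by relative frequency. It is not the authors' inference method: the prefix-tree construction, the function names, and the toy phoneme sequences are all assumptions made for illustration, and real grammatical inference methods would typically generalize beyond the observed samples.

```python
from collections import defaultdict


def build_prefix_tree_automaton(samples):
    """Estimate a stochastic prefix-tree automaton from phoneme sequences.

    States are phoneme prefixes (tuples). Returns (trans, final), where
    trans[state][phoneme] is a transition probability and final[state]
    is the probability of ending the pronunciation in that state.
    """
    trans_counts = defaultdict(lambda: defaultdict(int))
    final_counts = defaultdict(int)
    for seq in samples:
        state = ()
        for ph in seq:
            trans_counts[state][ph] += 1
            state = state + (ph,)
        final_counts[state] += 1

    trans, final = {}, {}
    states = set(trans_counts) | set(final_counts)
    for state in states:
        # Normalize outgoing transitions and the stop event together,
        # so the probabilities leaving each state sum to one.
        total = sum(trans_counts[state].values()) + final_counts[state]
        trans[state] = {ph: c / total for ph, c in trans_counts[state].items()}
        final[state] = final_counts[state] / total
    return trans, final


def sequence_probability(trans, final, seq):
    """Probability that the automaton generates exactly `seq`."""
    state = ()
    prob = 1.0
    for ph in seq:
        p = trans.get(state, {}).get(ph, 0.0)
        if p == 0.0:
            return 0.0  # unseen transition: the sequence is not accepted
        prob *= p
        state = state + (ph,)
    return prob * final.get(state, 0.0)


# Hypothetical pronunciation samples for one word (invented for this sketch).
samples = [
    ("ae", "n", "d"),
    ("ae", "n", "d"),
    ("ae", "n"),
    ("ax", "n"),
]
trans, final = build_prefix_tree_automaton(samples)
print(sequence_probability(trans, final, ("ae", "n", "d")))
```

Because the stop event is normalized together with the outgoing transitions, the probabilities of all observed pronunciations sum to one, and any unseen sequence receives probability zero. A practical model would need smoothing or state merging to assign mass to unseen variants.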

The proposed models were applied to a translation-oriented speech task. The improvements achieved by these models ranged from 0.6 to 2.7 points, depending on the language model used.

© 2005 Springer

About this chapter

Cite this chapter

Pastor, M., Casacuberta, F. (2005). Pronunciation Modeling. In: Barry, W.J., van Dommelen, W.A. (eds) The Integration of Phonetic Knowledge in Speech Technology. Text, Speech and Language Technology, vol 25. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2637-4_8
