Pronunciation Modeling

Pastor, Moisés; Casacuberta, Francisco

doi:10.1007/1-4020-2637-4_8

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 25))

Abstract

The great variability of word pronunciation in spontaneous speech is one of the reasons for the low performance of current speech recognition systems. Generating dictionaries that take this variability into account may increase the robustness of such systems. A word pronunciation is a phoneme-like sequence that can appear in a real utterance and represents one possible acoustic production of the word.

In this paper, word pronunciations are modeled using stochastic finite-state automata. Such models allow grammatical inference methods to be applied and integrate easily with the other knowledge sources. The training samples are obtained by aligning the phoneme-like decoding of each training utterance with the corresponding canonical transcription.
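As a rough illustration of the two steps described above, the following Python sketch aligns a decoded phone string with a canonical transcription to collect one observed pronunciation per word, then estimates a small stochastic finite-state automaton from such samples. It is only a sketch under stated assumptions: the alignment is a plain unit-cost edit-distance alignment, the automaton is a prefix tree with relative-frequency probabilities rather than any of the grammatical inference methods used in the chapter, and the function names, the `END` marker, the toy lexicon, and the example words are all hypothetical.

```python
from collections import Counter, defaultdict

END = "</s>"  # hypothetical end-of-pronunciation marker


def align(canonical, decoded):
    """Unit-cost edit-distance alignment of two phone sequences.

    Returns (canonical_phone_or_None, decoded_phone_or_None) pairs.
    """
    n, m = len(canonical), len(decoded)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (canonical[i - 1] != decoded[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the final cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j] ==
                cost[i - 1][j - 1] + (canonical[i - 1] != decoded[j - 1])):
            pairs.append((canonical[i - 1], decoded[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # canonical phone deleted
            i -= 1
        else:
            pairs.append((None, decoded[j - 1]))  # extra decoded phone
            j -= 1
    pairs.reverse()
    return pairs


def word_samples(words, lexicon, decoded):
    """Split a decoded phone string into one observed pronunciation per word."""
    canonical, owner = [], []
    for idx, word in enumerate(words):
        for phone in lexicon[word]:
            canonical.append(phone)
            owner.append(idx)
    observed = [[] for _ in words]
    pos, current = 0, 0
    for can, dec in align(canonical, decoded):
        if can is not None:
            current = owner[pos]
            pos += 1
        if dec is not None:
            # Inserted phones are attached to the most recent word.
            observed[current].append(dec)
    return [(w, tuple(obs)) for w, obs in zip(words, observed)]


def train_automaton(samples):
    """Prefix-tree automaton: states are prefixes, arcs carry relative frequencies."""
    counts = defaultdict(Counter)
    for seq in samples:
        state = ()
        for sym in list(seq) + [END]:
            counts[state][sym] += 1
            state = state + (sym,)
    return {s: {sym: c / sum(ctr.values()) for sym, c in ctr.items()}
            for s, ctr in counts.items()}


def probability(automaton, seq):
    """Probability the automaton assigns to a complete pronunciation."""
    p, state = 1.0, ()
    for sym in list(seq) + [END]:
        p *= automaton.get(state, {}).get(sym, 0.0)
        state = state + (sym,)
    return p


lexicon = {"para": ["p", "a", "r", "a"], "el": ["e", "l"]}
samples = word_samples(["para", "el"], lexicon, ["p", "a", "e", "l"])
model = train_automaton([("p", "a", "r", "a"), ("p", "a", "r", "a"), ("p", "a")])
```

Here `samples` pairs each word with the phones actually decoded for it, and `model` is trained on three illustrative samples of a single word. A prefix tree assigns zero probability to any pronunciation it has not seen, which is one reason inference methods that generalize beyond the training samples are attractive for this task.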

The proposed models were applied to a translation-oriented speech task. The improvements achieved by these new models ranged from 0.6 to 2.7 points, depending on the language model used.

Author information

Authors and Affiliations

Institut Tecnològic d’Informàtica, Universitat Politècnica de València, València
Moisés Pastor & Francisco Casacuberta

Editor information

Editors and Affiliations

Universität des Saarlandes, Saarbrücken, Germany
William J. Barry
Norwegian University of Science and Technology, Trondheim, Norway
Wim A. van Dommelen

About this chapter

Cite this chapter

Pastor, M., Casacuberta, F. (2005). Pronunciation Modeling. In: Barry, W.J., van Dommelen, W.A. (eds) The Integration of Phonetic Knowledge in Speech Technology. Text, Speech and Language Technology, vol 25. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2637-4_8

DOI: https://doi.org/10.1007/1-4020-2637-4_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2635-5
Online ISBN: 978-1-4020-2637-9
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)