Abstract
The great variability of word pronunciation in spontaneous speech is one of the reasons for the low performance of current speech recognition systems. Dictionaries that take this variability into account may increase the robustness of such systems. A word pronunciation is a phoneme-like sequence that can appear in a real utterance and represents a possible acoustic production of the word.
In this paper, word pronunciations are modeled using stochastic finite-state automata. Such models allow the application of grammatical inference methods and are easy to integrate with the other knowledge sources. The training samples are obtained by aligning the phoneme-like decoding of each training utterance with the corresponding canonical transcription.
The proposed models were applied to a translation-oriented speech task. The improvements they achieved ranged from 0.6 to 2.7 points, depending on the language model used.
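The idea of modeling a word's pronunciations with a stochastic finite-state automaton can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the pronunciation samples for one word have already been extracted by alignment, and it uses a simple bigram-style automaton (one state per phoneme, transition probabilities estimated by relative frequency) chosen only for illustration. The function names `train_bigram_automaton` and `pronunciation_probability`, the boundary symbols, and the phoneme samples are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical boundary symbols marking the initial and final states.
START, END = "<s>", "</s>"


def train_bigram_automaton(samples):
    """Estimate P(next phoneme | current phoneme) by relative frequency.

    Each state corresponds to the last phoneme seen, so the result is a
    stochastic finite-state automaton with one state per phoneme.
    """
    counts = defaultdict(Counter)
    for phones in samples:
        path = [START, *phones, END]
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        state: {nxt: c / sum(outgoing.values()) for nxt, c in outgoing.items()}
        for state, outgoing in counts.items()
    }


def pronunciation_probability(automaton, phones):
    """Probability the automaton assigns to one phoneme sequence."""
    path = [START, *phones, END]
    prob = 1.0
    for cur, nxt in zip(path, path[1:]):
        # Unseen transitions get probability 0 (no smoothing in this sketch).
        prob *= automaton.get(cur, {}).get(nxt, 0.0)
    return prob


# Hypothetical pronunciation samples for a single word: three canonical
# realizations and one reduced realization.
samples = [
    ["p", "a", "r", "a"],
    ["p", "a", "r", "a"],
    ["p", "a", "r", "a"],
    ["p", "a"],
]
automaton = train_bigram_automaton(samples)

for candidate in (["p", "a", "r", "a"], ["p", "a"], ["p", "r"]):
    print(candidate, pronunciation_probability(automaton, candidate))
```

Because states are tied to single phonemes, this sketch also assigns non-zero probability to sequences never observed in training (for example a repeated syllable). That kind of generalization is what inferred automata provide, and richer inference methods trade it off against precision.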
Copyright information
© 2005 Springer
About this chapter
Cite this chapter
Pastor, M., Casacuberta, F. (2005). Pronunciation Modeling. In: Barry, W.J., van Dommelen, W.A. (eds) The Integration of Phonetic Knowledge in Speech Technology. Text, Speech and Language Technology, vol 25. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2637-4_8
DOI: https://doi.org/10.1007/1-4020-2637-4_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2635-5
Online ISBN: 978-1-4020-2637-9
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)