Abstract
Tracheoesophageal (TE) speech is one way to restore the ability to speak after laryngectomy, i.e. the removal of the larynx. TE speech often shows low audibility and intelligibility, which also makes it a challenge for automatic speech recognition. We improved the recognition results by adapting a speech recognizer trained on normal, non-pathologic voices to individual TE speakers by means of unsupervised HMM interpolation.
In speech rehabilitation, the patient’s voice quality has to be evaluated. As no objective means of classification exists so far and automating this procedure is desirable, we performed initial experiments on the automatic evaluation of intelligibility. We compared the scores that five experienced raters assigned to TE speech with the word accuracy obtained from different types of speech recognizers. Correlation coefficients of about –0.8 are promising for future work.
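The comparison described above can be illustrated with a minimal sketch. It assumes the common definition of word accuracy, (N - S - D - I) / N, and a Pearson correlation; the abstract does not state which correlation measure was used. It also assumes a rating scale on which higher scores mean poorer intelligibility. The function names and all numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def word_accuracy(n_ref, substitutions, deletions, insertions):
    """Word accuracy in percent: (N - S - D - I) / N * 100.

    n_ref is the number of words in the reference transcription.
    The value can become negative when there are many insertions.
    """
    if n_ref <= 0:
        raise ValueError("reference must contain at least one word")
    errors = substitutions + deletions + insertions
    return 100.0 * (n_ref - errors) / n_ref


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustration only: one entry per speaker.
# Assumed scale: 1 = very good intelligibility, 5 = very poor.
mean_rater_scores = [1.5, 2.0, 3.0, 4.0, 4.5]
recognizer_accuracies = [60.0, 50.0, 45.0, 20.0, 15.0]

r = pearson(mean_rater_scores, recognizer_accuracies)
print(f"correlation between rater score and word accuracy: {r:.2f}")
```

Under the assumed scale, a strongly negative coefficient means that speakers rated as less intelligible are also recognized less accurately, which is the kind of agreement an automatic evaluation would need.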
This work was partly funded by the EU in the project PF-STAR under grant IST-2001-37599. The responsibility for the contents of this study lies with the authors.
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haderlein, T., Steidl, S., Nöth, E., Rosanowski, F., Schuster, M. (2004). Automatic Recognition and Evaluation of Tracheoesophageal Speech. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_42
DOI: https://doi.org/10.1007/978-3-540-30120-2_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2