Abstract
Automatic Speech Recognition (ASR) in reverberant rooms can be improved by choosing training data from the same acoustical environment as the test data. In real-world applications, however, this is often not possible. One solution to this problem is to record speech with a close-talking microphone and reverberate it artificially with multiple room impulse responses. This paper presents results for recognizers whose training data differ in size and in the percentage of reverberated signals, with the aim of finding the best combination for data sets with different degrees of reverberation. In this way, the average error rate on a close-talking and a distant-talking test set was reduced by 29% relative.
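The core idea of artificial reverberation can be illustrated with a short sketch. The code below is not the authors' implementation. It assumes that each close-talking utterance and each room impulse response (RIR) is a plain list of float samples at the same sampling rate. It reverberates a signal by linear convolution with an RIR and builds a training list in which a chosen fraction of the utterances is reverberated with a randomly picked RIR. The function names `reverberate` and `build_training_set` and the `reverb_fraction` parameter are hypothetical.

```python
import random
from typing import List, Sequence


def reverberate(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Convolve a dry (close-talking) signal with a room impulse response.

    Direct O(N*M) linear convolution; the result has
    len(signal) + len(rir) - 1 samples, so it keeps the reverberant tail.
    """
    if not signal or not rir:
        return []
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, x in enumerate(signal):
        if x == 0.0:
            continue  # a zero sample contributes nothing to the output
        for k, h in enumerate(rir):
            out[n + k] += x * h
    return out


def build_training_set(
    utterances: Sequence[Sequence[float]],
    rirs: Sequence[Sequence[float]],
    reverb_fraction: float,
    seed: int = 0,
) -> List[List[float]]:
    """Reverberate a fraction of the utterances, leave the rest dry.

    The number of reverberated utterances is reverb_fraction * len(utterances),
    rounded to the nearest integer; each one gets a randomly chosen RIR.
    (Hypothetical helper, for illustration only.)
    """
    if not 0.0 <= reverb_fraction <= 1.0:
        raise ValueError("reverb_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    n_reverb = round(reverb_fraction * len(utterances))
    chosen = set(rng.sample(range(len(utterances)), n_reverb))
    result: List[List[float]] = []
    for i, utt in enumerate(utterances):
        if i in chosen:
            result.append(reverberate(utt, rng.choice(rirs)))
        else:
            result.append(list(utt))
    return result


# Example: a three-sample "utterance" and a two-tap "RIR".
# reverberate([1.0, 0.0, 0.5], [1.0, 0.5]) -> [1.0, 0.5, 0.5, 0.25]
```

Direct convolution is used here only for readability; for real recordings with RIRs thousands of samples long, an FFT-based convolution such as `scipy.signal.fftconvolve` would be the usual choice. Varying `reverb_fraction` and the number of utterances mirrors, in a simplified way, the abstract's comparison of training sets that differ in size and in the percentage of reverberated signals.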
Our work was partially supported by the German Federal Ministry of Education and Research (grant no. 01 IMD 01 F) within the framework of the SmartWeb project. The responsibility for the contents of this study lies with the authors.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haderlein, T., Nöth, E., Herbordt, W., Kellermann, W., Niemann, H. (2005). Using Artificially Reverberated Training Data in Distant-Talking ASR. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_29
DOI: https://doi.org/10.1007/11551874_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)