Using Artificially Reverberated Training Data in Distant-Talking ASR

Haderlein, Tino; Nöth, Elmar; Herbordt, Wolfgang; Kellermann, Walter; Niemann, Heinrich

doi:10.1007/11551874_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Automatic Speech Recognition (ASR) in reverberant rooms can be improved by choosing training data from the same acoustical environment as the test data. In real-world applications this is often not possible. One solution is to take speech signals recorded with a close-talking microphone and reverberate them artificially with multiple room impulse responses. This paper reports results for recognizers whose training data differ in size and in the percentage of reverberated signals, in order to find the best combination for data sets with different degrees of reverberation. In this way, the average error rate on a close-talking and a distant-talking test set was reduced by 29% relative.
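
The idea of reverberating close-talking recordings artificially can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which impulse responses or tools were used, so the exponentially decaying noise response below is only a stand-in for a measured room impulse response, and the names `synthetic_rir`, `convolve`, and `build_training_set` are hypothetical. The sketch convolves a chosen fraction of the utterances with a randomly selected impulse response and leaves the rest untouched, which mirrors the idea of varying the percentage of reverberated training signals.

```python
import math
import random


def synthetic_rir(rt60_s, sample_rate, seed=0):
    """Toy impulse response (an assumption, not a measured room response):
    a direct-path impulse followed by Gaussian noise whose amplitude
    decays by 60 dB over rt60_s seconds."""
    rng = random.Random(seed)
    length = int(rt60_s * sample_rate)
    # Amplitude falls by a factor of 1000 (60 dB) at t = rt60_s.
    decay = math.log(1000.0) / (rt60_s * sample_rate)
    rir = [rng.gauss(0.0, 0.1) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # direct path
    return rir


def convolve(signal, rir):
    """Direct-form linear convolution.
    The output has len(signal) + len(rir) - 1 samples."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def build_training_set(utterances, rirs, reverb_fraction, seed=0):
    """Reverberate about reverb_fraction of the utterances, each with a
    randomly chosen impulse response; keep the rest as close-talking data.
    Returns a list of (samples, is_reverberated) pairs."""
    rng = random.Random(seed)
    n_reverb = round(reverb_fraction * len(utterances))
    chosen = set(rng.sample(range(len(utterances)), n_reverb))
    result = []
    for idx, utt in enumerate(utterances):
        if idx in chosen:
            result.append((convolve(utt, rng.choice(rirs)), True))
        else:
            result.append((list(utt), False))
    return result
```

For real recordings, an FFT-based convolution and measured impulse responses would be the more practical choice; the direct-form loop here is kept only so that the sketch has no dependencies.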

Our work was partially supported by the German Federal Ministry of Education and Research (grant no. 01 IMD 01 F) in the frame of the SmartWeb project. The responsibility for the contents of this study lies with the authors.

Author information

Authors and Affiliations

Chair for Pattern Recognition (Informatik 5), University of Erlangen-Nuremberg, Martensstraße 3, 91058, Erlangen, Germany
Tino Haderlein, Elmar Nöth & Heinrich Niemann
Chair of Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstraße 7, 91058, Erlangen, Germany
Wolfgang Herbordt & Walter Kellermann

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek, Pavel Mautner & Tomáš Pavelka

About this paper

Cite this paper

Haderlein, T., Nöth, E., Herbordt, W., Kellermann, W., Niemann, H. (2005). Using Artificially Reverberated Training Data in Distant-Talking ASR. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_29

DOI: https://doi.org/10.1007/11551874_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)
