Speech Recognition with μ-Law Companded Features on Reverberated Signals

Haderlein, Tino; Stemmer, Georg; Nöth, Elmar

doi:10.1007/978-3-540-39398-6_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

One of the goals of the EMBASSI project is the creation of a speech interface between a user and a TV set or VCR. The interface should accept spontaneous speech recorded by microphones far from the speaker. This paper describes experiments evaluating the robustness of a speech recognizer against reverberation. For this purpose, a speech corpus was recorded with several different distortion types under real-life conditions. On these data, the recognition results for reverberated signals using μ-law companded features were compared to those of an MFCC baseline system. When the recognizer was trained with clear speech, the word accuracy for the μ-law features on highly reverberated signals was 3 percentage points better than the baseline result.
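The abstract does not say how the companding is applied inside the feature extraction, so the following is only a minimal sketch of the standard μ-law characteristic, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ). The value μ = 255, the normalization of inputs to [-1, 1], and the sample `energies` list are assumptions made for illustration and are not taken from the paper.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Standard mu-law companding of a value x in [-1, 1].

    mu=255 is an assumed default, not a value taken from the paper.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress for y in [-1, 1]."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("y must lie in [-1, 1]")
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


# Hypothetical normalized energies, chosen only to show the curve's shape.
energies = [0.0, 0.001, 0.01, 0.1, 1.0]
companded = [mu_law_compress(e) for e in energies]

for e, c in zip(energies, companded):
    print(f"{e:8.3f} -> {c:.4f}")
```

Unlike a plain logarithm, this curve is defined and equal to zero at x = 0, and it is close to linear for very small inputs while behaving logarithmically for large ones.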

Author information

Authors and Affiliations

Chair for Pattern Recognition (Informatik 5), University of Erlangen-Nuremberg, Martensstr. 3, 91058, Erlangen, Germany
Tino Haderlein, Georg Stemmer & Elmar Nöth

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek & Pavel Mautner

About this paper

Cite this paper

Haderlein, T., Stemmer, G., Nöth, E. (2003). Speech Recognition with μ-Law Companded Features on Reverberated Signals. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_25

DOI: https://doi.org/10.1007/978-3-540-39398-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive
