
Speech Recognition with μ-Law Companded Features on Reverberated Signals

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

One of the goals of the EMBASSI project is the creation of a speech interface between a user and a TV set or VCR. The interface should handle spontaneous speech recorded by microphones placed far away from the speaker. This paper describes experiments evaluating the robustness of a speech recognizer against reverberation. For this purpose, a speech corpus was recorded under real-life conditions with several different types of distortion. On these data, the recognition results for reverberated signals using μ-law companded features were compared to those of an MFCC baseline system. When trained on clean speech, the μ-law features achieved a word accuracy on highly reverberated signals that was 3 percentage points higher than the baseline.
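The abstract names μ-law companded features without giving their exact definition. The sketch below shows one plausible reading, stated as an assumption: the standard μ-law curve F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) takes the place of the logarithm that an MFCC front end applies to mel filterbank energies, followed by the usual DCT. The per-frame peak normalization, the value μ = 255 (borrowed from G.711 telephony), and the helper names `compand_filterbank` and `mu_law_cepstrum` are all hypothetical and are not taken from the paper.

```python
import math


def mu_law_compand(x: float, mu: float = 255.0) -> float:
    """Standard mu-law curve for x in [-1, 1]: sgn(x) * ln(1 + mu|x|) / ln(1 + mu)."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [-1, 1]")
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def compand_filterbank(energies, mu=255.0):
    """Hypothetical step: scale non-negative filterbank energies by the frame
    peak, then apply mu-law where an MFCC front end would apply log()."""
    peak = max(energies)
    if peak <= 0:
        return [0.0 for _ in energies]
    return [mu_law_compand(e / peak, mu) for e in energies]


def dct2(values, n_coeffs):
    """Unnormalized DCT-II, as used to turn compressed band energies into cepstra."""
    n = len(values)
    if n_coeffs > n:
        raise ValueError("n_coeffs must not exceed the number of bands")
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mu_law_cepstrum(energies, n_coeffs=13, mu=255.0):
    """Hypothetical mu-law cepstral features for one frame of filterbank energies."""
    return dct2(compand_filterbank(energies, mu), n_coeffs)


if __name__ == "__main__":
    frame = [0.0, 0.01, 0.2, 1.5, 3.0, 0.7]  # made-up filterbank energies
    print(mu_law_cepstrum(frame, n_coeffs=4))
```

One property that follows directly from the formula: the logarithm diverges to minus infinity as a band energy approaches zero, while the μ-law curve stays bounded in [0, 1] for non-negative inputs. Whether this is the mechanism behind the reported gain is not stated in the abstract.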

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Haderlein, T., Stemmer, G., Nöth, E. (2003). Speech Recognition with μ-Law Companded Features on Reverberated Signals. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_25

  • DOI: https://doi.org/10.1007/978-3-540-39398-6_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20024-6

  • Online ISBN: 978-3-540-39398-6

  • eBook Packages: Springer Book Archive