Automatic Speech Recognition Using Missing Data Techniques: Handling of Real-World Data

Gemmeke, Jort F.; Van Segbroeck, Maarten; Wang, Yujun; Cranen, Bert; Van hamme, Hugo

doi:10.1007/978-3-642-21317-5_7

Abstract

In this chapter, we investigate the performance of a missing data recognizer on real-world speech from the SPEECON and SpeechDat-Car databases. In previous work we hypothesized that in real-world speech, which is corrupted not only by environmental noise but also by speaker, reverberation and channel effects, the ‘reliable’ features do not match an acoustic model trained on clean speech. In a series of experiments, we test this hypothesis and explore to what extent performance can be improved by combining missing data techniques (MDT) with three conventional techniques: multi-condition training, dereverberation and feature enhancement. Our results confirm the hypothesis and show that the mismatch can be reduced by multi-condition training of the acoustic models and by feature enhancement, and that these effects combine to some degree. Our experiments with dereverberation reveal that reverberation can have a major impact on recognition performance, but that MDT with a suitable missing data mask can compensate for both the environmental noise and the reverberation at once.
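
To make the notion of a missing data mask concrete, the following is a minimal sketch of one common textbook approach: labelling each time-frequency cell as reliable or unreliable by thresholding an estimated local SNR. It is an illustration under stated assumptions, not the mask estimator used in the chapter. The function name `estimate_binary_mask`, the `snr_threshold_db` parameter, and the assumption that a per-cell noise power estimate is already available are all hypothetical choices made for this example.

```python
import math


def estimate_binary_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Label each time-frequency cell as reliable (1) or unreliable (0).

    noisy_power and noise_power are lists of frames; each frame is a list
    of per-band power values (linear scale, not dB). The noise estimate is
    assumed to be given; how it is obtained is outside this sketch.
    """
    mask = []
    for noisy_frame, noise_frame in zip(noisy_power, noise_power):
        row = []
        for y, n in zip(noisy_frame, noise_frame):
            # Spectral-subtraction-style estimate of the speech power.
            speech = max(y - n, 0.0)
            if speech == 0.0:
                row.append(0)  # no speech energy left: unreliable
            elif n <= 0.0:
                row.append(1)  # no estimated noise: treat as reliable
            else:
                snr_db = 10.0 * math.log10(speech / n)
                row.append(1 if snr_db > snr_threshold_db else 0)
        mask.append(row)
    return mask


# Toy example: two frames, two frequency bands.
noisy = [[10.0, 1.5], [2.0, 8.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]
mask = estimate_binary_mask(noisy, noise)
```

In this toy input, only the cells where the estimated speech power exceeds the noise estimate are marked reliable. A missing data recognizer would then treat the unreliable cells as missing when evaluating the acoustic model.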

Author information

Authors and Affiliations

Dept. of Linguistics, Radboud University Nijmegen, Nijmegen, The Netherlands
Jort F. Gemmeke & Bert Cranen
ESAT Department, Katholieke Universiteit Leuven, Leuven, Belgium
Maarten Van Segbroeck, Yujun Wang & Hugo Van hamme

Corresponding author

Correspondence to Jort F. Gemmeke.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa
Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach

About this chapter

Cite this chapter

Gemmeke, J.F., Van Segbroeck, M., Wang, Y., Cranen, B., Van hamme, H. (2011). Automatic Speech Recognition Using Missing Data Techniques: Handling of Real-World Data. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_7

DOI: https://doi.org/10.1007/978-3-642-21317-5_7
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)