Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank

Bonde, Casper Stork; Graversen, Carina; Gregersen, Andreas Gregers; Ngo, Kim Hoang; Nørmark, Kim; Purup, Mikkel; Thorsen, Thomas; Lindberg, Børge

doi:10.1007/11613107_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3817)

Included in the following conference series:

International Conference on Nonlinear Analyses and Algorithms for Speech Processing

Abstract

An important topic in Automatic Speech Recognition (ASR) is to reduce the effect of noise, in particular when mismatch exists between the training and application conditions.

Many noise robustness schemes in the feature processing domain require a noise estimate obtained before the speech signal begins, which in turn requires noise-robust voice activity detection and the assumption of stationary noise. Both requirements are often not met, so it is of particular interest to investigate methods such as Quantile Based Noise Estimation (QBNE), which estimates the noise during both speech and non-speech sections without a voice activity detector. While the standard QBNE method uses a fixed, pre-defined quantile across all frequency bands, this paper proposes adaptive QBNE (AQBNE), which adapts the quantile individually to each frequency band.
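To make the quantile idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the input is a list of frames holding per-band spectral magnitudes, and it uses a simple floor-based rank to pick the quantile. The function name `quantile_noise_estimate` and its parameters are hypothetical. The abstract does not state how AQBNE chooses the quantile for each band, so the sketch only accepts per-band quantiles supplied by the caller and makes no attempt to reproduce the adaptation rule.

```python
from typing import List, Sequence, Union


def quantile_noise_estimate(
    spectrogram: Sequence[Sequence[float]],
    quantile: Union[float, Sequence[float]] = 0.5,
) -> List[float]:
    """Estimate one noise magnitude per frequency band as a quantile over time.

    spectrogram[t][k] is the magnitude of band k in frame t.
    quantile is either a single value shared by all bands (fixed-quantile
    QBNE) or one value per band (the per-band idea behind AQBNE; how those
    values are chosen is NOT specified here and is left to the caller).
    """
    if not spectrogram:
        raise ValueError("spectrogram must contain at least one frame")
    num_frames = len(spectrogram)
    num_bands = len(spectrogram[0])

    if isinstance(quantile, (int, float)):
        quantiles = [float(quantile)] * num_bands
    else:
        quantiles = [float(q) for q in quantile]
    if len(quantiles) != num_bands:
        raise ValueError("need exactly one quantile per frequency band")

    noise = []
    for k in range(num_bands):
        q = quantiles[k]
        if not 0.0 <= q <= 1.0:
            raise ValueError("quantiles must lie in [0, 1]")
        # Sort this band's magnitudes over time. Frames dominated by speech
        # tend to land near the top, frames dominated by noise lower down.
        band = sorted(frame[k] for frame in spectrogram)
        # Assumed ranking rule: floor(q * (T - 1)) into the sorted values.
        index = int(q * (num_frames - 1))
        noise.append(band[index])
    return noise
```

No voice activity detector appears anywhere in the sketch: the estimate is taken over all frames, speech and non-speech alike, which is the property the abstract highlights. Passing a single number gives the fixed-quantile behaviour, and passing a list gives one quantile per band.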

Furthermore, the paper investigates an alternative to the standard mel frequency cepstral coefficient (MFCC) filter bank: an empirically chosen Speech Band Emphasizing (SBE) filter bank, which improves the resolution in the speech band.
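For reference, the standard mel spacing that the SBE filter bank is an alternative to can be sketched as below. This shows only the conventional layout, using the widely used mel formula 2595 log10(1 + f/700). The abstract does not give the SBE filter positions, so they are not reproduced here. The function names and the example values (23 filters between 64 Hz and 4000 Hz) are assumptions chosen for illustration.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (common 2595/700 form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(
    num_filters: int, low_hz: float, high_hz: float
) -> List[float]:
    """Return filter center frequencies in Hz, equally spaced on the mel scale.

    The num_filters centers lie strictly between low_hz and high_hz.
    """
    if num_filters < 1 or not 0.0 <= low_hz < high_hz:
        raise ValueError("need num_filters >= 1 and 0 <= low_hz < high_hz")
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + (i + 1) * step) for i in range(num_filters)]


# Illustrative values only (assumed, not taken from the abstract).
centers = mel_center_frequencies(23, 64.0, 4000.0)
```

With equal mel spacing, the gap between adjacent centers widens as frequency increases. A filter bank that emphasizes the speech band would instead place its centers more densely in that band, which is one way to read "improves the resolution in the speech band".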

The combinations of AQBNE and SBE are tested on the Danish SpeechDat-Car database and compared with the performance of the standards presented by the Aurora consortium (Aurora Baseline and Aurora Advanced Frontend). For the High Mismatch (HM) condition, AQBNE achieves significantly better performance than the Aurora Baseline, both when combined with SBE and when combined with the standard MFCC. AQBNE also outperforms the Aurora Baseline for the Medium Mismatch (MM) and Well Matched (WM) conditions. Although the Aurora Advanced Frontend achieves superior performance for all three conditions, AQBNE remains a relevant method to consider for small-footprint applications.

Author information

Authors and Affiliations

Department of Communication Technology, Aalborg University, Niels Jernes Vej 12, DK-9220, Aalborg Ø
Casper Stork Bonde, Carina Graversen, Andreas Gregers Gregersen, Kim Hoang Ngo, Kim Nørmark, Mikkel Purup, Thomas Thorsen & Børge Lindberg

Editor information

Editors and Affiliations

Escola Universitària Politècnica de Mataró, UPC, Spain
Marcos Faundez-Zanuy
Escola Universitària Politècnica de Mataró, Spain
Léonard Janer & Antonio Satue-Villar
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
The Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
Josep Roure
Escola Universitària Politècnica de Mataró (UPC), Barcelona, Spain
Virginia Espinosa-Duro

Cite this paper

Bonde, C.S. et al. (2006). Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_26

DOI: https://doi.org/10.1007/11613107_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)
