Abstract
An important topic in Automatic Speech Recognition (ASR) is reducing the effect of noise, particularly when the training and application conditions are mismatched.
Many noise robustness schemes in the feature processing domain require a noise estimate obtained before the speech signal begins, which in turn requires noise-robust voice activity detection and an assumption of stationary noise. However, both of these requirements are often not met, so it is of particular interest to investigate methods such as Quantile Based Noise Estimation (QBNE), which estimates the noise during both speech and non-speech sections without the use of a voice activity detector. While the standard QBNE method uses a fixed, pre-defined quantile across all frequency bands, this paper proposes adaptive QBNE (AQBNE), which adapts the quantile individually to each frequency band.
Furthermore, the paper investigates an alternative to the standard mel frequency cepstral coefficient (MFCC) filter bank: an empirically chosen Speech Band Emphasizing (SBE) filter bank, which improves the resolution in the speech band.
The combinations of AQBNE and SBE are tested on the Danish SpeechDat-Car database and compared to the performance achieved by the standards presented by the Aurora consortium (Aurora Baseline and Aurora Advanced Frontend). For the High Mismatch (HM) condition, AQBNE achieves significantly better performance than the Aurora Baseline, both when combined with SBE and when combined with the standard MFCC filter bank. AQBNE also outperforms the Aurora Baseline for the Medium Mismatch (MM) and Well Matched (WM) conditions. Although the Aurora Advanced Frontend achieves superior performance for all three conditions, AQBNE is still a relevant method to consider for small-footprint applications.
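The quantile idea described in the abstract can be illustrated with a minimal sketch. It assumes the usual QBNE formulation, in which the noise level in each frequency band is taken as a quantile of that band's magnitudes over time, so no speech/non-speech decision is needed. The function names `quantile` and `estimate_noise`, the toy spectrogram, and the per-band quantile values are hypothetical and chosen only for illustration; the rule AQBNE uses to adapt the quantile in each band is not given in the abstract and is not reproduced here.

```python
def quantile(values, q):
    """Return the q-quantile (0 <= q <= 1) of values, using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = q * (len(ordered) - 1)          # fractional index into the sorted values
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def estimate_noise(spectrogram, quantiles):
    """Estimate one noise level per frequency band.

    spectrogram: list of frames, each a list of per-band magnitudes.
    quantiles:   a single number (one fixed quantile for all bands) or a
                 list with one quantile per band (assumed to be supplied
                 by some adaptation rule that is not modelled here).
    """
    n_bands = len(spectrogram[0])
    if isinstance(quantiles, (int, float)):
        quantiles = [float(quantiles)] * n_bands
    if len(quantiles) != n_bands:
        raise ValueError("need one quantile per frequency band")
    noise = []
    for band in range(n_bands):
        # Magnitude trajectory of this band over all frames (speech and non-speech).
        track = [frame[band] for frame in spectrogram]
        noise.append(quantile(track, quantiles[band]))
    return noise


# Toy data: 5 frames, 2 bands; the large values stand in for speech energy.
frames = [[1.0, 2.0], [1.2, 2.0], [5.0, 2.2], [0.8, 9.0], [1.0, 1.8]]

fixed = estimate_noise(frames, 0.5)             # same quantile in every band
per_band = estimate_noise(frames, [0.0, 0.5])   # hypothetical per-band quantiles
```

Because the estimate is an order statistic over time, occasional high-energy speech frames in a band do not pull the noise estimate upward the way an average would, which is why no voice activity detector is required.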
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bonde, C.S. et al. (2006). Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_26
DOI: https://doi.org/10.1007/11613107_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)