Single channel noise reduction system in low SNR
We propose a two stage noise reduction system for reducing background noise using single-microphone recordings in very low signal-to-noise ratio (SNR) based on Wiener filtering and ideal binary masking. The proposed system contains two stages. In first stage, the Wiener filtering with improved a priori SNR is applied to noisy speech for background noise reduction. In second stage, the ideal binary mask is estimated at every time–frequency channel by using pre-processed first stage speech and comparing the time–frequency channels against a pre-selected threshold T to reduce the residual noise. The time–frequency channels satisfying the threshold are preserved whereas all other time–frequency channels are attenuated. The results revealed substantial improvements in speech intelligibility and quality over that accomplished with the traditional noise reduction algorithms and unprocessed speech.
KeywordsIdeal binary masking SNR Speech intelligibility Speech quality Wiener filtering
- Boldt, J. B., & Ellis, D. (2009). A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation. In Proc. EUSIPCO’09, Glasgow, August 2009 (pp. 1849–1853).Google Scholar
- Boldt, J. B., Kjems, U., Pedersen, M. S., Lunner, T., & Wang, D. (2008). Estimation of the ideal binary mask using directional systems. In Proc. int. workshop acoust. echo and noise control (pp. 1–4)Google Scholar
- Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. In IEEE transactions on acoustics, speech, and signal processing, ASSP (Vol. 27, pp. 113–120). doi: 10.1109/TASSP.1979.1163209.
- Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. In IEEE transactions on acoustics, speech, signal processing, ASSP (Vol. 23, No. 2, pp. 443–445). doi: 10.1109/TASSP.1985.1164550.
- Hansen, J., & Pellom, B. (1998). An effective quality evaluation protocol for speech enhancement algorithms. In International Conference on Spoken Language Processing, 7(2819), 2822.Google Scholar
- Hirsch, H., & Pearce, D. (2000). The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ISCA ITRW ASR2000, Paris.Google Scholar
- ITU-T P.835. (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm.Google Scholar
- ITU-T Recommendation P.56. (1993). Objective measurement of active speech level.Google Scholar
- Klatt, D. (1982). Prediction of perceived phonetic distance from critical band spectra. In Proc. IEEE int. conf. acoust., speech, signal processing (Vol. 7, pp. 1278–1281). doi: 10.1109/ICASSP.1982.1171512.
- Lim, J, & Oppenheim, A. V. (1978). All-pole modeling of degraded speech. In IEEE trans. acoust., speech, signal proc., ASSP (Vol. 26, No. 3, pp. 197–210). doi: 10.1109/TASSP.1978.1163086.
- Loizou, P. C. (2007). Speech enhancement: Theory and practice. Boca Raton, FL: CRC Press.Google Scholar
- Quackenbush, S., Barnwell, T., & Clements, M. (1988). Objective measures of speech quality. Eaglewood Cliffs, NJ: Prentice-Hall.Google Scholar
- Saleem, N., Shafi, M., Mustafa, E., & Nawaz, A. (2015b). A novel binary mask estimation based on spectral subtraction gain-induced distortions for improved speech intelligibility and quality. Technical Journal, UET, Taxila, 20(4), 35–42.Google Scholar
- Scalart, P., & Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. In Proc. IEEE int. conf. acoust., speech, signal processing (pp. 629–632). doi: 10.1109/ICASSP.1996.543199.
- Wang, D. (2005). On ideal binary mask as the computational goal of auditory scene analysis. In Speech separation by humans and machines (pp. 181–197). doi: 10.1007/0-387-22794-6_12.