Abstract
In recent years, there has been a renaissance of research on the role of the spectral phase in single-channel speech enhancement. One recent proposal is to not only estimate the clean speech phase but also to use this phase estimate as an additional source of information when estimating the clean speech magnitude. To assess the potential benefit of such approaches, this paper systematically explores the situations in which additional information about the clean speech phase is most valuable. To this end, we compare the performance of phase-aware and phase-blind clean speech estimators in different noise scenarios, i.e., at different signal-to-noise ratios (SNRs) and for noise sources with different degrees of stationarity. Interestingly, the results indicate that the greatest benefits can be achieved in situations where conventional magnitude-only speech enhancement is most challenging, namely in highly non-stationary noise at low SNRs.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Krawczyk-Becker, M., Gerkmann, T. (2018). A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_38
DOI: https://doi.org/10.1007/978-3-319-93764-9_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93763-2
Online ISBN: 978-3-319-93764-9
eBook Packages: Computer Science (R0)