A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios

Krawczyk-Becker, Martin; Gerkmann, Timo

doi:10.1007/978-3-319-93764-9_38

Martin Krawczyk-Becker &
Timo Gerkmann

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10891)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation

Abstract

In recent years, there has been a renaissance of research on the role of the spectral phase in single-channel speech enhancement. One recent proposal is to not only estimate the clean speech phase but also use this phase estimate as an additional source of information to facilitate the estimation of the clean speech magnitude. To assess the potential benefit of such approaches, in this paper we systematically explore in which situations additional information about the clean speech phase is most valuable. To this end, we compare the performance of phase-aware and phase-blind clean speech estimators in different noise scenarios, i.e., at different signal-to-noise ratios (SNRs) and for noise sources with different degrees of stationarity. Interestingly, the results indicate that the greatest benefits can be achieved in situations where conventional magnitude-only speech enhancement is most challenging, namely in highly non-stationary noises at low SNRs.
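The abstract compares estimators across SNR conditions. As a rough illustration of how such test conditions are commonly prepared, the following minimal sketch scales a noise signal so that a speech/noise mixture reaches a chosen SNR. It is not the authors' evaluation code: the function names `mix_at_snr` and `measured_snr_db`, the sine tone standing in for speech, the white Gaussian noise, and the listed SNR values are all assumptions made for illustration.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals `snr_db`.

    Returns the noisy mixture and the scaled noise (hypothetical helper).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # Required noise power is P_speech / 10^(SNR/10); take the square root
    # of the power ratio to obtain an amplitude gain.
    gain = math.sqrt(power(speech) / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, noise):
    """SNR in dB computed from the separate speech and noise signals."""
    return 10 * math.log10(power(speech) / power(noise))


# Stand-in signals (assumptions): a 200 Hz tone at 16 kHz instead of real
# speech, and white Gaussian noise instead of a recorded noise source.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (-5, 0, 5, 10):  # illustrative SNR values in dB
    noisy, scaled = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, scaled), 6))
```

Scaling the noise rather than the speech keeps the speech level constant across conditions, which is one common convention; the reverse choice works equally well as long as it is applied consistently.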

Author information

Authors and Affiliations

Universität Hamburg, 20148, Hamburg, Germany
Martin Krawczyk-Becker & Timo Gerkmann

Authors

Martin Krawczyk-Becker
Timo Gerkmann

Corresponding author

Correspondence to Martin Krawczyk-Becker.

Editor information

Editors and Affiliations

Paul Sabatier University, Toulouse, France
Yannick Deville
Bar-Ilan University, Ramat Gan, Israel
Sharon Gannot
University of Surrey, Guildford, United Kingdom
Russell Mason
University of Surrey, Guildford, United Kingdom
Mark D. Plumbley
University of Surrey, Guildford, United Kingdom
Dominic Ward

About this paper

Cite this paper

Krawczyk-Becker, M., Gerkmann, T. (2018). A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_38

DOI: https://doi.org/10.1007/978-3-319-93764-9_38
Published: 06 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93763-2
Online ISBN: 978-3-319-93764-9
eBook Packages: Computer Science (R0)
