Monaural Speech Segregation Using Signal Phase

Zhou, Hong; Jiang, Yi; Chen, Xiao; Zu, Yuanyuan

doi:10.1007/978-3-642-25541-0_34

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 121)

Abstract

An approach was proposed to segregate the target speech from a mixture utterance at low signal-to-noise ratio (SNR). Within the framework of computational auditory scene analysis (CASA), phase was used as the segregation cue, and the short-time Fourier transform (STFT) was used to extract the phase of the signal. Binary masking was used to group the target speech units based on the phase differences between the mixture, the clean speech, and the noise. The threshold of the binary mask was not linear: it varied with frequency and was obtained from pretests. Experiments showed an SNR improvement of more than 20 dB for babble, m109, white, and machinegun noise at input SNRs from -30 dB to -20 dB. The waveform of the resulting signal showed that it retained most of the detail of the original signal and had good intelligibility. Phase is a robust cue for monaural speech segregation.
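The abstract does not state the exact masking rule, frame settings, or threshold values, so the following is only a minimal Python/NumPy sketch of the general idea under stated assumptions, not the authors' method. Assumptions (all hypothetical): a 512-sample Hann-windowed STFT with a 128-sample hop; a mask that keeps a time-frequency unit when the wrapped phase difference between the mixture and the clean speech is below a per-frequency threshold theta(f) (the noise phase, which the abstract also mentions, is not used here); a placeholder theta(f) profile standing in for the pretest-derived thresholds; SNR defined as 10*log10(sum s^2 / sum (s - s_hat)^2); and a toy harmonic signal in place of real speech. Because the mask uses the clean speech, it is an oracle-style mask, which is consistent with, but not confirmed by, the abstract's description. The helper names `stft`, `istft`, `mix_at_snr`, `snr_db`, `phase_binary_mask`, and `segregate` are illustrative.

```python
import numpy as np

N_FFT, HOP = 512, 128  # frame length and hop: assumed values, not from the paper


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT; returns a (frames, n_fft // 2 + 1) complex array."""
    win = np.hanning(n_fft)
    padded = np.pad(x, (n_fft, n_fft))  # zero-pad so every sample is covered by several frames
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack([padded[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft(); exact when spec is unmodified."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    total = (spec.shape[0] - 1) * hop + n_fft
    out, norm = np.zeros(total), np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    out /= np.maximum(norm, 1e-12)  # undo the squared-window weighting
    return out[n_fft:n_fft + length]  # drop the zero padding


def mix_at_snr(speech, noise, snr_db_target):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db_target."""
    noise = noise[:len(speech)]
    gain = np.sqrt(np.sum(speech ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db_target / 10)))
    return speech + gain * noise


def snr_db(reference, estimate):
    """SNR of estimate w.r.t. reference: 10*log10(|s|^2 / |s - estimate|^2)."""
    err = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))


def phase_binary_mask(mix_spec, speech_spec, theta):
    """Hypothetical rule: keep a T-F unit when the wrapped phase difference between
    the mixture and the clean speech is below the per-frequency threshold theta[f]."""
    dphi = np.angle(mix_spec * np.conj(speech_spec))  # phase difference wrapped to (-pi, pi]
    return (np.abs(dphi) < theta[np.newaxis, :]).astype(float)


def segregate(speech, noise, snr_in_db, theta):
    """Mix at the given input SNR, apply the phase mask, and resynthesize."""
    mix = mix_at_snr(speech, noise, snr_in_db)
    X, S = stft(mix), stft(speech)
    mask = phase_binary_mask(X, S, theta)
    estimate = istft(mask * X, len(mix))  # kept units retain the mixture's magnitude and phase
    return mix, mask, estimate


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2 * fs) / fs
    voice = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 5))  # toy harmonic "voice"
    noise = np.random.default_rng(0).standard_normal(len(t))                # white noise
    theta = np.linspace(0.5, 0.3, N_FFT // 2 + 1)  # placeholder, NOT the paper's pretest thresholds
    mix, mask, est = segregate(voice, noise, 0.0, theta)
    print(f"SNR improvement: {snr_db(voice, est) - snr_db(voice, mix):.1f} dB")
```

The SNR improvement printed for this toy signal only illustrates how the metric is computed; it says nothing about the figures reported in the abstract, which depend on the authors' actual mask rule, thresholds, speech material, and noise types.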

Author information

Authors and Affiliations

Center of Soldier Support System, The Quartermaster Equipment Research Institute, CPLA, Beijing, P.R. China
Hong Zhou, Yi Jiang, Xiao Chen & Yuanyuan Zu
Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China
Yi Jiang

Editor information

Editors and Affiliations

Central China Normal University, Lvting Yajing 10-3-102, 430079, Wuhan, Hongshan Qu, China
Yanwen Wu

About this paper

Cite this paper

Zhou, H., Jiang, Y., Chen, X., Zu, Y. (2011). Monaural Speech Segregation Using Signal Phase. In: Wu, Y. (eds) Advances in Computer, Communication, Control and Automation. Lecture Notes in Electrical Engineering, vol 121. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25541-0_34

DOI: https://doi.org/10.1007/978-3-642-25541-0_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25540-3
Online ISBN: 978-3-642-25541-0
eBook Packages: Engineering, Engineering (R0)
