Speech Separation and Extraction by Combining Superdirective Beamforming and Blind Source Separation

Wang, Lin; Ding, Heping; Yin, Fuliang

doi:10.1007/978-3-642-55016-4_11

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Blind source separation (BSS) and beamforming are two well-known multi-microphone techniques for speech separation and extraction in cocktail-party environments. However, the performance of both is limited in highly reverberant and dynamic scenarios. Emulating the human auditory system, this chapter proposes a combined method that uses superdirective beamforming as a preprocessor for frequency-domain BSS to improve separation and extraction performance. Relying on spatial information only, superdirective beamforming provides dereverberation and noise reduction and performs robustly in time-varying environments. Using it as a preprocessor mitigates the inherent “circular convolution approximation problem” of frequency-domain BSS and enhances its robustness in dynamic environments. Meanwhile, relying on statistical information only, BSS efficiently reduces the residual interference that remains after beamforming. The combined method exploits both the spatial and the statistical information in the microphone signals and hence performs better than either BSS or beamforming alone. The proposed method is applied to two challenging tasks: a separation task in highly reverberant environments with the positions of all sources known, and a target speech extraction task in highly dynamic cocktail-party environments with only the position of the target known. Experimental results demonstrate the effectiveness of the proposed method.
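As an illustration of the beamforming stage only, the following is a minimal sketch of the textbook superdirective design, which minimises output power in a spherically diffuse noise field subject to a distortionless response toward the look direction: w = Γ⁻¹d / (dᴴΓ⁻¹d). It is not taken from the chapter, whose exact design is not given in this preview. The uniform linear array, far-field steering model, speed of sound, diagonal loading value, and all function names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def sinc(x):
    """Unnormalised sinc, sin(x)/x, with the limit 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def steering_vector(positions, angle, freq):
    """Far-field steering vector; angle in radians from the array axis."""
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    return [cmath.exp(-1j * k * p * math.cos(angle)) for p in positions]


def diffuse_coherence(positions, freq, loading=1e-2):
    """Diffuse-field coherence matrix with diagonal loading (assumed value)."""
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    n = len(positions)
    return [[sinc(k * abs(positions[i] - positions[j]))
             + (loading if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def superdirective_weights(positions, angle, freq, loading=1e-2):
    """Weights w = Gamma^-1 d / (d^H Gamma^-1 d) for one frequency bin."""
    gamma = diffuse_coherence(positions, freq, loading)
    d = steering_vector(positions, angle, freq)
    gamma_inv_d = solve(gamma, d)
    denom = sum(x.conjugate() * y for x, y in zip(d, gamma_inv_d))
    return [y / denom for y in gamma_inv_d]


def response(weights, positions, angle, freq):
    """Beamformer response w^H d toward the given angle."""
    d = steering_vector(positions, angle, freq)
    return sum(w.conjugate() * x for w, x in zip(weights, d))


def noise_power(weights, gamma):
    """Output power w^H Gamma w for noise with coherence matrix gamma."""
    n = len(weights)
    return sum(weights[i].conjugate() * gamma[i][j] * weights[j]
               for i in range(n) for j in range(n)).real


# Hypothetical 4-microphone array with 4 cm spacing, steered to endfire.
mics = [0.0, 0.04, 0.08, 0.12]
w = superdirective_weights(mics, angle=0.0, freq=1000.0)
print(abs(response(w, mics, 0.0, 1000.0)))
```

In a combined system of the kind the abstract describes, weights like these would be computed per frequency bin and applied to the microphone spectra before the frequency-domain BSS stage. The diagonal loading term trades directivity for robustness to sensor noise and array mismatch; its value here is arbitrary.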

Notes

1. More details of the experiment can be found in Sect. 11.4.2.

Acknowledgments

This work is partly supported by the Alexander von Humboldt Foundation.

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Dalian University of Technology, Dalian, China
Lin Wang & Fuliang Yin
Institute of Physics - Signal Processing Group, University of Oldenburg, Oldenburg, Germany
Lin Wang
Information and Communications Technology, National Research Council, Ottawa, Canada
Heping Ding

Authors

Lin Wang
Heping Ding
Fuliang Yin

Corresponding author

Correspondence to Lin Wang.

Editor information

Editors and Affiliations

University of Technology, Sydney, Sydney, Australia
Ganesh R. Naik
University of Surrey, Guildford, United Kingdom
Wenwu Wang

About this chapter

Cite this chapter

Wang, L., Ding, H., Yin, F. (2014). Speech Separation and Extraction by Combining Superdirective Beamforming and Blind Source Separation. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_11

DOI: https://doi.org/10.1007/978-3-642-55016-4_11
Published: 22 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55015-7
Online ISBN: 978-3-642-55016-4
eBook Packages: Engineering, Engineering (R0)
