Abstract
Blind source separation (BSS) and beamforming are two well-known multiple-microphone techniques for speech separation and extraction in cocktail-party environments. However, both have limited performance in highly reverberant and dynamic scenarios. Emulating the human auditory system, this chapter proposes a combined method for better separation and extraction performance, which uses superdirective beamforming as a preprocessor for frequency-domain BSS. Relying on spatial information only, superdirective beamforming provides dereverberation and noise reduction and performs robustly in time-varying environments. Using it as a preprocessor mitigates the inherent "circular convolution approximation problem" of frequency-domain BSS and enhances its robustness in dynamic environments. BSS, in turn, relies on statistical information only and can efficiently reduce the residual interference that remains after beamforming. The combined method therefore exploits both the spatial and the statistical information in the microphone signals, and performs better than either BSS or beamforming alone. The proposed method is applied to two challenging tasks: a separation task in highly reverberant environments where the positions of all sources are known, and a target speech extraction task in highly dynamic cocktail-party environments where only the position of the target is known. Experimental results demonstrate the effectiveness of the proposed method.
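To make the preprocessing stage more concrete, the sketch below computes per-frequency superdirective weights in the common textbook form: an MVDR beamformer designed against a spherically diffuse noise field, regularised by diagonal loading in the spirit of Cox et al.'s "practical supergain". It is an illustration under stated assumptions, not the chapter's exact design: it assumes a far-field, free-field propagation model, a known target direction, a speed of sound of 343 m/s, and a loading value `mu` chosen for illustration only. All function names and the example array geometry are hypothetical. Each bin of the beamformer output (`beamform_stft`) is what a frequency-domain BSS stage would then receive.

```python
import numpy as np


def steering_vector(mic_pos, look_dir, freq, c=343.0):
    """Far-field (plane-wave) steering vector toward the unit direction look_dir."""
    u = np.asarray(look_dir, dtype=float)
    u = u / np.linalg.norm(u)
    k = 2.0 * np.pi * freq / c  # wavenumber
    return np.exp(1j * k * (np.asarray(mic_pos, dtype=float) @ u))


def diffuse_coherence(mic_pos, freq, c=343.0):
    """Coherence of a spherically isotropic (diffuse) noise field: sin(k r) / (k r)."""
    mic_pos = np.asarray(mic_pos, dtype=float)
    r = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), so x = 2 f r / c yields sin(k r) / (k r).
    return np.sinc(2.0 * freq * r / c)


def superdirective_weights(mic_pos, look_dir, freq, mu=1e-2, c=343.0):
    """MVDR weights against diffuse noise, regularised by diagonal loading mu.

    w = (Gamma + mu I)^-1 d / (d^H (Gamma + mu I)^-1 d), hence w^H d = 1.
    """
    d = steering_vector(mic_pos, look_dir, freq, c)
    loaded = diffuse_coherence(mic_pos, freq, c) + mu * np.eye(d.size)
    g = np.linalg.solve(loaded, d)
    return g / np.vdot(d, g)  # np.vdot conjugates its first argument


def directivity_factor(w, mic_pos, look_dir, freq, c=343.0):
    """Array gain against diffuse noise: |w^H d|^2 / (w^H Gamma w)."""
    d = steering_vector(mic_pos, look_dir, freq, c)
    gamma = diffuse_coherence(mic_pos, freq, c)
    return abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, gamma @ w))


def white_noise_gain(w, mic_pos, look_dir, freq, c=343.0):
    """Array gain against spatially white noise; low values signal poor robustness."""
    d = steering_vector(mic_pos, look_dir, freq, c)
    return abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))


def beamform_stft(X, W):
    """Apply per-bin weights W (F x M) to a multichannel STFT X (M x F x T): y = w_f^H x_f."""
    return np.einsum("fm,mft->ft", W.conj(), X)


if __name__ == "__main__":
    # Hypothetical geometry: 4-microphone linear array, 5 cm spacing, endfire target.
    mics = np.array([[0.05 * m, 0.0, 0.0] for m in range(4)])
    target, f = [1.0, 0.0, 0.0], 500.0
    w_sd = superdirective_weights(mics, target, f)
    w_ds = steering_vector(mics, target, f) / len(mics)  # delay-and-sum reference
    for name, w in (("superdirective", w_sd), ("delay-and-sum", w_ds)):
        print(name,
              "DF = %.2f" % directivity_factor(w, mics, target, f),
              "WNG = %.2f" % white_noise_gain(w, mics, target, f))
```

The loading `mu` sets the usual trade-off: small values give high directivity (better suppression of diffuse reverberation and noise) at the cost of white noise gain, i.e. sensitivity to sensor mismatch, while very large values make the weights approach delay-and-sum.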
Notes
1. More details of the experiment can be found in Sect. 11.4.2.
Acknowledgments
This work was partly supported by the Alexander von Humboldt Foundation.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, L., Ding, H., Yin, F. (2014). Speech Separation and Extraction by Combining Superdirective Beamforming and Blind Source Separation. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_11
DOI: https://doi.org/10.1007/978-3-642-55016-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55015-7
Online ISBN: 978-3-642-55016-4
eBook Packages: Engineering, Engineering (R0)