
Speech Separation and Extraction by Combining Superdirective Beamforming and Blind Source Separation

Chapter in: Blind Source Separation

Part of the book series: Signals and Communication Technology (SCT)


Abstract

Blind source separation (BSS) and beamforming are two well-known multiple-microphone techniques for speech separation and extraction in cocktail-party environments. However, both perform poorly in highly reverberant and dynamic scenarios. Inspired by the human auditory system, this chapter proposes a combined method that uses superdirective beamforming as a preprocessor for frequency-domain BSS. Relying on spatial information only, superdirective beamforming provides dereverberation and noise reduction and performs robustly in time-varying environments. Used as a preprocessor, it mitigates the inherent “circular convolution approximation problem” of frequency-domain BSS (the per-bin instantaneous mixing model is accurate only when the DFT frame is long relative to the room impulse response) and makes BSS more robust in dynamic environments. BSS, in turn, relies on statistical information only and efficiently reduces the interference remaining after beamforming. Because the combined method exploits both the spatial and the statistical information in the microphone signals, it performs better than either BSS or beamforming alone. The method is applied to two challenging tasks: separation in highly reverberant environments when the positions of all sources are known, and target speech extraction in highly dynamic cocktail-party environments when only the target position is known. Experimental results demonstrate the effectiveness of the proposed method.
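The abstract describes superdirective beamforming only through its role in the pipeline. The following is a minimal Python sketch of how fixed superdirective weights for a single frequency bin are commonly computed. It rests on assumptions that are not taken from the chapter: a far-field linear array, a spherically isotropic (diffuse) noise field with coherence Γ_mn(f) = sin(2πf d_mn / c) / (2πf d_mn / c), and a diagonal-loading constant μ for robustness. Under these assumptions the weights are w(f) = (Γ(f) + μI)^-1 d(f) / (d^H(f) (Γ(f) + μI)^-1 d(f)). All function names, the 343 m/s speed of sound, μ = 0.01, and the array geometry are hypothetical illustration choices. The sketch covers only the beamforming stage; the frequency-domain BSS stage that would process the beamformer outputs is not shown.

```python
import cmath
import math

# All names below are hypothetical illustration helpers, not from the chapter.
# Assumed speed of sound (m/s); not a value taken from the chapter.
SPEED_OF_SOUND = 343.0


def steering_vector(positions, theta, freq, c=SPEED_OF_SOUND):
    """Far-field plane-wave steering vector for a linear array.

    positions: microphone x-coordinates in metres (assumed collinear).
    theta: look direction in radians, measured from the array axis.
    """
    k = 2.0 * math.pi * freq / c
    return [cmath.exp(-1j * k * p * math.cos(theta)) for p in positions]


def diffuse_coherence(positions, freq, c=SPEED_OF_SOUND):
    """Coherence matrix of a spherically isotropic noise field: sin(kd)/(kd)."""
    k = 2.0 * math.pi * freq / c
    m = len(positions)
    gamma = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            x = k * abs(positions[i] - positions[j])
            if x > 0.0:
                gamma[i][j] = math.sin(x) / x
    return gamma


def solve(a, b):
    """Solve a @ x = b for complex x by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[complex(v) for v in row] + [complex(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def superdirective_weights(positions, theta, freq, mu=1e-2):
    """w = (Gamma + mu*I)^-1 d / (d^H (Gamma + mu*I)^-1 d) for one frequency bin.

    mu is an assumed diagonal-loading constant that limits noise amplification.
    """
    d = steering_vector(positions, theta, freq)
    gamma = diffuse_coherence(positions, freq)
    m = len(d)
    loaded = [[gamma[i][j] + (mu if i == j else 0.0) for j in range(m)] for i in range(m)]
    r_inv_d = solve(loaded, d)
    denom = sum(di.conjugate() * ri for di, ri in zip(d, r_inv_d))  # real and > 0
    return [ri / denom for ri in r_inv_d]


def directivity(w, positions, theta, freq):
    """Array gain against diffuse noise: |w^H d|^2 / (w^H Gamma w)."""
    d = steering_vector(positions, theta, freq)
    gamma = diffuse_coherence(positions, freq)
    response = sum(wi.conjugate() * di for wi, di in zip(w, d))
    m = len(w)
    noise = sum(w[i].conjugate() * gamma[i][j] * w[j] for i in range(m) for j in range(m))
    return abs(response) ** 2 / noise.real


if __name__ == "__main__":
    mics = [0.0, 0.04, 0.08, 0.12]  # hypothetical 4-mic line array, 4 cm spacing
    look, f = 0.0, 1000.0           # endfire look direction, one 1 kHz bin
    w_sd = superdirective_weights(mics, look, f)
    w_ds = [di / len(mics) for di in steering_vector(mics, look, f)]  # delay-and-sum
    print("superdirective DI:", 10 * math.log10(directivity(w_sd, mics, look, f)), "dB")
    print("delay-and-sum  DI:", 10 * math.log10(directivity(w_ds, mics, look, f)), "dB")
```

The weights leave the look direction undistorted (w^H d = 1) while minimising diffuse-noise power, so their directivity is never below that of delay-and-sum. Increasing μ trades directivity for white-noise gain, and as μ grows the weights approach the delay-and-sum solution d/M.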


Notes

  1. More details of the experiment can be found in Sect. 11.4.2.


Acknowledgments

This work is partly supported by the Alexander von Humboldt Foundation.

Author information

Corresponding author

Correspondence to Lin Wang.


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wang, L., Ding, H., Yin, F. (2014). Speech Separation and Extraction by Combining Superdirective Beamforming and Blind Source Separation. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_11


  • DOI: https://doi.org/10.1007/978-3-642-55016-4_11


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55015-7

  • Online ISBN: 978-3-642-55016-4

  • eBook Packages: Engineering, Engineering (R0)
