
Neural Beamforming for Speech Enhancement: Preliminary Results


Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 102)

Abstract

In the field of multi-channel speech enhancement, beamforming algorithms play a key role, as they reduce noise and reverberation by spatial filtering. To that end, accurate knowledge of the Direction of Arrival (DOA) is crucial for beamforming to be effective. This paper reports substantially improved DOA estimates obtained with a recently introduced neural DOA estimation technique, compared to a reference algorithm such as Multiple Signal Classification (MUSIC). These findings motivated the evaluation of beamforming with neural DOA estimation for speech enhancement. When the neural DOA estimates are used to steer the beamformer, the quality of speech signals affected by reverberation and noise improves. These first results are reported as a reference for further work on beamforming for speech enhancement.
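To make the role of DOA-driven spatial filtering concrete, the following is a minimal, illustrative sketch rather than the authors' implementation: a narrowband MUSIC scan estimates the DOA for a uniform linear array, and a frequency-domain delay-and-sum beamformer is then steered toward that direction. The array geometry, sampling rate, frame parameters, and all function names are assumptions made for the example.

```python
# Illustrative sketch, NOT the chapter's implementation: MUSIC DOA scan on one
# frequency bin plus delay-and-sum beamforming for a uniform linear array.
import numpy as np

C = 343.0          # speed of sound [m/s] (assumed)
FS = 16000         # sampling rate [Hz] (assumed)
N_MICS = 8         # assumed array size
SPACING = 0.05     # assumed inter-microphone distance [m]
mic_pos = np.arange(N_MICS) * SPACING  # linear array along one axis

def steering_vector(theta_deg, freq_hz):
    """Far-field steering vector of the linear array at one frequency."""
    delays = mic_pos * np.cos(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * freq_hz * delays)

def music_doa(bin_stft, freq_hz, n_sources=1, grid=np.arange(0, 181)):
    """MUSIC pseudospectrum scan over a DOA grid at a single frequency bin.
    bin_stft: complex array (n_mics, n_frames) for that bin."""
    R = bin_stft @ bin_stft.conj().T / bin_stft.shape[1]   # spatial covariance
    eigval, eigvec = np.linalg.eigh(R)                     # ascending eigenvalues
    noise_sub = eigvec[:, : N_MICS - n_sources]            # noise subspace
    p = []
    for theta in grid:
        a = steering_vector(theta, freq_hz)
        p.append(1.0 / np.real(a.conj() @ noise_sub @ noise_sub.conj().T @ a))
    return grid[int(np.argmax(p))]                          # DOA in degrees

def steering_matrix(theta_deg, freqs):
    """Steering vectors for all FFT bins, shape (n_mics, n_bins)."""
    delays = mic_pos * np.cos(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * np.outer(delays, freqs))

def delay_and_sum(x, theta_deg, n_fft=512, hop=256):
    """Steer the array toward theta_deg and average channels (delay-and-sum).
    x: real array (n_mics, n_samples). Unnormalized overlap-add, for illustration."""
    n_mics, n_samples = x.shape
    out = np.zeros(n_samples + n_fft)
    win = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / FS)
    steer = np.conj(steering_matrix(theta_deg, freqs))      # phase compensation
    for start in range(0, n_samples - n_fft, hop):
        frame = x[:, start:start + n_fft] * win
        spec = np.fft.rfft(frame, axis=1)                    # (n_mics, n_bins)
        aligned = spec * steer                               # align channels
        out[start:start + n_fft] += np.fft.irfft(aligned.mean(axis=0), n_fft)
    return out[:n_samples]
```

In the approach evaluated in the chapter, the DOA estimate is produced by a neural network rather than by MUSIC; in a sketch like the one above, that simply means replacing the `music_doa` call with the network's predicted angle before steering the beamformer.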


Notes

  1. https://www.audiolabs-erlangen.de/fau/professor/habets/software/rir-generator.
  2. http://www.itu.int/rec/T-REC-P.862/en.



Acknowledgements

We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support.

Author information


Corresponding author

Correspondence to Leonardo Gabrielli.



Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Tomassetti, S., Gabrielli, L., Principi, E., Ferretti, D., Squartini, S. (2019). Neural Beamforming for Speech Enhancement: Preliminary Results. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_4

