Abstract
In multi-channel speech enhancement, beamforming algorithms play a key role, reducing noise and reverberation by spatial filtering. To that end, accurate knowledge of the Direction of Arrival (DOA) is crucial for beamforming to be effective. This paper reports substantially improved DOA estimates obtained with a recently introduced neural DOA estimation technique, compared to a reference algorithm, Multiple Signal Classification (MUSIC). These findings motivated the evaluation of beamforming with neural DOA estimation for speech enhancement. Using neural DOA estimation in conjunction with beamforming improves the quality of speech signals affected by reverberation and noise. These preliminary results are reported as a reference for further work on beamforming for speech enhancement.
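The coupling between DOA estimation and beamforming described above can be illustrated with a minimal delay-and-sum beamformer: given a DOA estimate (from any estimator, neural or MUSIC-based), the channels are time-aligned toward that direction and averaged. This is a hedged sketch, not the authors' method; the function name, the uniform-linear-array geometry, and the far-field assumption are illustrative choices not taken from the paper.

```python
import numpy as np

def delay_and_sum(x, fs, mic_pos, doa_deg, c=343.0):
    """Steer a far-field delay-and-sum beamformer toward a given DOA.

    x       : (num_mics, num_samples) multichannel time-domain signal
    fs      : sampling rate [Hz]
    mic_pos : (num_mics,) microphone positions along a linear array [m]
    doa_deg : direction of arrival relative to broadside [degrees]
    c       : speed of sound [m/s]
    """
    num_mics, num_samples = x.shape
    # Far-field inter-microphone delays relative to the array origin
    delays = mic_pos * np.sin(np.deg2rad(doa_deg)) / c
    # Apply the compensating fractional delays as linear phase shifts
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    X *= np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    # Average the aligned channels: coherent sum in the target direction,
    # incoherent (attenuated) sum for noise arriving from elsewhere
    return np.fft.irfft(X, n=num_samples, axis=1).mean(axis=0)
```

An inaccurate DOA estimate misaligns the channels before averaging, which is why the improved neural DOA estimates translate directly into better beamformer output.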
Acknowledgements
We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Tomassetti, S., Gabrielli, L., Principi, E., Ferretti, D., Squartini, S. (2019). Neural Beamforming for Speech Enhancement: Preliminary Results. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_4
Print ISBN: 978-3-319-95097-6
Online ISBN: 978-3-319-95098-3