Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition

Fontaine, Mathieu; Stöter, Fabian-Robert; Liutkus, Antoine; Şimşekli, Umut; Serizel, Romain; Badeau, Roland

doi:10.1007/978-3-319-93764-9_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10891)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation

Abstract

This paper introduces a new method for multichannel speech enhancement based on a versatile model of the residual noise spectrogram. Such a model was previously presented for the single-channel case, where the noise component is assumed to follow an alpha-stable distribution in each time-frequency bin, whereas the speech spectrogram, assumed to be more regular, is modeled as Gaussian. In this paper, we describe a multichannel extension of this model, together with a Monte Carlo Expectation-Maximisation algorithm for parameter estimation. In particular, a multichannel extension of Itakura-Saito nonnegative matrix factorization is used to estimate the spectral parameters of the speech, and a Metropolis-Hastings algorithm is proposed to estimate the noise contribution. We evaluate the proposed method in a challenging multichannel denoising application and compare it to other state-of-the-art algorithms.

Notes

1. The probability density function (PDF) of an isotropic complex Gaussian vector is \(\mathcal{N}_{C}(\boldsymbol{x} \mid \mu, \boldsymbol{C}) = \frac{1}{\pi^{K} \det \boldsymbol{C}} \exp\left(-\left(\boldsymbol{x}-\mu\right)^{\star} \boldsymbol{C}^{-1} \left(\boldsymbol{x}-\mu\right)\right)\), where \(K\) is the dimension of \(\boldsymbol{x}\) and \(^{\star}\) denotes the conjugate transpose.
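The density in Note 1 can be evaluated numerically as in the minimal sketch below. It works in the log domain to avoid underflow. The function name `complex_gaussian_logpdf` and its argument names are illustrative assumptions and do not come from the paper, and the covariance matrix is assumed to be Hermitian positive definite.

```python
import numpy as np

def complex_gaussian_logpdf(x, mu, C):
    """Log of the complex Gaussian density from Note 1 (illustrative helper).

    x, mu: length-K complex vectors; C: K x K Hermitian positive definite
    covariance matrix (assumed, not checked).
    """
    x = np.asarray(x, dtype=complex)
    mu = np.asarray(mu, dtype=complex)
    C = np.asarray(C, dtype=complex)
    K = x.shape[0]
    d = x - mu
    # Quadratic form (x - mu)^* C^{-1} (x - mu); np.vdot conjugates its
    # first argument, and the result is real when C is Hermitian.
    quad = np.real(np.vdot(d, np.linalg.solve(C, d)))
    # log det C, computed stably; the sign is 1 for a positive definite C.
    _, logdet = np.linalg.slogdet(C)
    return float(-K * np.log(np.pi) - logdet - quad)

# Usage: two channels, zero mean, identity covariance.
value = complex_gaussian_logpdf([0.5 + 0.5j, -1.0j], [0.0, 0.0], np.eye(2))
```

Solving the linear system rather than forming the explicit inverse of C is a common numerical choice and does not change the value of the density.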

Acknowledgments

This work was partly supported by the research programmes KAMoulox (ANR-15-CE38-0003-01), EDiSon3D (ANR-13-CORD-0008-01), and FBIMATRIX (ANR-16-CE23-0014), funded by ANR, the French State agency for research.

Author information

Authors and Affiliations

Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
Mathieu Fontaine & Romain Serizel
Inria and LIRMM, Montpellier, France
Fabian-Robert Stöter & Antoine Liutkus
LTCI, Télécom ParisTech, Université Paris-Saclay, Paris, France
Umut Şimşekli & Roland Badeau

Corresponding authors

Correspondence to Mathieu Fontaine, Fabian-Robert Stöter, Antoine Liutkus, Umut Şimşekli, Romain Serizel or Roland Badeau.

Editor information

Editors and Affiliations

Paul Sabatier University, Toulouse, France
Yannick Deville
Bar-Ilan University, Ramat Gan, Israel
Sharon Gannot
University of Surrey, Guildford, United Kingdom
Russell Mason
University of Surrey, Guildford, United Kingdom
Mark D. Plumbley
University of Surrey, Guildford, United Kingdom
Dominic Ward

About this paper

Cite this paper

Fontaine, M., Stöter, FR., Liutkus, A., Şimşekli, U., Serizel, R., Badeau, R. (2018). Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_2

DOI: https://doi.org/10.1007/978-3-319-93764-9_2
Published: 06 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93763-2
Online ISBN: 978-3-319-93764-9
eBook Packages: Computer Science, Computer Science (R0)
