Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition

  • Conference paper
Latent Variable Analysis and Signal Separation (LVA/ICA 2018)

Abstract

This paper introduces a new method for multichannel speech enhancement based on a versatile model of the residual noise spectrogram. Such a model has previously been presented for the single-channel case, where the noise component is assumed to follow an alpha-stable distribution in each time-frequency bin, whereas the speech spectrogram, assumed to be more regular, is modeled as Gaussian. Here, we describe a multichannel extension of this model, together with a Monte Carlo Expectation-Maximisation algorithm for parameter estimation. In particular, a multichannel extension of Itakura-Saito nonnegative matrix factorization is used to estimate the spectral parameters of the speech, and a Metropolis-Hastings algorithm is proposed to estimate the noise contribution. We evaluate the proposed method in a challenging multichannel denoising application and compare it with other state-of-the-art algorithms.
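For readers unfamiliar with Itakura-Saito nonnegative matrix factorization, the following is a minimal single-channel sketch, not the multichannel algorithm of the paper. It factorizes a nonnegative power spectrogram `V` as `W @ H` using the well-known multiplicative updates for the Itakura-Saito divergence. The function names `is_divergence` and `is_nmf`, the random initialization, and the default iteration count are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between a positive matrix V and its model V_hat."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Single-channel IS-NMF sketch (hypothetical helper, not the paper's method).

    V    : (F, T) strictly positive power spectrogram
    rank : number of spectral components
    Returns nonnegative factors W (F, rank) and H (rank, T).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    # Random positive initialization (an assumption made for this sketch).
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # Multiplicative update of the spectral templates W.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        # Multiplicative update of the activations H.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
    return W, H
```

The square root corresponds to the majorization-minimization form of the updates for the Itakura-Saito case; dropping it gives the commonly used heuristic variant. A multichannel extension additionally models spatial covariance matrices per frequency, which this sketch omits.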

Notes

  1. The probability density function (PDF) of an isotropic complex Gaussian vector is \(\mathcal{N}_{C}(\boldsymbol{x}\mid\mu,\boldsymbol{C})=\frac{1}{\pi^{K}\det\boldsymbol{C}}\exp\left(-\left(\boldsymbol{x}-\mu\right)^{\star}\boldsymbol{C}^{-1}\left(\boldsymbol{x}-\mu\right)\right)\).
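As a numerical illustration of the density in the note above, the sketch below evaluates its logarithm with NumPy. It assumes that K is the dimension of the vector, that the star denotes the conjugate transpose, and that C is Hermitian positive definite. The function name `complex_gaussian_logpdf` is hypothetical.

```python
import numpy as np


def complex_gaussian_logpdf(x, mu, C):
    """Log of N_C(x | mu, C) = exp(-(x-mu)^* C^{-1} (x-mu)) / (pi^K det C).

    x, mu : complex vectors of length K
    C     : (K, K) Hermitian positive definite covariance matrix (assumed)
    """
    x = np.asarray(x, dtype=complex)
    mu = np.asarray(mu, dtype=complex)
    C = np.asarray(C, dtype=complex)
    K = x.shape[0]
    diff = x - mu
    # log det C, computed stably; real for a Hermitian positive definite C.
    _, logdet = np.linalg.slogdet(C)
    # Quadratic form (x-mu)^* C^{-1} (x-mu); np.vdot conjugates its first argument.
    quad = np.real(np.vdot(diff, np.linalg.solve(C, diff)))
    return float(-K * np.log(np.pi) - logdet - quad)
```

Working in the log domain avoids underflow when K is large or when the quadratic form is large, which is the usual reason to prefer it over evaluating the density directly.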

Acknowledgments

This work was partly supported by the research programmes KAMoulox (ANR-15-CE38-0003-01), EDiSon3D (ANR-13-CORD-0008-01) and FBIMATRIX (ANR-16-CE23-0014), funded by ANR, the French State agency for research.

Corresponding authors

Correspondence to Mathieu Fontaine, Fabian-Robert Stöter, Antoine Liutkus, Umut Şimşekli, Romain Serizel or Roland Badeau.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Fontaine, M., Stöter, FR., Liutkus, A., Şimşekli, U., Serizel, R., Badeau, R. (2018). Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_2

  • DOI: https://doi.org/10.1007/978-3-319-93764-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93763-2

  • Online ISBN: 978-3-319-93764-9

  • eBook Packages: Computer Science, Computer Science (R0)
