Abstract
This paper introduces a new method for multichannel speech enhancement based on a versatile model of the residual noise spectrogram. Such a model was previously presented for the single-channel case, where the noise component is assumed to follow an alpha-stable distribution in each time-frequency bin, whereas the speech spectrogram, assumed to be more regular, is modeled as Gaussian. In this paper, we describe a multichannel extension of this model, as well as a Monte Carlo Expectation-Maximisation algorithm for parameter estimation. In particular, a multichannel extension of the Itakura-Saito nonnegative matrix factorization is exploited to estimate the spectral parameters for speech, and a Metropolis-Hastings algorithm is proposed to estimate the noise contribution. We evaluate the proposed method in a challenging multichannel denoising application and compare it to other state-of-the-art algorithms.
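As a rough illustration of the Itakura-Saito nonnegative matrix factorization mentioned above, the following sketch factorizes a single-channel power spectrogram with majorization-minimization multiplicative updates. It is a simplified single-channel stand-in, not the multichannel algorithm of the paper; the function names `is_divergence` and `is_nmf`, the random initialization, and the iteration count are assumptions made for illustration only.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence between two strictly positive matrices."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, rank, n_iter=50, seed=0, eps=1e-12):
    """Single-channel IS-NMF sketch: V (F x N, positive) is approximated by W @ H.

    Uses multiplicative updates with exponent 1/2, the majorization-minimization
    form for the IS divergence. Hypothetical helper, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps  # spectral templates
    H = rng.random((rank, N)) + eps  # temporal activations
    history = []
    for _ in range(n_iter):
        V_hat = W @ H
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        V_hat = W @ H
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        history.append(is_divergence(V, W @ H))  # track the fit after each sweep
    return W, H, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((8, 10)) + 0.1  # toy positive "power spectrogram"
    W, H, history = is_nmf(V, rank=3)
    print(history[0], history[-1])
```

The exponent 1/2 is chosen here because it corresponds to the majorization-minimization update for this divergence; other step choices are common in practice.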
Notes
1. The probability density function (PDF) of an isotropic complex Gaussian vector is \(\mathcal {N}_{C}(\varvec{x}|\mu ,\varvec{C})=\frac{1}{\pi ^{K}\det \varvec{C}}\exp \left( -\left( \varvec{x}-\mu \right) ^{\star }\varvec{C}^{-1}\left( \varvec{x}-\mu \right) \right) \).
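The density in this note can be evaluated numerically as follows. This is a minimal sketch that assumes \(\varvec{C}\) is Hermitian positive definite and works in the log domain for numerical stability; the function name `complex_gaussian_logpdf` is a hypothetical helper, not something defined in the paper.

```python
import numpy as np

def complex_gaussian_logpdf(x, mu, C):
    """Log of N_C(x | mu, C) = exp(-(x - mu)^* C^{-1} (x - mu)) / (pi^K det C)."""
    x = np.asarray(x, dtype=complex)
    mu = np.asarray(mu, dtype=complex)
    C = np.asarray(C, dtype=complex)
    K = x.shape[0]
    d = x - mu
    # slogdet avoids overflow/underflow in det(C); C is assumed Hermitian positive definite.
    _, logdet = np.linalg.slogdet(C)
    # Quadratic form (x - mu)^* C^{-1} (x - mu) via a linear solve rather than an explicit inverse.
    quad = np.real(np.vdot(d, np.linalg.solve(C, d)))
    return -K * np.log(np.pi) - logdet - quad

if __name__ == "__main__":
    x = np.array([1 + 1j, 0])
    print(complex_gaussian_logpdf(x, np.zeros(2), 2 * np.eye(2)))
```

Note that `np.vdot` conjugates its first argument, which matches the conjugate transpose \(\left( \varvec{x}-\mu \right) ^{\star }\) in the formula.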
Acknowledgments
This work was partly supported by the research programmes KAMoulox (ANR-15-CE38-0003-01), EDiSon3D (ANR-13-CORD-0008-01), and FBIMATRIX (ANR-16-CE23-0014), funded by ANR, the French State agency for research.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fontaine, M., Stöter, F.-R., Liutkus, A., Şimşekli, U., Serizel, R., Badeau, R. (2018). Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_2
DOI: https://doi.org/10.1007/978-3-319-93764-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93763-2
Online ISBN: 978-3-319-93764-9
eBook Packages: Computer Science (R0)