Notes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues

  • Cédric Févotte
  • Alexey Ozerov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6684)


Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al as a mean of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgeral et al requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.


Nonnegative tensor factorization (NTF) audio source separation nonpoint-source models multiplicative parameter updates 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Cao, Y., Eggermont, P.P.B., Terebey, S.: Cross Burg entropy maximization and its application to ringing suppression in image reconstruction. IEEE Transactions on Image Processing 8(2), 286–292 (1999)CrossRefGoogle Scholar
  2. 2.
    Cemgil, A.T.: Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience (Article ID 785152), 17 pages (2009); doi:10.1155/2009/785152Google Scholar
  3. 3.
    Févotte, C.: Itakura-Saito nonnegative factorizations of the power spectrogram for music signal decomposition. In: Wang, W. (ed.) Machine Audition: Principles, Algorithms and Systems, ch. 11. IGI Global Press (August 2010),
  4. 4.
    Févotte, C., Bertin, N., Durrieu, J.L.: Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation 21(3), 793–830 (2009), CrossRefzbMATHGoogle Scholar
  5. 5.
    FitzGerald, D., Cranitch, M., Coyle, E.: Non-negative tensor factorisation for sound source separation. In: Proc. of the Irish Signals and Systems Conference, Dublin, Ireland (September 2005)Google Scholar
  6. 6.
    FitzGerald, D., Cranitch, M., Coyle, E.: Extended nonnegative tensor factorisation models for musical sound source separation. Computational Intelligence and Neuroscience (Article ID 872425), 15 pages (2008)Google Scholar
  7. 7.
    Helén, M., Virtanen, T.: Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. In: Proc. 13th European Signal Processing Conference (EUSIPCO 2005) (2005)Google Scholar
  8. 8.
    Lee, D.D., Seung, H.S.: Learning the parts of objects with nonnegative matrix factorization. Nature 401, 788–791 (1999)CrossRefzbMATHGoogle Scholar
  9. 9.
    Neeser, F.D., Massey, J.L.: Proper complex random processes with applications to information theory. IEEE Transactions on Information Theory 39(4), 1293–1302 (1993)MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Ozerov, A., Févotte, C.: Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing 18(3), 550–563 (2010), CrossRefGoogle Scholar
  11. 11.
    Parry, R.M., Essa, I.: Estimating the spatial position of spectral components in audio. In: Rosca, J.P., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds.) ICA 2006. LNCS, vol. 3889, pp. 666–673. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  12. 12.
    Shashua, A., Hazan, T.: Non-negative tensor factorization with applications to statistics and computer vision. In: Proc. 22nd International Conference on Machine Learning, pp. 792–799. ACM, Bonn (2005)Google Scholar
  13. 13.
    Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging 1(2), 113–122 (1982)CrossRefGoogle Scholar
  14. 14.
    Smaragdis, P.: Convolutive speech bases and their application to speech separation. IEEE Transactions on Audio, Speech, and Language Processing 15(1), 1–12 (2007)CrossRefGoogle Scholar
  15. 15.
    Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2003) (October 2003)Google Scholar
  16. 16.
    Vincent, E., Gribonval, R., Févotte, C.: Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech and Language Processing 14(4), 1462–1469 (2006), CrossRefGoogle Scholar
  17. 17.
    Vincent, E., Sawada, H., Bofill, P., Makino, S., Rosca, J.P.: First stereo audio source separation evaluation campaign: Data, algorithms and results. In: Davies, M.E., James, C.J., Abdallah, S.A., Plumbley, M.D. (eds.) ICA 2007. LNCS, vol. 4666, pp. 552–559. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  18. 18.
    Vincent, E., Araki, S., Bofill, P.: Signal Separation Evaluation Campaign. In: (SiSEC 2008) / Under-determined speech and music mixtures task results (2008),
  19. 19.
    Virtanen, T.: Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech and Language Processing 15(3), 1066–1074 (2007)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Cédric Févotte
    • 1
  • Alexey Ozerov
    • 2
  1. 1.CNRS LTCITelecom ParisTechParisFrance
  2. 2.IRISA, INRIARennesFrance

Personalised recommendations