Non-negative Matrix Factorization and Its Variants for Audio Signal Processing

Applied Matrix and Tensor Variate Data Analysis

Part of the book series: SpringerBriefs in Statistics (JSSRES)

Abstract

In this chapter, I briefly introduce a multivariate analysis technique called non-negative matrix factorization (NMF), which has attracted a lot of attention in the field of audio signal processing in recent years. I will mention some basic properties of NMF, effects induced by the non-negative constraints, how to derive an iterative algorithm for NMF, and some attempts that have been made to apply NMF to audio processing problems.
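
As a rough illustration of the kind of iterative algorithm the abstract refers to, the following minimal sketch factorizes a non-negative matrix Y into non-negative factors W and H using the widely known multiplicative update rules for the squared Euclidean distance. The chapter's own derivation is not visible in this preview, so the function name `nmf_multiplicative`, the choice of cost function, the small constant `eps`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(Y, rank, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical helper: approximate Y (non-negative) by W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Euclidean distance.
    eps guards against division by zero and is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = Y.shape
    # Random non-negative initialization.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is multiplied element-wise by a non-negative ratio,
        # so non-negativity is preserved at every iteration.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(Y - W @ H) ** 2)
    return W, H, errors

# Toy data standing in for a magnitude spectrogram (rows: frequency bins,
# columns: time frames), built from 3 non-negative components.
rng = np.random.default_rng(1)
Y = rng.random((20, 3)) @ rng.random((3, 50))

W, H, errors = nmf_multiplicative(Y, rank=3)
print("first error:", errors[0], "last error:", errors[-1])
```

In an audio setting, the columns of W would typically be read as spectral templates and the rows of H as their activations over time. Because every update multiplies by a non-negative ratio, W and H stay non-negative as long as they are initialized that way.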

Author information

Correspondence to Hirokazu Kameoka.

Copyright information

© 2016 The Author(s)

About this chapter

Cite this chapter

Kameoka, H. (2016). Non-negative Matrix Factorization and Its Variants for Audio Signal Processing. In: Sakata, T. (ed.) Applied Matrix and Tensor Variate Data Analysis. SpringerBriefs in Statistics. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55387-8_2