Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Wang, Wenwu; Mustafa, Hafiz

doi:10.1007/978-3-642-23126-1_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6684)

Included in the following conference series:

International Symposium on Computer Music Modeling and Retrieval

Abstract

Separating multiple music sources from a single channel mixture is a challenging problem. We present a new approach to this problem based on non-negative matrix factorization (NMF) and note classification, assuming that the instruments used to play the sound signals are known a priori. The spectrogram of the mixture signal is first decomposed into building components (musical notes) using an NMF algorithm. The Mel-frequency cepstral coefficients (MFCCs) of both the decomposed components and the signals in the training dataset are then extracted. The mean squared errors (MSEs) between the MFCC features of each decomposed component and those of the training signals serve as the similarity measure for the decomposed notes. Each note is then labelled with its instrument type by the K nearest neighbors (K-NN) classification algorithm based on the MSEs. Finally, the source signals are reconstructed from the classified notes and the weighting matrices obtained from the NMF algorithm. Simulations are provided to show the performance of the proposed system.
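The pipeline outlined in the abstract can be illustrated with a minimal Python sketch using NumPy only. This is not the authors' implementation. The function names `nmf`, `knn_label` and `separate` are hypothetical, and the NMF step uses the standard multiplicative updates for the squared Euclidean cost, which may differ from the variant used in the paper. MFCC extraction is omitted, so `knn_label` assumes the feature vectors have already been computed. The sketch also stops at per-instrument magnitude spectrograms and does not resynthesise time-domain signals.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative spectrogram V (freq x frames) as V ~ W @ H.

    Uses multiplicative updates for the squared Euclidean cost.
    W holds spectral bases (one column per component), H their activations.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def knn_label(feature, train_features, train_labels, k=3):
    """Label one feature vector by majority vote among the k training
    vectors with the smallest mean squared error to it.

    train_features: array (n_train, n_dims); train_labels: array (n_train,).
    """
    mse = np.mean((train_features - feature) ** 2, axis=1)
    nearest = np.argsort(mse)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]


def separate(W, H, labels):
    """Group components by instrument label and rebuild one magnitude
    spectrogram per instrument from the matching columns of W and rows of H.

    labels: NumPy array with one instrument label per NMF component.
    """
    return {
        lab: W[:, labels == lab] @ H[labels == lab, :]
        for lab in np.unique(labels)
    }
```

In use, each NMF component would be converted to a feature vector (MFCCs in the paper), passed through `knn_label` against the training features, and the resulting label array handed to `separate`. The per-instrument spectrograms sum to `W @ H`, so no energy of the approximation is lost in the grouping step.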

Author information

Authors and Affiliations

Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, GU2 7XH, UK
Wenwu Wang & Hafiz Mustafa

Editor information

Editors and Affiliations

CNRS - LMA, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Sølvi Ystad
CNRS-INCM, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Mitsuko Aramaki
CNRS-LMA, 31 Chemin Joseph Aiguier, 13402, Marseille Cedex 20, France
Richard Kronland-Martinet
Aalborg University Esbjerg, Niels Bohr Vej 8, 6700, Esbjerg, Denmark
Kristoffer Jensen

Cite this paper

Wang, W., Mustafa, H. (2011). Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification. In: Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K. (eds) Exploring Music Contents. CMMR 2010. Lecture Notes in Computer Science, vol 6684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23126-1_7

DOI: https://doi.org/10.1007/978-3-642-23126-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23125-4
Online ISBN: 978-3-642-23126-1
eBook Packages: Computer Science, Computer Science (R0)
