A Singing Voice/Music Separation Method Based on Non-negative Tensor Factorization and Repeat Pattern Extraction

  • Yong Zhang
  • Xiaohong Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9377)


In this paper, we propose a novel singing voice/music separation method based on non-negative tensor factorization (NTF) and the repeating pattern extraction technique (REPET), which separates a mixture into a singing voice signal and background music. The system consists of three stages. First, NTF decomposes the mixture into components, and similarity detection is applied to distinguish the components from one another and classify them into two classes: voice/periodic music and block music/voice. Next, REPET is applied to each of the two classes to further extract the background music; the final background music is estimated by adding the two extracted backgrounds together, and the remainders are added together as the singing voice. Finally, the music spectrum and the voice spectrum are filtered by a harmonic filter and a percussive filter, respectively. To further improve performance, a Wiener filter is used to separate the voice and the music. On the MIR-1K dataset, the proposed method improves separation performance compared with other state-of-the-art methods.
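The REPET and Wiener-filter stages mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a magnitude spectrogram is already available, takes the repetition period as a given input (REPET estimates it from the signal), and omits the NTF decomposition, the similarity detection, and the harmonic/percussive filtering. The function names `repeating_background` and `wiener_masks` are hypothetical, and the data is synthetic.

```python
import numpy as np


def repeating_background(V, period):
    """Estimate a repeating background from a magnitude spectrogram.

    V      : non-negative array of shape (n_freq, n_frames).
    period : assumed repetition period in frames (given here, not estimated).
    """
    n_freq, n_frames = V.shape
    n_seg = int(np.ceil(n_frames / period))
    # Pad with NaN so an incomplete last segment is ignored by nanmedian.
    padded = np.full((n_freq, n_seg * period), np.nan)
    padded[:, :n_frames] = V
    segments = padded.reshape(n_freq, n_seg, period)
    # The element-wise median across segments models one repetition.
    model = np.nanmedian(segments, axis=1)
    # Tile the model over time; the background cannot exceed the mixture.
    tiled = np.tile(model, (1, n_seg))[:, :n_frames]
    return np.minimum(tiled, V)


def wiener_masks(background, voice, eps=1e-12):
    """Wiener-style soft masks built from two magnitude estimates."""
    pb = background ** 2
    pv = voice ** 2
    mask_b = pb / (pb + pv + eps)
    return mask_b, 1.0 - mask_b


# Synthetic example: a 4-frame pattern repeated 6 times plus one "voice" burst.
rng = np.random.default_rng(0)
pattern = rng.random((5, 4))
music = np.tile(pattern, (1, 6))
voice = np.zeros_like(music)
voice[2, 9] = 3.0
mixture = music + voice

bg = repeating_background(mixture, period=4)
residual = mixture - bg              # what is left after removing the background
mask_b, mask_v = wiener_masks(bg, residual)
voice_est = mask_v * mixture         # soft-masked voice magnitude
```

In this toy case the median discards the single non-repeating burst, so the background estimate matches the repeated pattern and the residual carries the burst. Real recordings would need a period estimate and an STFT front end, both of which are left out here.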


NTF REPET Source Separation Median Filter Unsupervised Signal Processing 





Copyright information

© Springer International Publishing Switzerland 2015

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
