
A Singing Voice/Music Separation Method Based on Non-negative Tensor Factorization and Repeat Pattern Extraction

  • Conference paper
Advances in Neural Networks – ISNN 2015 (ISNN 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9377)

Abstract

In this paper, a novel singing voice/music separation method is proposed based on non-negative tensor factorization (NTF) and the repeating pattern extraction technique (REPET), which together separate a mixture into a singing voice signal and a background music signal. The system consists of three stages. First, NTF decomposes the mixture into components, and similarity detection is applied to classify these components into two groups: one dominated by the voice together with periodic music, and one dominated by block-like music with residual voice. Next, REPET is applied to each group to further extract its background music; the final background music is estimated by summing the two extracted backgrounds, while the remaining parts are summed to form the singing voice. Finally, the music spectrum and the voice spectrum are refined by a harmonic filter and a percussive filter, respectively, and a Wiener filter is used to further separate the voice and music. The proposed method improves separation performance over other state-of-the-art methods on the MIR-1K dataset.
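The final stage described above (harmonic/percussive filtering followed by Wiener filtering) can be illustrated with a short sketch. The code below is a minimal, hypothetical example assuming a median-filter-based harmonic/percussive split and a power-spectrum (Wiener-style) soft mask applied to an STFT; the function name hpss_wiener, the STFT settings, and the filter lengths are illustrative choices, not the paper's actual configuration, and in the full method this step would operate on the spectra produced by the NTF and REPET stages rather than directly on the raw mixture.

    # Minimal sketch of harmonic/percussive median filtering plus Wiener-style
    # soft masking. All parameter values here are illustrative assumptions.
    import numpy as np
    from scipy.signal import stft, istft
    from scipy.ndimage import median_filter

    def hpss_wiener(mixture, fs, nperseg=1024, kernel=17, eps=1e-10):
        """Split a mono signal into 'voice-like' (percussive) and 'music-like'
        (harmonic) parts via median filtering and Wiener soft masks."""
        _, _, X = stft(mixture, fs=fs, nperseg=nperseg)  # X: (freq bins, frames)
        mag = np.abs(X)

        # Median filtering along time keeps sustained harmonic content;
        # filtering along frequency keeps broadband, percussive/voice-like content.
        harmonic = median_filter(mag, size=(1, kernel))
        percussive = median_filter(mag, size=(kernel, 1))

        # Wiener-style masks from the ratio of the enhanced power spectra.
        music_mask = harmonic**2 / (harmonic**2 + percussive**2 + eps)
        voice_mask = 1.0 - music_mask

        _, music = istft(music_mask * X, fs=fs, nperseg=nperseg)
        _, voice = istft(voice_mask * X, fs=fs, nperseg=nperseg)
        n = len(mixture)
        return voice[:n], music[:n]

With a mono clip loaded as a NumPy array x at sampling rate fs, a call such as voice, music = hpss_wiener(x, fs) would return the two masked reconstructions; in the proposed pipeline these masks would be applied after the NTF and REPET background estimation rather than to the unprocessed mixture.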

Author information

Corresponding author

Correspondence to Yong Zhang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, Y., Ma, X. (2015). A Singing Voice/Music Separation Method Based on Non-negative Tensor Factorization and Repeat Pattern Extraction. In: Hu, X., Xia, Y., Zhang, Y., Zhao, D. (eds) Advances in Neural Networks – ISNN 2015. ISNN 2015. Lecture Notes in Computer Science, vol 9377. Springer, Cham. https://doi.org/10.1007/978-3-319-25393-0_32

  • DOI: https://doi.org/10.1007/978-3-319-25393-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25392-3

  • Online ISBN: 978-3-319-25393-0

  • eBook Packages: Computer Science, Computer Science (R0)
