A Singing Voice/Music Separation Method Based on Non-negative Tensor Factorization and Repeat Pattern Extraction

Zhang, Yong; Ma, Xiaohong

doi:10.1007/978-3-319-25393-0_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9377)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

In this paper, we propose a novel singing voice/music separation method based on non-negative tensor factorization (NTF) and the repeat pattern extraction technique (REPET), which separates a mixture into a singing voice and background music. The system consists of three stages. First, NTF decomposes the mixture into components, and similarity detection is applied to sort these components into two classes: one containing voice and periodic music, the other containing block music and voice. Next, REPET is applied to each class to extract its background music; the two extracted backgrounds are summed to give the final background music estimate, and the two residuals are summed to give the singing voice. Finally, the music spectrum is filtered by a harmonic filter and the voice spectrum by a percussive filter. To improve performance further, a Wiener filter is used to separate the voice and music. On the MIR-1K dataset, the method improves separation performance compared with other state-of-the-art methods.
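
The REPET and Wiener filtering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a magnitude spectrogram V (frequency bins by frames) and an already known repeating period in frames, and it omits the NTF decomposition, the similarity detection, and the harmonic/percussive filtering. The function names repeating_background and wiener_masks are hypothetical, and the power-ratio form of the Wiener masks is an assumption, since the abstract does not give the exact filter.

```python
import numpy as np


def repeating_background(V, period):
    """REPET-style estimate of the repeating (background) magnitude spectrogram.

    V      : non-negative array of shape (n_bins, n_frames)
    period : repeating period in frames (assumed known here)
    """
    n_bins, n_frames = V.shape
    n_seg = int(np.ceil(n_frames / period))
    # Pad the last, possibly incomplete, segment with NaN so that
    # nanmedian ignores the missing frames.
    padded = np.full((n_bins, n_seg * period), np.nan)
    padded[:, :n_frames] = V
    segments = padded.reshape(n_bins, n_seg, period)
    # Element-wise median across segments gives one repeating segment model.
    model = np.nanmedian(segments, axis=1)
    W = np.tile(model, (1, n_seg))[:, :n_frames]
    # The background cannot carry more energy than the mixture itself.
    return np.minimum(W, V)


def wiener_masks(music_mag, voice_mag, eps=1e-12):
    """Wiener-style soft masks from two magnitude estimates (power ratio)."""
    music_pow = music_mag ** 2
    voice_pow = voice_mag ** 2
    total = music_pow + voice_pow + eps  # eps avoids division by zero
    return music_pow / total, voice_pow / total
```

Given a mixture spectrogram V, the background estimate is W = repeating_background(V, period), the voice estimate is the residual V - W, and the two masks from wiener_masks(W, V - W) are multiplied with the complex mixture spectrogram before inversion to the time domain. The median is what makes the model robust: a component that does not recur at the given period, such as a sung note, appears in only a few segments and is rejected.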

Author information

Authors and Affiliations

School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
Yong Zhang & Xiaohong Ma

Corresponding author

Correspondence to Yong Zhang.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Xiaolin Hu
Fuzhou University, Fuzhou, China
Yousheng Xia
School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China
Yunong Zhang
Chinese Academy of Sciences, Institute of Automation, Beijing, China
Dongbin Zhao

About this paper

Cite this paper

Zhang, Y., Ma, X. (2015). A Singing Voice/Music Separation Method Based on Non-negative Tensor Factorization and Repeat Pattern Extraction. In: Hu, X., Xia, Y., Zhang, Y., Zhao, D. (eds) Advances in Neural Networks – ISNN 2015. ISNN 2015. Lecture Notes in Computer Science, vol 9377. Springer, Cham. https://doi.org/10.1007/978-3-319-25393-0_32

DOI: https://doi.org/10.1007/978-3-319-25393-0_32
Published: 19 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25392-3
Online ISBN: 978-3-319-25393-0
eBook Packages: Computer Science, Computer Science (R0)
