Abstract
In this paper, we propose a novel singing voice/music separation method based on non-negative tensor factorization (NTF) and the repeating pattern extraction technique (REPET), which separates a mixture into a singing voice and background music. Our system consists of three stages. First, we use NTF to decompose the mixture into components and apply similarity detection to distinguish the components from each other, classifying them into two classes: voice/periodic music and block music/voice. Next, we apply REPET to each of the two classes to extract the background music further; the final background music is estimated by adding the two extracted backgrounds together, and the remaining parts are added together as the singing voice. Finally, the music spectrum and the voice spectrum are filtered by a harmonic filter and a percussive filter, respectively. To improve the performance further, a Wiener filter is used to separate the voice and music. On the MIR-1K dataset, our method improves separation performance compared with other state-of-the-art methods.
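The REPET and Wiener filtering steps mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a magnitude spectrogram is already available as a NumPy array, that the repeating period (in frames) is known in advance, and it omits the NTF, similarity detection, and harmonic/percussive filtering stages entirely. The function names `repeating_background` and `wiener_masks` are hypothetical and chosen for illustration only.

```python
import numpy as np


def repeating_background(mag, period):
    """REPET-style background estimate for a magnitude spectrogram.

    mag: array of shape (n_bins, n_frames), non-negative.
    period: repeating period in frames (assumed known here).
    """
    n_bins, n_frames = mag.shape
    n_seg = int(np.ceil(n_frames / period))
    # Pad with NaN so the last, possibly partial, segment is ignored
    # by the median wherever it has no data.
    padded = np.full((n_bins, n_seg * period), np.nan)
    padded[:, :n_frames] = mag
    segments = padded.reshape(n_bins, n_seg, period)
    # Element-wise median across segments gives the repeating model.
    model = np.nanmedian(segments, axis=1)
    tiled = np.tile(model, (1, n_seg))[:, :n_frames]
    # The background cannot exceed the mixture in any time-frequency bin.
    return np.minimum(tiled, mag)


def wiener_masks(voice_mag, music_mag, eps=1e-12):
    """Wiener-style soft masks from two magnitude estimates."""
    v2 = voice_mag ** 2
    m2 = music_mag ** 2
    voice_mask = v2 / (v2 + m2 + eps)
    return voice_mask, 1.0 - voice_mask


# Usage on synthetic data: a periodic background plus one sparse "voice" event.
rng = np.random.default_rng(0)
pattern = rng.random((3, 4)) + 0.1          # one period of background
background = np.tile(pattern, (1, 5))       # 5 repetitions, 20 frames
voice = np.zeros_like(background)
voice[1, 5] = 2.0
mixture = background + voice

music_est = repeating_background(mixture, period=4)
voice_mask, music_mask = wiener_masks(mixture - music_est, music_est)
voice_est = voice_mask * mixture
```

In practice the masks would be applied to the complex short-time Fourier transform of the mixture before inversion, and the period would be estimated from the signal rather than supplied by hand.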
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, Y., Ma, X. (2015). A Singing Voice/Music Separation Method Based on Non-negative Tensor Factorization and Repeat Pattern Extraction. In: Hu, X., Xia, Y., Zhang, Y., Zhao, D. (eds) Advances in Neural Networks – ISNN 2015. ISNN 2015. Lecture Notes in Computer Science, vol 9377. Springer, Cham. https://doi.org/10.1007/978-3-319-25393-0_32
DOI: https://doi.org/10.1007/978-3-319-25393-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25392-3
Online ISBN: 978-3-319-25393-0
eBook Packages: Computer Science (R0)