Communicating with other people is among the most common human activities, and listeners can usually recognize the voice of someone they know. The same holds for recognizing a voice in music: if the artist's voice is familiar, recognition is easy, but if it is not, identifying the voice within the music is difficult. Singer recognition is therefore a demanding research area in audio signal processing, one that calls for suitable algorithms. Different approaches are possible, such as isolating the voice frequency range from the audio signal or detecting the peaks of the voice within the music. Because music is polyphonic, its frequency components must be analyzed, and detecting the peaks of the voice signal from them can be the easier approach. In this paper, a set of songs is used to create the training data on which a neural network is trained, and a separate set of data is prepared for testing. In addition to supervised learning, hyperparameter tuning is applied, and the efficiency of detecting new and unknown singers is observed. The neural network achieves an accuracy of about 99.29%, so detection is performed at a satisfactory level.
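The idea of restricting the analysis to a voice frequency range and picking spectral peaks can be illustrated with a minimal sketch. This is not the authors' method: the function name `band_peaks`, the 80–1100 Hz band limits, the Hann window, and the synthetic test signal are all assumptions chosen for illustration, since the abstract gives no concrete values.

```python
import numpy as np

def band_peaks(signal, sample_rate, f_lo=80.0, f_hi=1100.0, n_peaks=3):
    """Return the strongest spectral peak frequencies (Hz) inside [f_lo, f_hi].

    f_lo and f_hi are assumed band limits for a singing voice, not values
    taken from the paper.
    """
    # A Hann window reduces spectral leakage before the FFT.
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # A bin is a peak if it is strictly larger than both neighbours.
    is_peak = np.zeros(len(spectrum), dtype=bool)
    is_peak[1:-1] = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])

    # Keep only peaks that fall inside the assumed voice band.
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    candidates = np.flatnonzero(is_peak & in_band)

    # Select the n_peaks candidates with the largest magnitude.
    order = np.argsort(spectrum[candidates])[::-1][:n_peaks]
    return sorted(float(freqs[i]) for i in candidates[order])

# Synthetic polyphonic mix: two in-band "voice" partials (220 Hz, 440 Hz)
# plus one out-of-band "instrument" tone (2000 Hz).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of audio
mix = (np.sin(2 * np.pi * 220 * t)
       + 0.6 * np.sin(2 * np.pi * 440 * t)
       + 0.8 * np.sin(2 * np.pi * 2000 * t))

peaks = band_peaks(mix, sample_rate, n_peaks=2)
print(peaks)
```

On this synthetic mix the function should report peaks near 220 Hz and 440 Hz and ignore the 2000 Hz tone, which lies outside the assumed band. In a real recording, instruments overlap the voice band, so peak frequencies like these would serve only as input features for a classifier such as the neural network described in the abstract.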
About this article
Cite this article
Biswas, S., Solanki, S.S. Speaker recognition: an enhanced approach to identify singer voice using neural network. Int J Speech Technol 24, 9–21 (2021). https://doi.org/10.1007/s10772-020-09698-8
- Multilayer perceptron
- Neural network
- Singer identification
- Single layer perceptron
- Spectral feature