Speaker recognition: an enhanced approach to identify singer voice using neural network

Abstract

Communicating with other people is among the most common human activities, and listeners can usually recognize the voice of someone they know. The same holds for voices in music: a familiar artist is easy to recognize, whereas an unfamiliar voice is hard to identify within a song. Singer recognition is therefore a demanding research area that calls for suitable algorithms from audio signal processing. Several approaches are possible, such as isolating the voice frequency range from the audio signal or detecting the peaks of the voice within the music. Because music is polyphonic, its frequency components must be analyzed first; detecting the peaks of the voice signal from them can then be an easier route to recognition. In this paper, a set of songs is used to create the training data with which the neural network is trained, and alongside it a separate set of data is prepared for testing. In addition to supervised learning, hyperparameter tuning is applied, and the efficiency of detecting new and unknown singers is observed. The neural network performs well in this field, with about 99.29% accuracy, so detection is achieved at a satisfactory level.
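The idea of restricting analysis to a voice frequency range and then picking spectral peaks can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 80 to 1100 Hz band limits, the relative threshold, and the synthetic three-tone test signal are all assumptions chosen for illustration, and a naive DFT from the standard library stands in for whatever spectral analysis the paper actually uses.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Naive DFT: return (frequencies in Hz, magnitudes) for bins 0..N/2."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def band_peaks(freqs, mags, low_hz=80.0, high_hz=1100.0, rel_threshold=0.3):
    """Return frequencies of local maxima inside [low_hz, high_hz].

    The band limits and threshold are illustrative assumptions,
    not values taken from the paper.
    """
    in_band = [m for f, m in zip(freqs, mags) if low_hz <= f <= high_hz]
    if not in_band:
        return []
    floor = rel_threshold * max(in_band)
    peaks = []
    for k in range(1, len(mags) - 1):
        is_local_max = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if low_hz <= freqs[k] <= high_hz and mags[k] >= floor and is_local_max:
            peaks.append(freqs[k])
    return peaks


# Synthetic "polyphonic" signal: two in-band tones standing in for a voice
# (220 Hz and 440 Hz) plus one out-of-band tone standing in for an instrument.
SAMPLE_RATE = 8000
N = 800  # gives a 10 Hz bin spacing, so every tone falls exactly on a bin
signal = [
    1.0 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)
    + 0.6 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
    + 1.0 * math.sin(2 * math.pi * 3000 * i / SAMPLE_RATE)
    for i in range(N)
]

freqs, mags = magnitude_spectrum(signal, SAMPLE_RATE)
peaks = band_peaks(freqs, mags)
print(peaks)
```

On this synthetic input the 3000 Hz component is ignored because it lies outside the assumed band, and only the two in-band tones are reported as peaks. Real recordings would need windowing, framing, and a far more careful choice of band and threshold.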

Author information

Corresponding author

Correspondence to Sharmila Biswas.

About this article

Cite this article

Biswas, S., Solanki, S.S. Speaker recognition: an enhanced approach to identify singer voice using neural network. Int J Speech Technol 24, 9–21 (2021). https://doi.org/10.1007/s10772-020-09698-8

Keywords

  • Multilayer perceptron
  • Neural network
  • Singer identification
  • Single layer perceptron
  • Spectral feature