Speaker discrimination based on fuzzy fusion and feature reduction techniques

Published in: International Journal of Speech Technology

Abstract

In this paper, we investigate speaker discrimination using multi-classifier fusion, with a focus on the effects of feature reduction. Speaker discrimination consists of automatically distinguishing between two speakers from the vocal characteristics of their speech. Features are extracted as Mel Frequency Spectral Coefficients and then reduced using the Relative Speaker Characteristic (RSC) combined with Principal Component Analysis (PCA). Several classification methods are implemented to perform the discrimination task. Since different classifiers are employed, two decision-level fusion algorithms, referred to as Weighted Fusion and Fuzzy Fusion, are proposed to boost classification performance; both are based on weighting the outputs of the individual classifiers. The effects of speaker gender and feature reduction on the discrimination task are also examined. Our approaches are evaluated on a subset of the Hub-4 Broadcast News corpus. Experimental results show that speaker discrimination accuracy improves by 5–15% with RSC–PCA feature reduction, and the proposed fusion methods yield an improvement of about 10% over the individual classifier scores. Finally, we observe that speaker gender has a significant impact on discrimination performance.
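The overall pipeline described above — feature reduction followed by decision-level weighted fusion of several classifiers — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins for the extracted feature vectors, the three classifiers are arbitrary choices, the RSC step is omitted (only PCA is shown), and the fusion weights are simply each classifier's training accuracy rather than the paper's fuzzy-derived weights.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for MFSC-type feature vectors from two speakers
# (hypothetical data; the paper uses features extracted from Hub-4 speech).
n_per_spk, n_dims = 300, 40
X = np.vstack([rng.normal(0.0, 1.0, (n_per_spk, n_dims)),
               rng.normal(0.5, 1.2, (n_per_spk, n_dims))])
y = np.array([0] * n_per_spk + [1] * n_per_spk)   # speaker labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Feature reduction: keep the leading principal components.
# (The paper combines RSC with PCA; only the PCA part is shown here.)
pca = PCA(n_components=12).fit(X_tr)
X_tr_r, X_te_r = pca.transform(X_tr), pca.transform(X_te)

# Several classifiers trained on the reduced features (arbitrary choices).
classifiers = {
    "svm": SVC(probability=True, random_state=0).fit(X_tr_r, y_tr),
    "mlp": MLPClassifier(max_iter=1000, random_state=0).fit(X_tr_r, y_tr),
    "gnb": GaussianNB().fit(X_tr_r, y_tr),
}

# Decision-level weighted fusion: each classifier's posterior for speaker 1
# is weighted by that classifier's accuracy on the training data, then the
# weighted scores are summed and thresholded.
weights = {name: accuracy_score(y_tr, clf.predict(X_tr_r))
           for name, clf in classifiers.items()}
total = sum(weights.values())
fused_score = sum(weights[name] / total * clf.predict_proba(X_te_r)[:, 1]
                  for name, clf in classifiers.items())
y_fused = (fused_score >= 0.5).astype(int)

for name, clf in classifiers.items():
    print(name, round(accuracy_score(y_te, clf.predict(X_te_r)), 3))
print("fused", round(accuracy_score(y_te, y_fused), 3))
```

In practice the weights would be estimated on held-out data rather than the training set, and the paper's Fuzzy Fusion replaces these fixed weights with fuzzy-derived ones, a detail not reproduced in this sketch.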


Author information

Correspondence to H. Sayoud.


Cite this article

Khennouf, S., Sayoud, H. Speaker discrimination based on fuzzy fusion and feature reduction techniques. Int J Speech Technol 21, 51–63 (2018). https://doi.org/10.1007/s10772-017-9484-3
