Skip to main content
Log in

Enhancing GMM speaker identification by incorporating SVM speaker verification for intelligent web-based speech applications

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Speech applications, which operate a system by voice commands, facilitate web access for disabled and visually impaired users. Human-computer interactions, such as speaking and listening to web applications, provide options for developing a multimodal interaction tool in the accessible design of an intelligent web. Speaker identification and verification are essential functionalities for intelligent web programs with speech applications. This paper proposes an enhanced Gaussian mixture model (GMM) method by incorporating the information derived from the support vector machine (SVM), called EGMM-SVM, for web-based applications with speaker recognition. The EGMM-SVM improves the accuracy of the estimated likelihood scores between the speech frame and the GMM. In EGMM-SVM, SVM plays a crucial role in transmitting the quality information of the utterances from a test speaker, through the GMM when performing GMM likelihood calculations. The experimental results show that speaker recognition by using the developed EGMM-SVM with an accurate operation mechanism for Gaussian distribution derivations yields a higher recognition rate than does a conventional GMM without any considerations on the quality of test speech utterances.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. Bharkad S, Kokare M (2012) Hartley transform based fingerprint matching. J Inf Process Syst 8(1):85–100

    Article  Google Scholar 

  2. Boujelbene SZ, Mezghani DBA, Ellouze N (2010) Improving SVM by modifying kernel functions for speaker identification task. Int J Digit Content Technol Appl 4(6):100–105

    Article  Google Scholar 

  3. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2(2):121–167

    Article  Google Scholar 

  4. Burget L, Matejka P, Schwarz P, Glembek O, Cernocky J (2007) Analysis of feature extraction and channel compensation in a GMM speaker recognition system. IEEE Trans Audio, Speech, Lang Process 15(7):1979–1986

    Article  Google Scholar 

  5. Campbell WM, Campbell JP, Gleason TP, Reynolds DA, Shen W (2007) Speaker verification using support vector machines and high-level features. IEEE Trans Audio, Speech, Lang Process 15(7):2085–2094

    Article  Google Scholar 

  6. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc 39:1–38

    MATH  MathSciNet  Google Scholar 

  7. Fan CI, Lin YH (2012) Full privacy minutiae-based fingerprint verification for low-computation devices. J Converg 3(2):21–24

    Google Scholar 

  8. Gaikwad SK, Gawali BW, Yannawar P (2010) A review on speech recognition technique. Int J Comput Appl 10(3):16–24

    Google Scholar 

  9. Griol D, Molina JM, Corrales V (2011) The VoiceApp system: Speech technologies to access the semantic web. In: CAEPIA 2011. Lecture Notes in Computer Science, vol 7023, pp 393–402

  10. Hussain A, Abbasi AR, Afzulpurkar N (2012) Detecting & interpreting self-manipulating hand movements for student’s affect prediction. Hum-centric Comput Inf Sci 2(14):1–18

    Google Scholar 

  11. Jourani R, Daoudi K, Andre-Obrecht R, Aboutajdine D (2011) Speaker verification using large margin GMM discriminative training. In: Proceedings of International Conference on Multimedia Computing and Systems. Toulouse, France, pp 1–5

  12. Kenny P, Boulianne G, Ouellet P, Dumouchel P (2007) Speaker and session variability in GMM-based speaker verification. IEEE Trans Audio, Speech, Lang Process 15(4):1448–1460

    Article  Google Scholar 

  13. Linde Y, Buzo A, Gray RM (1980) An algorithm for vector quantizer design. IEEE Trans Commun 28:84–95

    Article  Google Scholar 

  14. McLaren M, Vogt R, Baker B, Sridharan S (2010) Data-driven background dataset selection for SVM-based speaker verification. IEEE Trans Audio, Speech, Lang Process 18(6):1496–1506

    Article  Google Scholar 

  15. Qian Z, Xu D (2009) Research advances in face recognition. In: Proceedings of IEEE Chinese Conference on Pattern Recognition, pp 1–5

  16. Reynolds DA, Rose RC (1995) Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 3(1):72–83

    Article  Google Scholar 

  17. Satone MP, Kharate GK (2012) Face recognition based on PCA on wavelet subband of average-half-face. J Inf Process Syst 8(3):483–494

    Article  Google Scholar 

  18. You CH, Lee KA, Li H (2009) An SVM kernel with GMM-supervector based on the Bhattacharyya distance for speaker recognition. IEEE Signal Proces Lett 16(1):49–52

    Article  Google Scholar 

  19. You CH, Lee KA, Li H (2010) GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition. IEEE Trans Audio, Speech, Lang Process 18(6):1300–1312

    Article  Google Scholar 

  20. Zhang M, Zou KQ (2008) The application of fuzzy clustering after improvement on speaker recognition. ICIC Express Lett 2(3):263–267

    MathSciNet  Google Scholar 

Download references

Acknowledgments

This research is partially supported by the National Science Council (NSC) in Taiwan under grant NSC 101-2221-E-150-084.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chih-Ta Yen.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ding, IJ., Yen, CT. Enhancing GMM speaker identification by incorporating SVM speaker verification for intelligent web-based speech applications. Multimed Tools Appl 74, 5131–5140 (2015). https://doi.org/10.1007/s11042-013-1587-5

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-013-1587-5

Keywords

Navigation