Application of Proposed Phoneme Segmentation Technique for Speaker Identification

Chapter in: Phoneme-Based Speech Segmentation using Hybrid Soft Computing Framework

Part of the book series: Studies in Computational Intelligence ((SCI,volume 550))

Abstract

This chapter presents a neural model for speaker identification that uses speaker-specific information extracted from vowel sounds. The vowel sounds are segmented from words spoken by the speaker to be identified. Vowels occur more frequently in speech than other sounds and carry higher energy. Therefore, when the acoustic information is corrupted by noise, vowel sounds can still be used to extract varying amounts of speaker-discriminative information. The model uses a neural framework formed with a probabilistic neural network (PNN) and learning vector quantization (LVQ), in which the proposed self-organizing map (SOM)-based vowel segmentation technique is applied. The work first extracts the glottal source information of the speakers using the linear prediction (LP) residual. Later, empirical mode decomposition (EMD) of the speech signal is performed to extract the residual. From these residual features, an LVQ-based speaker codebook is formed. The work demonstrates the use of the residual signal obtained from EMD of speech as a speaker-discriminative feature. The neural approach to speaker identification is reported to give superior performance compared with conventional statistical approaches found in the literature, such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). Although the proposed model has been tested only on speakers of the Assamese language, it should also be suitable for other Indian languages, provided the speaker database contains samples of the specific language.
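The abstract names the LP residual as the first source of glottal information but does not give a procedure. As a rough illustration only, the following Python sketch computes an LP residual for one speech frame with the textbook autocorrelation method. The function names `lp_coefficients` and `lp_residual`, the default prediction order of 10, and the diagonal loading term are assumptions made for this example; they are not taken from the chapter and need not match the authors' implementation.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a_1..a_p by the autocorrelation method.

    Hypothetical helper for illustration; not the chapter's own code.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Assumed safeguard: small diagonal loading for near-silent frames
    R += 1e-9 * (r[0] + 1.0) * np.eye(order)
    return np.linalg.solve(R, r[1:])


def lp_residual(frame, order=10):
    """Return the prediction error e[n] = s[n] - sum_k a_k * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # Inverse filter A(z) = 1 - sum_k a_k z^-k, applied as an FIR filter
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[: len(frame)]
```

In a pipeline like the one the abstract outlines, `lp_residual` would presumably be applied to each segmented vowel frame, and features derived from the residual would then be passed to the codebook stage. That wiring is not shown here because the abstract does not specify it.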

Author information

Correspondence to Mousmita Sarma.

Copyright information

© 2014 Springer India

About this chapter

Cite this chapter

Sarma, M., Sarma, K.K. (2014). Application of Proposed Phoneme Segmentation Technique for Speaker Identification. In: Phoneme-Based Speech Segmentation using Hybrid Soft Computing Framework. Studies in Computational Intelligence, vol 550. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1862-3_9

  • DOI: https://doi.org/10.1007/978-81-322-1862-3_9

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1861-6

  • Online ISBN: 978-81-322-1862-3

  • eBook Packages: Engineering (R0)
