Abstract
This chapter presents a neural model for speaker identification that uses speaker-specific information extracted from vowel sounds. The vowel sounds are segmented from words spoken by the speaker to be identified. Vowel sounds occur more frequently in speech than other sounds and carry higher energy. Therefore, in situations where the acoustic information is corrupted by noise, vowel sounds can still be used to extract speaker-discriminative information. The model uses a neural framework built from a probabilistic neural network (PNN) and learning vector quantization (LVQ), together with the proposed self-organizing map (SOM)-based vowel segmentation technique. The work first extracts glottal source information of the speakers using the linear prediction (LP) residual. Next, empirical mode decomposition (EMD) of the speech signal is performed to extract the residual. An LVQ-based speaker codebook is then formed from these residual features. The work demonstrates that the residual signal obtained from EMD of speech can serve as a speaker-discriminative feature. The neural approach to speaker identification performs better than conventional statistical approaches reported in the literature, such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). Although the proposed model has been tested only on speakers of Assamese, it should also be suitable for other Indian languages, provided that the speaker database contains samples of the language in question.
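To make the LP residual mentioned in the abstract concrete, the following is a minimal sketch of how such a residual could be computed for one speech frame. It uses the standard autocorrelation method and is not taken from the chapter: the function names `lp_coefficients` and `lp_residual`, the default order of 10, and the small diagonal loading term are all illustrative assumptions.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a[1..order] with the autocorrelation method.

    Hypothetical helper for illustration; the chapter's settings may differ.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order] of the frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    if r[0] == 0.0:
        # Silent frame: no prediction is possible, return zero coefficients.
        return np.zeros(order)
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Assumed safeguard: light diagonal loading for near-singular frames.
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])


def lp_residual(frame, order=10):
    """Return e[n] = s[n] - sum_k a[k] * s[n - k], the LP prediction error."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # Inverse filter A(z) = 1 - sum_k a[k] z^-k, applied by convolution.
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[: len(frame)]
```

Because the inverse filter removes most of the vocal-tract (spectral envelope) contribution, the residual mainly reflects the excitation, which is why it is commonly used as a proxy for glottal source information. In practice the frame would be a short windowed vowel segment rather than a whole utterance.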
Copyright information
© 2014 Springer India
About this chapter
Cite this chapter
Sarma, M., Sarma, K.K. (2014). Application of Proposed Phoneme Segmentation Technique for Speaker Identification. In: Phoneme-Based Speech Segmentation using Hybrid Soft Computing Framework. Studies in Computational Intelligence, vol 550. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1862-3_9
DOI: https://doi.org/10.1007/978-81-322-1862-3_9
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1861-6
Online ISBN: 978-81-322-1862-3
eBook Packages: Engineering, Engineering (R0)