Abstract
Because a voice dialing system based on speaker-independent phoneme HMMs stores only the phoneme transcription of each input sentence, its storage requirements can be greatly reduced. However, such a system performs worse than a speaker-dependent system because of the phoneme recognition errors that arise when speaker-independent models are used. To address this problem, we present a new speaker adaptation method that jointly estimates the transformation vectors (biases) and the transcriptions. The biases and transcriptions are estimated iteratively from each user's training data, using the maximum-likelihood approach to stochastic matching with speaker-independent phoneme models. Experimental results show that the proposed method outperforms the conventional method, which uses transcriptions only.
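The alternating estimation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one diagonal-covariance Gaussian per phoneme and labels each frame independently, whereas the paper uses phoneme HMMs. The names `decode`, `estimate_bias`, `adapt`, and the toy models and frames are all hypothetical. Under these assumptions, the maximum-likelihood bias for fixed labels has a closed form, and the loop alternates between re-labeling the bias-compensated frames and re-estimating the bias.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def decode(frames, bias, models):
    """Label each bias-compensated frame with its most likely phoneme.

    Simplifying assumption: frames are labeled independently, with no HMM
    transitions and no Viterbi search.
    """
    labels = []
    for x in frames:
        shifted = [xi - bi for xi, bi in zip(x, bias)]
        labels.append(max(models, key=lambda p: log_gauss(shifted, *models[p])))
    return labels


def estimate_bias(frames, labels, models):
    """Closed-form ML bias for fixed labels (single-Gaussian case).

    Per dimension: b = sum_t (x_t - mu_t) / var_t  /  sum_t 1 / var_t.
    """
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x, p in zip(frames, labels):
        mean, var = models[p]
        for d in range(dim):
            num[d] += (x[d] - mean[d]) / var[d]
            den[d] += 1.0 / var[d]
    return [n / d for n, d in zip(num, den)]


def adapt(frames, models, iterations=5):
    """Alternate between bias estimation and re-labeling until labels settle."""
    bias = [0.0] * len(frames[0])
    labels = decode(frames, bias, models)
    for _ in range(iterations):
        bias = estimate_bias(frames, labels, models)
        new_labels = decode(frames, bias, models)
        if new_labels == labels:
            break
        labels = new_labels
    return bias, labels


# Toy speaker-independent models: phoneme -> (mean, variance), one dimension.
models = {"a": ([0.0], [1.0]), "b": ([4.0], [1.0])}
# Toy frames from a speaker whose features are shifted upward.
frames = [[1.2], [2.3], [5.5], [5.6]]

bias, labels = adapt(frames, models)
print(labels, bias)
```

In this toy case the unadapted models label the second frame as "b", while the jointly estimated bias moves it back to "a". A fuller version would replace `decode` with Viterbi decoding over phoneme HMMs and weight the bias statistics by state and mixture occupancies.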
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, WG., Jang, M., Lee, CH. (2005). Unsupervised Speaker Adaptation for Phonetic Transcription Based Voice Dialing. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_29
DOI: https://doi.org/10.1007/11540007_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)