Unsupervised Speaker Adaptation for Phonetic Transcription Based Voice Dialing

Kim, Weon-Goo; Jang, MinSeok; Lee, Chin-Hui

doi:10.1007/11540007_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3614)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

Because a voice dialing system based on speaker-independent phoneme HMMs stores only the phoneme transcription of each input sentence, its storage requirement is greatly reduced. However, its performance is worse than that of a speaker-dependent system because of the phoneme recognition errors that arise when speaker-independent models are used. To solve this problem, a new speaker adaptation method is presented that jointly estimates the transformation vectors (biases) and the transcriptions. The biases and transcriptions are estimated iteratively from each user's training data, using the maximum likelihood approach to stochastic matching with speaker-independent phoneme models. Experimental results show that the proposed method is superior to the conventional method that uses transcriptions only.
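The iterative joint estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional features, a single Gaussian per phoneme, and frame-wise labeling in place of HMM Viterbi decoding. The names `decode`, `estimate_bias`, `adapt`, and `transcription`, and all numbers, are hypothetical. The bias update is the maximum-likelihood estimate for a single additive bias under fixed labels, that is, the variance-weighted mean of the residuals between observed frames and model means.

```python
import math
from itertools import groupby

# Toy sketch only. Assumptions (not from the paper): 1-D features, one Gaussian
# (mean, variance) per phoneme, and frame-wise labeling instead of HMM Viterbi
# decoding. All names and numbers are hypothetical.

def log_likelihood(x, mean, var):
    """Gaussian log-likelihood of a scalar observation x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def decode(frames, models, bias):
    """Label each frame with its most likely phoneme after subtracting the bias."""
    return [max(models, key=lambda p: log_likelihood(y - bias, *models[p]))
            for y in frames]

def estimate_bias(frames, labels, models):
    """ML bias for fixed labels: variance-weighted mean of (frame - model mean)."""
    num = sum((y - models[p][0]) / models[p][1] for y, p in zip(frames, labels))
    den = sum(1.0 / models[p][1] for p in labels)
    return num / den

def adapt(frames, models, max_iter=10):
    """Alternate bias estimation and re-decoding until the labels stop changing."""
    bias = 0.0
    labels = decode(frames, models, bias)  # unadapted, speaker-independent pass
    for _ in range(max_iter):
        bias = estimate_bias(frames, labels, models)
        new_labels = decode(frames, models, bias)
        if new_labels == labels:
            break
        labels = new_labels
    return bias, labels

def transcription(labels):
    """Collapse runs of identical frame labels into a phoneme sequence."""
    return [p for p, _ in groupby(labels)]

# Hypothetical speaker-independent models: phoneme -> (mean, variance).
models = {"a": (0.0, 1.0), "b": (4.0, 1.0), "c": (8.0, 1.0)}
# Hypothetical frames from a speaker whose features are shifted upward.
frames = [1.5, 2.3, 5.6, 6.2, 9.7, 9.9]

bias, labels = adapt(frames, models)
print(round(bias, 3), transcription(labels))
```

In this toy input the unadapted pass labels two frames differently from the final result, and the labels settle after the bias estimate is applied. A real system would replace the frame-wise `decode` with phoneme HMM decoding and use multi-dimensional features.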

Author information

Authors and Affiliations

Biometrics Engineering Research Center, School of Electronic and Information Eng., Kunsan National Univ., Kunsan, Chonbuk, 573-701, Korea
Weon-Goo Kim
Dept. of Computer Information Science, Kunsan National Univ., Kunsan, Chonbuk, 573-701, Korea
MinSeok Jang
School of Electrical and Computer Eng., Georgia Institute of Technology, Atlanta, Georgia, 30332, USA
Chin-Hui Lee

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin

About this paper

Cite this paper

Kim, WG., Jang, M., Lee, CH. (2005). Unsupervised Speaker Adaptation for Phonetic Transcription Based Voice Dialing. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_29

DOI: https://doi.org/10.1007/11540007_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)
