Adaptation of Deep Neural Networks

Yu, Dong; Deng, Li

doi:10.1007/978-1-4471-5779-3_11

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Adaptation techniques compensate for the mismatch between training and testing conditions and can thus further improve speech recognition accuracy. Unlike Gaussian mixture models (GMMs), which are generative models, deep neural networks (DNNs) are discriminative models. For this reason, the adaptation techniques developed for GMMs cannot be directly applied to DNNs. In this chapter, we first introduce the concept of adaptation. We then describe the important adaptation techniques developed for DNNs, which fall into three categories: linear transformation, conservative training, and subspace methods. We further show that adaptation of DNNs can yield significant error rate reductions, at least for some speech recognition tasks, and is thus as important as adaptation in GMM systems.
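
As a rough illustration of the conservative training category, the sketch below shows one well-known form of it, KL-divergence regularization, in which the adaptation target for each frame is an interpolation between the hard label and the posterior of the unadapted (speaker-independent) model. This is a minimal sketch under stated assumptions, not the chapter's own formulation: the function names, the three-class toy posteriors, and the weight `rho` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kld_regularized_target(label_index, si_posterior, rho):
    """Interpolate the one-hot label with the unadapted model's posterior.

    rho = 0 gives the plain one-hot target (ordinary fine-tuning);
    rho = 1 gives the unadapted posterior (no adaptation at all).
    """
    return [
        (1.0 - rho) * (1.0 if k == label_index else 0.0) + rho * p
        for k, p in enumerate(si_posterior)
    ]


def cross_entropy(target, predicted):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


# Hypothetical single frame with three output classes.
si_post = softmax([2.0, 0.5, -1.0])        # unadapted model's posterior
target = kld_regularized_target(label_index=1, si_posterior=si_post, rho=0.5)
adapted_post = softmax([1.0, 1.5, -1.0])   # adapted model's current posterior
loss = cross_entropy(target, adapted_post)
print(target, loss)
```

The design choice here is that the regularization lives entirely in the target: training the adapted model with ordinary cross-entropy against this soft target keeps its output distribution close to the unadapted one, and a larger `rho` means a more conservative update.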

Author information

Authors and Affiliations

Microsoft Research, Bothell, USA
Dong Yu
Microsoft Research, Redmond, WA, USA
Li Deng

Corresponding author

Correspondence to Dong Yu.

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Adaptation of Deep Neural Networks. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_11

DOI: https://doi.org/10.1007/978-1-4471-5779-3_11
Published: 12 November 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5778-6
Online ISBN: 978-1-4471-5779-3
eBook Packages: Engineering (R0)