Abstract
Adaptation techniques can compensate for the mismatch between training and testing conditions and thus further improve speech recognition accuracy. Unlike Gaussian mixture models (GMMs), which are generative models, deep neural networks (DNNs) are discriminative models. For this reason, the adaptation techniques developed for GMMs cannot be directly applied to DNNs. In this chapter, we first introduce the concept of adaptation. We then describe the important adaptation techniques developed for DNNs, which fall into three categories: linear transformation, conservative training, and subspace methods. We further show that adaptation can bring significant error rate reductions for DNN systems, at least for some speech recognition tasks, and is thus as important for DNNs as it is for GMM systems.
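As a rough illustration of how two of these categories can work together, the sketch below inserts a linear input transformation, initialized to the identity, in front of a frozen model and trains only that transformation. Training uses KL-divergence-regularized targets, a conservative-training approach in which each one-hot label is interpolated with the posterior of the unadapted, speaker-independent (SI) model. Training toward these interpolated targets is equivalent to adding a KL-divergence penalty that keeps the adapted outputs close to the SI model's outputs. This is a minimal sketch under stated assumptions, not the chapter's implementation: a single softmax layer (`W`, `b`) stands in for a full DNN, the function names and the interpolation weight `rho` are hypothetical, and plain full-batch gradient descent replaces a real training recipe.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in exp.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kld_targets(labels, si_post, rho):
    # Interpolate one-hot labels with the speaker-independent (SI) posteriors.
    # rho = 0 gives plain cross-entropy targets; rho = 1 reproduces the SI outputs.
    onehot = np.eye(si_post.shape[1])[labels]
    return (1.0 - rho) * onehot + rho * si_post


def adapt_lin(X, labels, W, b, rho=0.5, lr=0.01, steps=500):
    """Train a linear input transform x -> A x + c in front of a frozen layer.

    Hypothetical toy setup: W (classes x dims) and b stand in for the SI network
    and are never updated; only A and c are learned from the adaptation data.
    """
    n, d = X.shape
    A = np.eye(d)          # identity init, so adaptation starts from the SI model
    c = np.zeros(d)
    T = kld_targets(labels, softmax(X @ W.T + b), rho)
    for _ in range(steps):
        P = softmax((X @ A.T + c) @ W.T + b)
        G = (P - T) / n    # gradient of mean soft-target cross-entropy w.r.t. logits
        Gx = G @ W         # back-propagated to the transformed features
        A -= lr * (Gx.T @ X)
        c -= lr * Gx.sum(axis=0)
    return A, c


def mean_ce(X, labels, W, b, A, c):
    # Frame-averaged cross-entropy against the hard labels.
    P = softmax((X @ A.T + c) @ W.T + b)
    return -np.mean(np.log(P[np.arange(len(labels)), labels]))
```

Setting `rho = 0` reduces this to ordinary fine-tuning of the inserted layer, while values close to 1 keep the adapted model near the SI model, which matters most when the adaptation set is small. Because `A` starts at the identity and `c` at zero, the adapted model initially reproduces the SI posteriors exactly.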
Copyright information
© 2015 Springer-Verlag London
About this chapter
Cite this chapter
Yu, D., Deng, L. (2015). Adaptation of Deep Neural Networks. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_11
DOI: https://doi.org/10.1007/978-1-4471-5779-3_11
Published:
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5778-6
Online ISBN: 978-1-4471-5779-3
eBook Packages: Engineering, Engineering (R0)