Abstract
Mobile devices have limited computing power and memory, so large deep neural network (DNN) based acoustic models are poorly suited to on-device deployment. To alleviate this problem, this paper proposes compressing acoustic models with knowledge transfer, in which a large teacher model transfers its generalized knowledge to a small student model. The student model is trained on a linear interpolation of hard probabilities and soft probabilities: the hard probabilities are generated by a Gaussian mixture model hidden Markov model (GMM-HMM) system, and the soft probabilities are computed by the teacher model (a DNN or RNN). Experiments on the AMI corpus show that a small student model obtains a 2.4% relative WER improvement over the large teacher model at an almost 7.6-fold compression ratio.
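The training criterion described above can be sketched as follows. This is a minimal illustration of the interpolated loss, not the authors' implementation: the interpolation weight `lam`, the temperature, and the NumPy formulation are assumptions chosen for clarity.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over the class (senone) dimension.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      lam=0.5, temperature=2.0):
    """Linear interpolation of a hard-label cross-entropy (labels from
    a GMM-HMM system) and a soft cross-entropy against the teacher's
    posterior distribution, as described in the abstract."""
    n = student_logits.shape[0]
    # Hard loss: cross-entropy with the GMM-HMM-derived frame labels.
    hard_probs = softmax(student_logits)
    hard_loss = -np.mean(
        np.log(hard_probs[np.arange(n), hard_labels] + 1e-12))
    # Soft loss: cross-entropy with the teacher's tempered posteriors.
    student_probs = softmax(student_logits, temperature)
    teacher_probs = softmax(teacher_logits, temperature)
    soft_loss = -np.mean(
        (teacher_probs * np.log(student_probs + 1e-12)).sum(axis=-1))
    return lam * hard_loss + (1.0 - lam) * soft_loss
```

With `lam=1.0` the criterion reduces to ordinary cross-entropy training on the hard labels; with `lam=0.0` the student learns only from the teacher's soft probabilities.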
Acknowledgements
This work is supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2015AA016305), the National Natural Science Foundation of China (NSFC) (No. 61425017, No. 61403386).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this paper
Yi, J., Tao, J., Wen, Z., Li, Y., Ni, H. (2018). Acoustic Model Compression with Knowledge Transfer. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8110-1
Online ISBN: 978-981-10-8111-8
eBook Packages: Computer Science (R0)