Acoustic Model Compression with Knowledge Transfer

  • Conference paper
Man-Machine Speech Communication (NCMMSC 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 807)


Abstract

Mobile devices have limited computing power and memory, so large deep neural network (DNN) based acoustic models are not well suited to deployment on them. To alleviate this problem, this paper proposes to compress acoustic models using knowledge transfer: a large teacher model is forced to transfer generalized knowledge to a small student model. The student model is trained on a linear interpolation of hard probabilities and soft probabilities, so that it learns the teacher's generalized knowledge. The hard probabilities are generated by a Gaussian mixture model hidden Markov model (GMM-HMM) system; the soft probabilities are computed by the teacher model (a DNN or RNN). Experiments on the AMI corpus show that a small student model obtains a 2.4% relative WER improvement over a large teacher model at a compression ratio of almost 7.6×.
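The interpolated training objective the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `kd_loss` and the interpolation weight `lam` are assumed names, and the hard targets stand in for senone labels obtained from GMM-HMM forced alignment.

```python
import numpy as np

def kd_loss(student_logits, teacher_probs, hard_labels, lam=0.5):
    """Cross-entropy of the student's output against a linear
    interpolation of hard targets (one-hot labels, e.g. from GMM-HMM
    alignment) and the teacher model's soft posteriors.
    `lam` weights the hard targets; (1 - lam) weights the soft ones."""
    # Frame-wise softmax over the student's output logits.
    e = np.exp(student_logits - student_logits.max(axis=-1, keepdims=True))
    student_probs = e / e.sum(axis=-1, keepdims=True)
    # One-hot encode the hard labels.
    hard = np.eye(student_logits.shape[-1])[hard_labels]
    # Interpolated target distribution.
    target = lam * hard + (1.0 - lam) * teacher_probs
    # Frame-averaged cross-entropy.
    return float(-(target * np.log(student_probs + 1e-12)).sum(axis=-1).mean())

# Two frames, three output classes (toy dimensions for illustration).
logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
teacher = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = kd_loss(logits, teacher, labels, lam=0.5)
```

With `lam=1.0` the objective reduces to ordinary cross-entropy on the hard labels, and with `lam=0.0` the student only imitates the teacher's posteriors; the interpolation lets the student learn from both.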



Acknowledgements

This work is supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2015AA016305), the National Natural Science Foundation of China (NSFC) (No. 61425017, No. 61403386).

Author information

Correspondence to Jiangyan Yi.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Yi, J., Tao, J., Wen, Z., Li, Y., Ni, H. (2018). Acoustic Model Compression with Knowledge Transfer. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_9


  • DOI: https://doi.org/10.1007/978-981-10-8111-8_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8110-1

  • Online ISBN: 978-981-10-8111-8

  • eBook Packages: Computer Science (R0)
