
Deep Neural Network Sequence-Discriminative Training

Chapter in: Automatic Speech Recognition

Part of the book series: Signals and Communication Technology (SCT)

Abstract

The cross-entropy criterion discussed in the previous chapters treats each frame independently. Speech recognition, however, is a sequence classification problem. In this chapter, we introduce sequence-discriminative training techniques that better match the problem. We describe the popular maximum mutual information (MMI), boosted MMI (BMMI), minimum phone error (MPE), and minimum Bayes risk (MBR) training criteria, and discuss practical techniques, including lattice generation, lattice compensation, frame dropping, frame smoothing, and learning rate adjustment, that make DNN sequence-discriminative training effective.
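As a brief illustration of the first of these criteria (a standard textbook formulation; the notation and the acoustic scaling factor $\kappa$ are assumptions of this sketch rather than text from this page), the MMI objective over a training set of $M$ utterances is

$$ J_{\mathrm{MMI}}(\theta) \;=\; \sum_{m=1}^{M} \log \frac{p_{\theta}\!\left(\mathbf{o}^{m} \mid \mathbf{w}^{m}\right)^{\kappa}\, P\!\left(\mathbf{w}^{m}\right)}{\sum_{\mathbf{w}} p_{\theta}\!\left(\mathbf{o}^{m} \mid \mathbf{w}\right)^{\kappa}\, P\!\left(\mathbf{w}\right)}, $$

where $\mathbf{o}^{m}$ is the observation sequence of utterance $m$, $\mathbf{w}^{m}$ is its reference word sequence, $P(\mathbf{w})$ is the language model probability, and the denominator sum is in practice restricted to the competing hypotheses encoded in a lattice. BMMI additionally scales each denominator term by $e^{-b\,A(\mathbf{w},\,\mathbf{w}^{m})}$, where $A$ measures the accuracy of a hypothesis against the reference, while MPE and MBR criteria instead optimize the expected phone or state accuracy under the lattice posterior.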



Author information

Correspondence to Dong Yu.


Copyright information

© 2015 Springer-Verlag London

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Deep Neural Network Sequence-Discriminative Training. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_8


  • DOI: https://doi.org/10.1007/978-1-4471-5779-3_8

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5778-6

  • Online ISBN: 978-1-4471-5779-3

  • eBook Packages: Engineering, Engineering (R0)
