Improving Acoustic Model for Vietnamese Large Vocabulary Continuous Speech Recognition System Using Deep Bottleneck Features

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

In this paper, a deep-learning-based method for extracting bottleneck features for Vietnamese large vocabulary continuous speech recognition is presented. Deep bottleneck features (DBNFs) achieve significant improvements over a number of baseline bottleneck features reported previously. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that adding tonal features to the network input yields a relative improvement of around 20% in recognition performance. For Vietnamese recognition, DBNF extraction decreases the error rate by 51% compared with the MFCC baseline.
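The abstract does not give the network topology or training recipe, so the following is only a rough sketch of the general idea behind bottleneck feature extraction, not the authors' implementation. Per-frame acoustic features (here MFCCs with tonal/pitch features appended to the network input, as the abstract describes) are spliced over a context window and passed through a feed-forward network. The activations of a narrow hidden layer are then read out as the new features. The helper names `splice` and `BottleneckMLP`, all dimensions, the ±5-frame context, the sigmoid activations and the random input data are hypothetical. The weights are random rather than pre-trained and fine-tuned, so the sketch shows only the data flow.

```python
import numpy as np


def splice(feats: np.ndarray, context: int) -> np.ndarray:
    """Stack each frame with `context` neighbours on each side (edges padded by repetition)."""
    num_frames, _ = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Window k holds original frames t - context + k for every frame t.
    windows = [padded[k:k + num_frames] for k in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class BottleneckMLP:
    """Hypothetical feed-forward net whose narrow hidden layer is read out as features.

    Weights are random here; a real extractor would be trained on phonetic
    targets (typically after pre-training) before its bottleneck is used.
    """

    def __init__(self, layer_sizes, bottleneck_index, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(0.0, 0.1, size=(m, n))
                        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.biases = [np.zeros(n) for n in layer_sizes[1:]]
        # Index into layer_sizes of the narrow (bottleneck) layer.
        self.bottleneck_index = bottleneck_index

    def bottleneck_features(self, x: np.ndarray) -> np.ndarray:
        h = x
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            a = h @ w + b
            if i + 1 == self.bottleneck_index:
                return a  # linear bottleneck activations are the output features
            h = sigmoid(a)
        raise ValueError("bottleneck_index exceeds network depth")


rng = np.random.default_rng(1)
mfcc = rng.normal(size=(100, 13))       # hypothetical: 100 frames of 13-dim MFCCs
pitch = rng.normal(size=(100, 3))       # hypothetical: 3 tonal/pitch features per frame
frame_feats = np.hstack([mfcc, pitch])  # tonal features appended to the network input
x = splice(frame_feats, context=5)      # 11 frames * 16 dims = 176 inputs per frame
net = BottleneckMLP([x.shape[1], 256, 256, 42, 256, 100], bottleneck_index=3)
dbnf = net.bottleneck_features(x)
print(dbnf.shape)  # (100, 42)
```

The layers after the bottleneck are never evaluated during extraction. They exist only so the network can be trained on classification targets. Once extracted, the low-dimensional bottleneck outputs would replace or augment MFCCs as input to a conventional acoustic model.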

Author information

Corresponding author: Quoc Bao Nguyen

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, Q.B., Vu, T.T., Luong, C.M. (2015). Improving Acoustic Model for Vietnamese Large Vocabulary Continuous Speech Recognition System Using Deep Bottleneck Features. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_5

  • DOI: https://doi.org/10.1007/978-3-319-11680-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11679-2

  • Online ISBN: 978-3-319-11680-8

  • eBook Packages: Engineering, Engineering (R0)
