Improving Acoustic Model for Vietnamese Large Vocabulary Continuous Speech Recognition System Using Deep Bottleneck Features

Nguyen, Quoc Bao; Vu, Tat Thang; Luong, Chi Mai

doi:10.1007/978-3-319-11680-8_5

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

This paper presents a deep-learning method for extracting bottleneck features for Vietnamese large vocabulary speech recognition. Deep bottleneck features (DBNFs) achieve significant improvements over a number of previously reported baseline bottleneck features. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that adding tonal features to the network input yields a relative improvement of around 20% in recognition performance. DBNF extraction for Vietnamese recognition decreases the error rate by 51% compared to the MFCC baseline.
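As a rough illustration of the idea in the abstract, the sketch below forwards per-frame inputs through a feed-forward network and reads out the activations of a narrow (bottleneck) hidden layer as features. It is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the sigmoid activation, the random untrained weights, the feature dimensions, and the helper names `init_layers` and `bottleneck_features` are all hypothetical. Tonal features are modeled simply as extra columns concatenated to the spectral input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layers(sizes, seed=0):
    """Create random (weight, bias) pairs; stands in for a trained network."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def bottleneck_features(frames, layers, bottleneck_index):
    """Forward frames through the layers up to and including the bottleneck
    layer, then return that layer's activations as the feature vectors."""
    h = frames
    for w, b in layers[:bottleneck_index + 1]:
        h = sigmoid(h @ w + b)
    return h

# Hypothetical dimensions: 100 frames, 40 spectral and 3 tonal values per frame.
n_frames, n_spectral, n_tonal = 100, 40, 3
rng = np.random.default_rng(1)
spectral = rng.normal(size=(n_frames, n_spectral))  # placeholder spectral features
tonal = rng.normal(size=(n_frames, n_tonal))        # placeholder pitch-based features

# Tonal features are appended to each frame's input vector.
inputs = np.concatenate([spectral, tonal], axis=1)

# Assumed topology: input -> 512 -> 512 -> 32 (bottleneck) -> 512 -> 120 outputs.
sizes = [n_spectral + n_tonal, 512, 512, 32, 512, 120]
layers = init_layers(sizes)

# The layer at index 2 maps 512 -> 32, so its output is the bottleneck.
dbnf = bottleneck_features(inputs, layers, bottleneck_index=2)
print(dbnf.shape)
```

In a real system the network would first be trained (for example on phonetic targets), and the layers after the bottleneck would be discarded at feature-extraction time; the extracted vectors would then feed a conventional acoustic model.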

Author information

Authors and Affiliations

University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Vietnam
Quoc Bao Nguyen & Chi Mai Luong
Institute of Information Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
Tat Thang Vu

Corresponding author

Correspondence to Quoc Bao Nguyen.

Editor information

Editors and Affiliations

Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Viet-Ha Nguyen
Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Anh-Cuong Le
School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh

About this paper

Cite this paper

Nguyen, Q.B., Vu, T.T., Luong, C.M. (2015). Improving Acoustic Model for Vietnamese Large Vocabulary Continuous Speech Recognition System Using Deep Bottleneck Features. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_5

DOI: https://doi.org/10.1007/978-3-319-11680-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)