Research on Mongolian Speech Recognition Based on FSMN

Wang, Yonghe; Bao, Feilong; Zhang, Hongwei; Gao, Guanglai

doi:10.1007/978-3-319-73618-1_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Included in the following conference series:

National CCF Conference on Natural Language Processing and Chinese Computing

Abstract

Deep neural network (DNN) models have achieved significant results on the Mongolian speech recognition task; however, compared with Chinese, English, and other languages, there is still room for improvement. This paper presents the first application of the Feed-forward Sequential Memory Network (FSMN) to Mongolian speech recognition, modeling long-term dependencies in time series without recurrent feedback. Furthermore, to model the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the network input, and we validate their effectiveness in Mongolian automatic speech recognition (ASR). Finally, discriminative training is applied to the FSMN for the first time, using the maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR) criteria, respectively. The experimental results show that the FSMN outperforms the DNN in Mongolian ASR, and that using i-vector features combined with Fbank features as the FSMN input, together with discriminative training, yields a relative word error rate (WER) reduction of 17.9% compared with the DNN baseline.
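
The two input-side ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes or filter orders, so the function names (`append_ivector`, `fsmn_memory`), the toy dimensions, and the coefficient values below are all assumptions chosen for illustration. The memory block follows the commonly described vectorized FSMN form, in which each frame's memory is an element-wise weighted sum of a few past and future frames, so no recurrent feedback is needed.

```python
from typing import List

Frame = List[float]


def append_ivector(fbank: List[Frame], ivector: Frame) -> List[Frame]:
    """Append the same utterance-level i-vector to every Fbank frame."""
    return [frame + ivector for frame in fbank]


def fsmn_memory(hidden: List[Frame],
                lookback: List[Frame],
                lookahead: List[Frame]) -> List[Frame]:
    """Vectorized FSMN memory block (illustrative, pure Python).

    memory[t] = sum over i = 0..N1 of lookback[i]    * hidden[t - i]
              + sum over j = 1..N2 of lookahead[j-1] * hidden[t + j]
    Products are element-wise; frames outside the utterance count as zero.
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if hidden else 0
    memory = [[0.0] * dim for _ in range(num_frames)]
    for t in range(num_frames):
        # Past context, including the current frame (i = 0).
        for i, coeff in enumerate(lookback):
            if t - i >= 0:
                for d in range(dim):
                    memory[t][d] += coeff[d] * hidden[t - i][d]
        # Future context (j = 1..N2).
        for j, coeff in enumerate(lookahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    memory[t][d] += coeff[d] * hidden[t + j][d]
    return memory


# Toy data (hypothetical): 3 frames of 2-dim "Fbank" and a 1-dim "i-vector".
fbank = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ivector = [0.5]
features = append_ivector(fbank, ivector)

# In a real FSMN the memory block acts on hidden-layer activations;
# here the input features stand in for them to keep the sketch short.
lookback = [[1.0] * 3, [0.5] * 3]   # N1 = 1 past frame, assumed values
lookahead = [[0.25] * 3]            # N2 = 1 future frame, assumed values
memory = fsmn_memory(features, lookback, lookahead)
print(memory[1])
```

In a trained model the look-back and look-ahead coefficients are learned parameters, and the memory output is fed to the next layer together with the hidden activations; the sketch only shows how the fixed-size context window replaces recurrence.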

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (Nos. 61563040 and 61773224) and the Inner Mongolia Natural Science Foundation (No. 2016ZD06).

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, Huhhot, 010021, China
Yonghe Wang, Feilong Bao, Hongwei Zhang & Guanglai Gao

Corresponding author

Correspondence to Feilong Bao.

Editor information

Editors and Affiliations

Fudan University, Shanghai, China
Xuanjing Huang
Singapore Management University, Singapore, Singapore
Jing Jiang
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Yansong Feng
Soochow University, Suzhou, China
Yu Hong

About this paper

Cite this paper

Wang, Y., Bao, F., Zhang, H., Gao, G. (2018). Research on Mongolian Speech Recognition Based on FSMN. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_21

DOI: https://doi.org/10.1007/978-3-319-73618-1_21
Published: 05 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)
