
Research on Mongolian Speech Recognition Based on FSMN

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Abstract

Deep Neural Network (DNN) models have achieved significant results on Mongolian speech recognition. However, compared with Chinese, English, and other languages, there is still room for further improvement. This paper presents the first application of the Feed-forward Sequential Memory Network (FSMN) to Mongolian speech recognition, using it to model long-term dependencies in time series without recurrent feedback. Furthermore, to model the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the network input, and validate their effectiveness on Mongolian ASR tasks. Finally, discriminative training is applied to the FSMN for the first time, using maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR), respectively. The experimental results show that FSMN outperforms DNN on Mongolian ASR. Using i-vector features combined with Fbank features as the FSMN input, together with discriminative training, reduces the word error rate (WER) by 17.9% relative to the DNN baseline.
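The two modeling ideas in the abstract can be illustrated with a short sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation.

It assumes the bidirectional vectorized FSMN memory block described by Zhang et al. (2015). In that formulation, each hidden layer's memory output is m_t = sum_{i=0..N1} a_i * h_{t-i} + sum_{j=1..N2} c_j * h_{t+j}, with element-wise products. This is a fixed-length filter over past and future activations that needs no recurrent connection, and m_t is fed to the next layer alongside h_t.

It also assumes that one i-vector is computed per utterance or speaker and appended to every Fbank frame.

The function names, filter orders, and toy values are hypothetical. The paper's actual FSMN variant, feature dimensions, and MMI/sMBR training are not shown.

```python
from typing import List, Sequence

Frame = List[float]


def concat_ivector(fbank: Sequence[Sequence[float]], ivector: Sequence[float]) -> List[Frame]:
    """Append one utterance/speaker-level i-vector to every Fbank frame (assumed scheme)."""
    return [list(frame) + list(ivector) for frame in fbank]


def vfsmn_memory(
    hidden: Sequence[Sequence[float]],
    lookback: Sequence[Sequence[float]],
    lookahead: Sequence[Sequence[float]],
) -> List[Frame]:
    """Bidirectional vectorized FSMN memory block (after Zhang et al., 2015).

    m_t = sum_{i=0..N1} a_i * h_{t-i} + sum_{j=1..N2} c_j * h_{t+j}
    `*` is element-wise; lookback[i] holds a_i, lookahead[j-1] holds c_j.
    Frames outside [0, T) contribute nothing (zero padding).
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    memory: List[Frame] = []
    for t in range(num_frames):
        m = [0.0] * dim
        # Current and past frames: a_0 weights h_t, a_1 weights h_{t-1}, ...
        for i, a in enumerate(lookback):
            if t - i >= 0:
                for d in range(dim):
                    m[d] += a[d] * hidden[t - i][d]
        # Future frames: c_1 weights h_{t+1}, c_2 weights h_{t+2}, ...
        for j, c in enumerate(lookahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    m[d] += c[d] * hidden[t + j][d]
        memory.append(m)
    return memory


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - system) / baseline * 100."""
    return (baseline_wer - system_wer) / baseline_wer * 100.0


# Toy usage with made-up numbers (not results from the paper).
feats = concat_ivector([[1.0, 2.0], [3.0, 4.0]], [0.5])
print(feats)  # [[1.0, 2.0, 0.5], [3.0, 4.0, 0.5]]
mem = vfsmn_memory([[1.0], [2.0], [3.0]], lookback=[[1.0], [0.5]], lookahead=[[0.25]])
print(mem)  # [[1.5], [3.25], [4.0]]
print(relative_wer_reduction(20.0, 15.0))  # 25.0
```

Because the memory block is a weighted sum over a finite window, each frame's memory can be computed without any recurrent state. This is why the FSMN papers train it with ordinary back-propagation rather than back-propagation through time.

The `relative_wer_reduction` helper only spells out what "reduced by 17.9% relative to the baseline" means: the drop in WER divided by the baseline WER.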



Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (No. 61563040, No. 61773224) and the Natural Science Foundation of Inner Mongolia (No. 2016ZD06).

Author information

Correspondence to Feilong Bao.


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Wang, Y., Bao, F., Zhang, H., Gao, G. (2018). Research on Mongolian Speech Recognition Based on FSMN. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_21


  • DOI: https://doi.org/10.1007/978-3-319-73618-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73617-4

  • Online ISBN: 978-3-319-73618-1

  • eBook Packages: Computer Science, Computer Science (R0)
