Speaker Adaptation on Myanmar Spontaneous Speech Recognition

Soe Naing, Hay Mar; Pa, Win Pa

doi:10.1007/978-981-10-8438-6_24

Hay Mar Soe Naing & Win Pa Pa

Part of the book series: Communications in Computer and Information Science (CCIS, volume 781)

Included in the following conference series:

International Conference of the Pacific Association for Computational Linguistics


Abstract

This paper introduces work on automatic speech recognition (ASR) of Myanmar spontaneous speech. The recognizer is based on the Gaussian Mixture Model and Hidden Markov Model (GMM-HMM). A baseline ASR system is built with a 20.5 h spontaneous speech corpus and then refined with several speaker adaptation methods. Five kinds of adapted acoustic models are explored: Maximum A Posteriori (MAP), Maximum Mutual Information (MMI), Minimum Phone Error (MPE), Maximum Mutual Information including feature space and model space (fMMI), and Subspace GMM (SGMM). We evaluate these adapted models on a spontaneous evaluation set of 100 utterances from 61 speakers, totaling 23 min 19 s. Experiments on this corpus show that the speaker adaptive training models give significant improvements and that the SGMM-based acoustic model performs better than the other adapted models, reducing the word error rate (WER) by 3.16% compared with the baseline GMM model. We also investigate Deep Neural Network (DNN) training on the same corpus, evaluated with the same evaluation set; the DNN model achieves a WER of 31.5%.
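All results above are reported as word error rate (WER). As a hedged illustration only, and not the authors' scoring code, the standard definition WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions needed to turn the reference into the hypothesis and N is the number of reference words, can be computed with a word-level edit distance. The function name `word_error_rate` is hypothetical, and the sketch assumes transcripts are already segmented into space-separated words, which written Myanmar does not provide by default.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Hypothetical helper: assumes both strings are already word-segmented
    with spaces (Myanmar text normally needs a segmenter first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr

    # Total errors divided by the number of reference words.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy strings, not data from the paper: one substitution (b -> x)
    # and one deletion (d) against a 4-word reference.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions are counted against N, WER can exceed 100% for very noisy hypotheses; this is why WER differences between systems are usually quoted rather than accuracies.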



Acknowledgements

We would like to thank all members of the Spoken Language Communication Laboratory at the National Institute of Information and Communications Technology (NICT), Kyoto, Japan, for the use of the Myanmar spontaneous speech corpus presented in this paper.

Author information

Authors and Affiliations

Natural Language Processing Laboratory, UCSY, Yangon, Myanmar
Hay Mar Soe Naing & Win Pa Pa


Corresponding author

Correspondence to Hay Mar Soe Naing.

Editor information

Editors and Affiliations

Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Kôiti Hasida
Natural Language Processing Lab, University of Computer Studies, Yangon, Yangon, Myanmar
Win Pa Pa


About this paper

Cite this paper

Soe Naing, H.M., Pa, W.P. (2018). Speaker Adaptation on Myanmar Spontaneous Speech Recognition. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_24


DOI: https://doi.org/10.1007/978-981-10-8438-6_24
Published: 04 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8437-9
Online ISBN: 978-981-10-8438-6
eBook Packages: Computer Science, Computer Science (R0)
