Automatic Recognition of Kazakh Speech Using Deep Neural Networks

  • Orken Mamyrbayev
  • Mussa Turdalyuly
  • Nurbapa Mekebayev
  • Keylan Alimhan
  • Aizat Kydyrbekova
  • Tolganay Turdalykyzy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11432)

Abstract

This article presents an automatic speech recognition (ASR) system for the Kazakh language based on deep neural networks (DNNs), developed with the Kaldi speech recognition toolkit. The DNNs are initialized with restricted Boltzmann machines (RBMs) and trained with cross-entropy as the objective function using standard error backpropagation. To obtain the best results, the training was adapted to the peculiarities of the Kazakh language. A 76-hour corpus was used for training. Results are compared between classical models and various DNN configurations on two different sets of values.
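
The abstract names two training stages: RBM pretraining to initialize the DNN, then cross-entropy training with backpropagation. The following is a minimal, self-contained Python sketch of that generic recipe on hypothetical toy data; it is not the authors' Kaldi setup. Assumptions: a single hidden layer, binary (Bernoulli) inputs trained with one-step contrastive divergence (CD-1), plain per-example SGD, and made-up learning rates, epoch counts, and function names (`train_rbm`, `finetune`, `predict` are hypothetical). Real acoustic features are real-valued, so a practical first layer would typically use a Gaussian-Bernoulli RBM, and in a hybrid DNN-HMM system the output classes would typically be tied HMM states.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def hidden_probs(v, W, b_h):
    # p(h_j = 1 | v) for a sigmoid layer; W is indexed W[visible][hidden]
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]


def train_rbm(data, n_hidden, epochs=50, lr=0.1, seed=0):
    """Hypothetical helper: Bernoulli-Bernoulli RBM trained with CD-1."""
    rng = random.Random(seed)
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_vis)]
    b_v, b_h = [0.0] * n_vis, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            h0 = hidden_probs(v0, W, b_h)                          # positive phase
            h0_s = [1.0 if rng.random() < p else 0.0 for p in h0]  # sample hidden states
            v1 = [sigmoid(b_v[i] + sum(h0_s[j] * W[i][j] for j in range(n_hidden)))
                  for i in range(n_vis)]                           # reconstruction
            h1 = hidden_probs(v1, W, b_h)                          # negative phase
            for i in range(n_vis):
                b_v[i] += lr * (v0[i] - v1[i])
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            for j in range(n_hidden):
                b_h[j] += lr * (h0[j] - h1[j])
    return W, b_h  # visible biases are not needed by the feed-forward network


def finetune(data, labels, W, b_h, n_classes, epochs=200, lr=0.5, seed=1):
    """Hypothetical helper: stack a softmax layer on the RBM-initialized hidden
    layer and train the whole network with cross-entropy and backpropagation."""
    rng = random.Random(seed)
    W, b_h = [row[:] for row in W], b_h[:]  # keep the pretrained weights intact
    n_vis, n_hid = len(W), len(b_h)
    U = [[rng.gauss(0.0, 0.01) for _ in range(n_classes)] for _ in range(n_hid)]
    b_o = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(data, labels):
            h = hidden_probs(x, W, b_h)
            p = softmax([b_o[k] + sum(h[j] * U[j][k] for j in range(n_hid))
                         for k in range(n_classes)])
            # Gradient of cross-entropy w.r.t. the softmax logits: p - onehot(y)
            d_o = [p[k] - (1.0 if k == y else 0.0) for k in range(n_classes)]
            # Backpropagate through the sigmoid, whose derivative is h * (1 - h)
            d_h = [h[j] * (1.0 - h[j]) * sum(d_o[k] * U[j][k] for k in range(n_classes))
                   for j in range(n_hid)]
            for j in range(n_hid):
                b_h[j] -= lr * d_h[j]
                for k in range(n_classes):
                    U[j][k] -= lr * h[j] * d_o[k]
            for k in range(n_classes):
                b_o[k] -= lr * d_o[k]
            for i in range(n_vis):
                for j in range(n_hid):
                    W[i][j] -= lr * x[i] * d_h[j]
    return W, b_h, U, b_o


def predict(x, W, b_h, U, b_o):
    h = hidden_probs(x, W, b_h)
    scores = [b_o[k] + sum(h[j] * U[j][k] for j in range(len(h))) for k in range(len(b_o))]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy data: binary "frames" with made-up class labels
    data = [[1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
            [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1]]
    labels = [0, 0, 1, 1]
    W, b_h = train_rbm(data, n_hidden=4)
    W, b_h, U, b_o = finetune(data, labels, W, b_h, n_classes=2)
    print([predict(x, W, b_h, U, b_o) for x in data])
```

For deeper networks the same idea is applied greedily: the hidden probabilities of each trained RBM become the training data for the next RBM, and the stacked weights initialize the DNN before cross-entropy fine-tuning of all layers.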

Keywords

DNN, ASR, Kazakh speech recognition, LM

Notes

Acknowledgements

This work was supported by the Ministry of Education and Science of the Republic of Kazakhstan under project IRN AP05131207, "Development of technologies for multilingual automatic speech recognition using deep neural networks".

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Information and Computational Technology, Almaty, Kazakhstan
  2. al-Farabi Kazakh National University, Almaty, Kazakhstan
