
Speech Recognition Technologies Based on Artificial Intelligence Algorithms

  • Conference paper
Intelligent Human Computer Interaction (IHCI 2022)

Abstract

This paper investigates automatic speech recognition for Uzbek based on integral (end-to-end) models. Methods for each stage of continuous Uzbek speech recognition were studied, and suitable ones were selected. For acoustic modeling, a DNN-CTC architecture was trained on a 200-hour speech corpus. On the test set, the resulting system achieved a word error rate (WER) of 17.3% and a character error rate (CER) of 7.5%.
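The WER and CER figures reported in the abstract are conventionally computed as the edit distance between the reference transcript and the recognizer output, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' scoring script: the function names are illustrative, the sample Uzbek sentence pair is made up, and it assumes whitespace tokenization with spaces counted as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example pair: one substituted word, differing in a single character.
reference = "men bugun maktabga bordim"
hypothesis = "men bugun maktabda bordim"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

A single wrong character makes the whole word count as an error, so WER is typically higher than CER on the same output, as in the reported 17.3% versus 7.5%. Corpus-level scores are usually obtained by summing edit distances and reference lengths over all test utterances before dividing, rather than averaging per-utterance rates.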

Author information

Corresponding author

Correspondence to Mannon Ochilov.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Musaev, M., Khujayarov, I., Ochilov, M. (2023). Speech Recognition Technologies Based on Artificial Intelligence Algorithms. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_6

  • DOI: https://doi.org/10.1007/978-3-031-27199-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27198-4

  • Online ISBN: 978-3-031-27199-1

  • eBook Packages: Computer Science (R0)
