Abstract
In this article, research was conducted on the development of automatic Uzbek speech recognition technology based on integral models. Methods of continuous speech recognition technology in Uzbek were studied at all stages and suitable ones were selected. A 200-hour speech corpus was trained on the DNN-CTC architecture for acoustic modeling. The accuracy of the developed speech recognition system achieved WER = 17.3%, CER = 7.5% on the test data set.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Alhawiti, K.M.: Advances in artificial intelligence using speech recognition. Int. J. Comput. Inf. Eng. 9, 1432–1435 (2015)
Musaev, M., Khujayorov, I., Ochilov, M.: Automatic recognition of Uzbek speech based on integrated neural networks. In: Aliev, R.A., Yusupbekov, N.R., Kacprzyk, J., Pedrycz, W., Sadikoglu, F.M. (eds.) WCIS 2020. AISC, vol. 1323, pp. 215–223. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68004-6_28
Prasad, V.: Voice recognition system: speech-to-text. J. Appl. Fundam. Sci. 1(2), 191 (2015)
Serizel, R., Giuliani, D.: Vocal tract length normalization approaches to DNN-based children’s and adults’ speech recognition. In: IEEE Workshop on Spoken Language Technology, pp. 135–140 (2014)
Kipyatkova, I., Karpov, A.: An analytical survey of large vocabulary Russian speech recognition systems. SPIIRAS Proc. 1(12), 7–20 (2010). https://doi.org/10.15622/sp.12.1
Parada-Cabaleiro, E., Costantini, G., Batliner, A., Schmitt, M., Schuller, B.W.: DEMoS: an Italian emotional speech corpus. Lang. Resour. Eval. 54(2), 341–383 (2019). https://doi.org/10.1007/s10579-019-09450-y
Musaev, M.M., Ochilov, M.M., Khujayarov, I.S.: E2E models of continuous speech recognition with large vocabulary size. TATU Bull. 2(58), 19–40 (2021)
Khujayorov, I., Ochilov, M.: Parallel signal processing based-on graphics processing units. Int. Conf. Inf. Sci. Commun. Technol. 2019, 1–4 (2019). https://doi.org/10.1109/ICISCT47635.2019.9011976
Musaev, M., Khujayorov, I., Ochilov, M.: The use of neural networks to improve the recognition accuracy of explosive and unvoiced phonemes in Uzbek language. Inf. Commun. Technol. Conf. 2020, 231–234 (2020). https://doi.org/10.1109/ICTC49638.2020.9123309
Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22(10), 1533–1545 (2014). https://doi.org/10.1109/TASLP.2014.2339736
Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Bengio, C.L.Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. arXiv preprint arXiv:1701.02720 (2017)
Musaev, M., Khujayorov, I., Ochilov, M.: Image approach to speech recognition on CNN. In: Proceedings of the 2019 3rd International Symposium on Computer Science and Intelligent Control (ISCSIC 2019). Association for Computing Machinery, New York, Article 57, pp. 1–6 (2019). https://doi.org/10.1145/3386164.3389100
Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 187–197 (2011)
Kumar, A., Vembu, S., Menon, A.K., Elkan, C.: Beam search algorithms for multilabel learning. Mach. Learn. 92(1), 65–89 (2013). https://doi.org/10.1007/s10994-013-5371-6
Dong, L., Xu, S., Xu, B.: Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5884–5888. IEEE (2018)
Musaev, M., Khujayorov, I., Ochilov, M.: Development of integral model of speech recognition system for Uzbek language. In: 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–6. IEEE (2020). https://doi.org/10.1109/AICT50176.2020.9368719
Musaev, M., Mussakhojayeva, S., Khujayorov, I., Khassanov, Y., Ochilov, M., Atakan Varol, H.: USC: an open-source Uzbek speech corpus and initial speech recognition experiments. In: Karpov, A., Potapova, R. (eds.) SPECOM 2021. LNCS (LNAI), vol. 12997, pp. 437–447. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87802-3_40
Abdullaeva, M., Khujayorov, I., Ochilov, M.: Formant set as a main parameter for recognizing vowels of the Uzbek language. Int. Conf. Inf. Sci. Commun. Technol. 2021, 1–5 (2021). https://doi.org/10.1109/ICISCT52966.2021.9670268
Khassanov, Y., Mussakhojayeva, S., Mirzakhmetov, A., Adiyev, A., Nurpeiissov, M., Varol, H.A.: A crowdsourced open-source Kazakh speech corpus and initial speech recognition baseline. In: Proc. of the Conference of the European Chapter of the Association for Computational Linguistics, pp. 697–706. Association for Computational Linguistics (2021)
Rakhimov, M., Ochilov, M.: Distribution of operations in heterogeneous computing systems for processing speech signals. In: 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–4 (2021). https://doi.org/10.1109/AICT52784.2021.9620451
Fazliddinovich, R.M., Abdumurodovich, B.U.: Parallel processing capabilities in the process of speech recognition. Int. Conf. Inf. Sci. Commun. Technol. 2017, 1–3 (2017). https://doi.org/10.1109/ICISCT.2017.8188585
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376. Association for Computing Machinery, New York (2006)
Nasimova, N., Muminov, B., Nasimov, R., Abdurashidova, K., Abdullaev, M.: Comparative analysis of the results of algorithms for dilated cardiomyopathy and hypertrophic cardiomyopathy using deep learning. Int. Conf. Inf. Sci. Commun. Technol. 2021, 1–5 (2021). https://doi.org/10.1109/ICISCT52966.2021.9670134
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Musaev, M., Khujayarov, I., Ochilov, M. (2023). Speech Recognition Technologies Based on Artificial Intelligence Algorithms. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_6
Download citation
DOI: https://doi.org/10.1007/978-3-031-27199-1_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer ScienceComputer Science (R0)