Speech Recognition Technologies Based on Artificial Intelligence Algorithms

Musaev, Muhammadjon; Khujayarov, Ilyos; Ochilov, Mannon

doi:10.1007/978-3-031-27199-1_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13741)

Included in the following conference series:

International Conference on Intelligent Human Computer Interaction

Abstract

This article investigates the development of automatic Uzbek speech recognition technology based on integral models. Methods for every stage of continuous Uzbek speech recognition were studied, and suitable ones were selected. For acoustic modeling, a DNN-CTC architecture was trained on a 200-hour speech corpus. On the test data set, the developed system achieved a word error rate (WER) of 17.3% and a character error rate (CER) of 7.5%.
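The abstract reports results as WER and CER but does not say how they were computed. The sketch below assumes the standard definition: the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. The function names and the sample sentence are illustrative assumptions, not taken from the chapter.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical transcript pair, used only to exercise the functions.
reference = "men maktabga boraman"
hypothesis = "men maktabda boraman"
print(f"WER = {wer(reference, hypothesis):.3f}, CER = {cer(reference, hypothesis):.3f}")
```

Under this definition, a single wrong character can make a whole word count as an error, so WER is typically higher than CER on the same output, as it is in the figures reported above.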

Author information

Authors and Affiliations

Artificial Intelligence, Tashkent University of Information Technology named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
Muhammadjon Musaev & Mannon Ochilov
Information Technologies, Samarkand branch of Tashkent University of Information Technology named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
Ilyos Khujayarov

Corresponding author

Correspondence to Mannon Ochilov.

Editor information

Editors and Affiliations

Tashkent University Information Technologies, Tashkent, Uzbekistan
Hakimjon Zaynidinov
Oregon Institute of Technology, Klamath Falls, USA
Madhusudan Singh
Indian Institute of Information Technology, Allahabad, India
Uma Shanker Tiwary
Hankuk University of Foreign Studies, Yongin, Korea (Republic of)
Dhananjay Singh

About this paper

Cite this paper

Musaev, M., Khujayarov, I., Ochilov, M. (2023). Speech Recognition Technologies Based on Artificial Intelligence Algorithms. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_6

DOI: https://doi.org/10.1007/978-3-031-27199-1_6
Published: 11 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer Science, Computer Science (R0)
