Abstract
This article examines the problems that arise in machine-learning-based speech recognition and the deep learning methods used to overcome them, and outlines approaches to the transition to an encoder-decoder architecture based on the attention mechanism. It also describes the hybrid CTC/attention architecture, which is now widely used in speech recognition. Neural network architectures proposed in recent years and widely used in automatic speech recognition, including a model based on the attention mechanism, are trained on Uzbek and Russian speech corpora, and the results obtained are compared.
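As a rough illustration of the hybrid CTC/attention idea, training is commonly described as minimizing a weighted sum of the CTC loss and the attention decoder loss. The sketch below assumes that interpolated form; the function name `hybrid_loss`, the parameter `ctc_weight`, its default of 0.3, and the numeric loss values are hypothetical and are not taken from the paper.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention loss.

    Assumed form: ctc_weight * L_ctc + (1 - ctc_weight) * L_att,
    with ctc_weight in [0, 1]. The default weight is illustrative only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Hypothetical per-batch loss values, for illustration only.
loss = hybrid_loss(ctc_loss=42.0, attention_loss=30.0, ctc_weight=0.5)
print(loss)
```

Setting `ctc_weight` to 0 reduces this to a pure attention objective and setting it to 1 to a pure CTC objective, so the weight controls how strongly the monotonic alignment constraint of CTC influences training.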
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mamatov, N.S., Niyozmatova, N.A., Yuldoshev, Y.S., Abdullaev, S.S., Samijonov, A.N. (2023). Automatic Speech Recognition on the Neutral Network Based on Attention Mechanism. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_11
DOI: https://doi.org/10.1007/978-3-031-27199-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer Science (R0)