Automatic Speech Recognition on the Neutral Network Based on Attention Mechanism

Mamatov, N. S.; Niyozmatova, N. A.; Yuldoshev, Yu. Sh.; Abdullaev, Sh. Sh.; Samijonov, A. N.

doi:10.1007/978-3-031-27199-1_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13741)

Included in the following conference series:

International Conference on Intelligent Human Computer Interaction

2 Citations

Abstract

This article examines the problems that arise in machine-learning-based speech recognition and the deep learning methods used to overcome them, and outlines approaches to moving to an encoder-decoder architecture based on the attention mechanism. It also describes the hybrid CTC/attention architecture, which is now widely used in speech recognition. Neural network architectures proposed in recent years and widely used in automatic speech recognition, including a neural network model based on the attention mechanism, are trained on Uzbek and Russian speech corpora, and the results obtained are compared.

References

Bahl, L.R., Jelinek, F., Mercer, R.L.: A maximum likelihood approach to continuous speech recognition. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5(2), 179–190 (1983)
Noisy channel model (2020). Page was last edited on 12 April 2020, at 09:02 (UTC). https://en.wikipedia.org/wiki/Noisy_channel_model
Watanabe, S., et al.: Hybrid CTC/attention architecture for end-to-end speech recognition. IEEE J. Sel. Top. Sig. Process. 11(8), 1240–1253 (2017)
Hannun, A.: Sequence modeling with CTC. Distill (2017). https://distill.pub/2017/ctc/
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML (2006)
Chan, W., et al.: Listen, attend and spell. arXiv:1508.01211. https://arxiv.org/abs/1508.01211
Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. arXiv:1512.02595. https://arxiv.org/abs/1512.02595
Zeghidour, N., Usunier, N., Synnaeve, G., Collobert, R., Dupoux, E.: End-to-end speech recognition from the raw waveform. arXiv:1806.07098 (2018)
Mamatov, N.S., Niyozmatova, N.A., Abdullaev, S.S., Samijonov, A.N., Erejepov, K.K.: Speech recognition based on transformer neural networks. In: International Conference on Information Science and Communications Technologies: Applications, Trends and Opportunities, ICISCT 2021 (2021)
Mamatov, N., Niyozmatova, N., Samijonov, A.: Software for preprocessing voice signals. Int. J. Appl. Sci. Eng. 18(1) (2021). https://doi.org/10.6703/IJASE.202103_18(1).006
Narzillo, M., Abdurashid, S., Parakhat, N., Nilufar, N.: Automatic speaker identification by voice based on vector quantization method. Int. J. Innov. Technol. Explor. Eng. 8(10), 2443–2445 (2019). https://doi.org/10.35940/ijitee.J9523.0881019
Wiedecke, B., Narzillo, M., Payazov, M., Abdurashid, S.: Acoustic signal analysis and identification. Int. J. Innov. Technol. Explor. Eng. 8(10), 2440–2442 (2019). https://doi.org/10.35940/ijitee.J9522.0881019
Narzillo, M., Abdurashid, S., Parakhat, N., Nilufar, N.: Karakalpak speech recognition with CMU sphinx. Int. J. Innov. Technol. Explor. Eng. 8(10), 2446–2448 (2019). https://doi.org/10.35940/ijitee.J9524.0881019

Author information

Authors and Affiliations

Tashkent Institute of Irrigation and Agricultural Mechanization Engineers National Research University, Tashkent, Uzbekistan
N. S. Mamatov, N. A. Niyozmatova, Yu. Sh. Yuldoshev & Sh. Sh. Abdullaev
Tashkent University of Information Technology Named After Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
A. N. Samijonov

Corresponding author

Correspondence to N. S. Mamatov.

Editor information

Editors and Affiliations

Tashkent University of Information Technologies, Tashkent, Uzbekistan
Hakimjon Zaynidinov
Oregon Institute of Technology, Klamath Falls, USA
Madhusudan Singh
Indian Institute of Information Technology, Allahabad, India
Uma Shanker Tiwary
Hankuk University of Foreign Studies, Yongin, Korea (Republic of)
Dhananjay Singh

About this paper

Cite this paper

Mamatov, N.S., Niyozmatova, N.A., Yuldoshev, Y.S., Abdullaev, S.S., Samijonov, A.N. (2023). Automatic Speech Recognition on the Neutral Network Based on Attention Mechanism. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_11

DOI: https://doi.org/10.1007/978-3-031-27199-1_11
Published: 11 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer Science, Computer Science (R0)
