Automatic Speech Recognition on the Neutral Network Based on Attention Mechanism

  • Conference paper
Intelligent Human Computer Interaction (IHCI 2022)

Abstract

This article examines the problems that arise in machine-learning-based speech recognition and the deep learning methods used to overcome them, and outlines approaches to the transition to an encoder-decoder architecture based on the attention mechanism. It also describes the hybrid CTC/Attention architecture, which is now widely used in speech recognition. Neural network architectures proposed in recent years and widely used in automatic speech recognition, including a model based on the attention mechanism, are trained on Uzbek and Russian speech corpora, and the results obtained are comparatively analyzed.
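As a rough illustration of the hybrid CTC/Attention idea mentioned in the abstract: in the commonly described formulation, training minimizes a weighted sum of a CTC loss and an attention-decoder loss computed on the same encoder output. The sketch below shows only that interpolation step. The function name `hybrid_loss`, the parameter `ctc_weight`, and the numeric loss values are assumptions made for illustration and are not taken from the paper.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate two negative log-likelihoods into one training objective.

    ctc_weight is a hypothetical tuning parameter in [0, 1]:
    0 gives a pure attention objective, 1 gives a pure CTC objective.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Made-up per-utterance losses, used only to exercise the function.
print(hybrid_loss(ctc_loss=42.0, attention_loss=30.0, ctc_weight=0.5))
```

In a real system the two loss terms would come from a CTC output layer and an attention decoder sharing one encoder; the weight is typically tuned on held-out data.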

Corresponding author

Correspondence to N. S. Mamatov.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mamatov, N.S., Niyozmatova, N.A., Yuldoshev, Y.S., Abdullaev, S.S., Samijonov, A.N. (2023). Automatic Speech Recognition on the Neutral Network Based on Attention Mechanism. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_11

  • DOI: https://doi.org/10.1007/978-3-031-27199-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27198-4

  • Online ISBN: 978-3-031-27199-1

  • eBook Packages: Computer Science, Computer Science (R0)
