Recent Results in Speech Recognition for the Tatar Language

Khusainov, Aidar

doi:10.1007/978-3-319-64206-2_21

Conference paper
First Online: 29 July 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

This paper presents a comparative study of several speech recognition systems for the Tatar language, including systems for very large and unlimited vocabularies. All of the compared systems use a corpus-based approach, so recent results in the creation of speech and text corpora are also presented. The recognition systems differ in their acoustic modelling algorithms, basic acoustic units, and language modelling techniques. The DNN-based system with the sub-word language model achieves the best recognition result on the test part of the speech corpus.

Author information

Authors and Affiliations

Institute of Applied Semiotics of the Tatarstan Academy of Sciences, Kazan Federal University, Kazan, Russia
Aidar Khusainov

Corresponding author

Correspondence to Aidar Khusainov.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Khusainov, A. (2017). Recent Results in Speech Recognition for the Tatar Language. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_21

DOI: https://doi.org/10.1007/978-3-319-64206-2_21
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science, Computer Science (R0)
