Unified Approach to Development of ASR Systems for East Slavic Languages

Safarik, Radek; Nouza, Jan

doi:10.1007/978-3-319-68456-7_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

This paper deals with the development of language-specific modules (lexicons, phonetic inventories, LMs and AMs) for Russian, Ukrainian and Belarusian (used by 260M, 45M and 3M native speakers, respectively). Instead of working on each language separately, we adopt a common approach that allows us to share data and tools while still taking language-specific features into account. We use only freely available text and audio data found on the web pages of major newspaper and broadcast publishers. This must be done with great care, as the three languages are often mixed in spoken and written media. One component of the automated training process is therefore a language identification module. The complete process outputs three pronunciation lexicons (each of about 300K words), three partly shared phoneme sets, and the corresponding acoustic (DNN) and language (N-gram) models. We employ them in our media monitoring system and report results achieved on a test set made up of several complete TV news broadcasts in all three languages. The WER values range from 24% to 36%.
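
The abstract does not say how the language identification module works. Purely as an illustration, and not as the authors' method, a text-side filter could exploit the fact that the standard Russian, Ukrainian and Belarusian alphabets differ in a handful of letters. The minimal sketch below counts, for each language, the letters in a text that do not belong to that language's alphabet and picks the language with the fewest such letters. The names `ABSENT_LETTERS` and `guess_language` are hypothetical, and the heuristic is an assumption made for this example.

```python
from typing import Optional

# Cyrillic letters that do NOT belong to each language's standard alphabet
# but do occur in at least one of the other two (illustrative assumption:
# standard modern orthographies only).
ABSENT_LETTERS = {
    "ru": set("іїєґў"),   # Russian has no і, ї, є, ґ, ў
    "uk": set("ыэёъў"),   # Ukrainian has no ы, э, ё, ъ, ў
    "be": set("ищъїєґ"),  # Belarusian has no и, щ, ъ, ї, є, ґ
}


def guess_language(text: str) -> Optional[str]:
    """Return 'ru', 'uk' or 'be', or None when the text is ambiguous.

    For each language, count the characters that cannot occur in it.
    The language with the fewest such characters wins; a tie for first
    place is treated as "cannot decide".
    """
    lowered = text.lower()
    violations = {
        lang: sum(ch in letters for ch in lowered)
        for lang, letters in ABSENT_LETTERS.items()
    }
    ranked = sorted(violations.items(), key=lambda item: item[1])
    if ranked[0][1] == ranked[1][1]:
        return None  # two languages fit equally well
    return ranked[0][0]


if __name__ == "__main__":
    for sample in (
        "Сегодня хорошие новости из Москвы",
        "Привіт, як справи?",
        "Добры дзень, як справы ў вас?",
    ):
        print(sample, "->", guess_language(sample))
```

A letter-based filter of this kind only applies to written text and fails on short or mixed passages, which is why ties return `None` instead of a guess. Identifying the language of audio data would need a different, acoustic approach.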

Acknowledgments

The research was supported by the Technology Agency of the Czech Republic (project TA04010199) and by the Student Grant Scheme (SGS) at the Technical University of Liberec.

Author information

Authors and Affiliations

SpeechLab, Technical University of Liberec, Studentska 2, 461 17, Liberec, Czech Republic
Radek Safarik & Jan Nouza

Corresponding author

Correspondence to Radek Safarik.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Safarik, R., Nouza, J. (2017). Unified Approach to Development of ASR Systems for East Slavic Languages. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_16

DOI: https://doi.org/10.1007/978-3-319-68456-7_16
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)
