Abstract
This paper deals with the development of language-specific modules (lexicons, phonetic inventories, language models (LMs) and acoustic models (AMs)) for Russian, Ukrainian and Belarusian (used by 260M, 45M and 3M native speakers, respectively). Instead of working on each language separately, we adopt a common approach that allows us to share data and tools while still taking each language's unique features into account. We use only freely available text and audio data found on the web pages of major newspaper and broadcast publishers. This must be done with great care, as the three languages are often mixed in spoken and written media; one component of the automated training process is therefore a language identification module. The complete process outputs three pronunciation lexicons (each of about 300K words), three partly shared phoneme sets, and the corresponding acoustic (DNN) and language (N-gram) models. We employ them in our media monitoring system and report results on a test set made up of several complete TV news broadcasts in all three languages. The word error rate (WER) values range from 24% to 36%.
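The abstract reports WER without describing how it was computed. As a minimal sketch, assuming the standard definition (word-level edit distance between reference and recognized transcripts, divided by the number of reference words) and plain whitespace tokenization, the metric could be computed as follows. The function name `word_error_rate` and the sample strings are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly;
    the paper's own text normalization is not specified in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, WER under this definition can exceed 100% when the hypothesis is much longer than the reference.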
Acknowledgments
The research was supported by the Technology Agency of the Czech Republic (project TA04010199) and by the Student Grant Scheme (SGS) at the Technical University of Liberec.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Safarik, R., Nouza, J. (2017). Unified Approach to Development of ASR Systems for East Slavic Languages. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_16
DOI: https://doi.org/10.1007/978-3-319-68456-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)