Unified Approach to Development of ASR Systems for East Slavic Languages

  • Radek Safarik
  • Jan Nouza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10583)


This paper deals with the development of language-specific modules (lexicons, phonetic inventories, language models and acoustic models) for Russian, Ukrainian and Belarusian (used by 260M, 45M and 3M native speakers, respectively). Instead of working on each language separately, we adopt a common approach that allows us to share data and tools while still taking language-specific features into account. We use only freely available text and audio data found on the web pages of major newspaper and broadcast publishers. This must be done with great care, as the three languages are often mixed in spoken and written media. One component of the automated training process is therefore a language identification module. The complete process outputs three pronunciation lexicons (each of about 300K words), three partly shared phoneme sets, and the corresponding acoustic (DNN) and language (N-gram) models. We employ them in our media monitoring system and report results achieved on a test set made of several complete TV news broadcasts in all three languages. The WER values range from 24% to 36%.
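The abstract does not say how the language identification module works. As a rough illustration of why the three languages can be told apart in text, the sketch below uses a simple alphabet-based heuristic: each standard alphabet lacks some letters that the others have (for example, Russian has no "і", Ukrainian has no "ы", Belarusian has no "и"). The function name `guess_language` and the tie-handling rule are assumptions made for this example only; this is not the authors' method.

```python
# Illustrative sketch only: an alphabet-based guess, NOT the method used in the paper.
# Letters shared by the standard Russian, Ukrainian and Belarusian alphabets.
COMMON = set("абвгдежзйклмнопрстуфхцчшьюя")

# Each alphabet = shared letters + the extra letters that language uses.
ALPHABETS = {
    "ru": COMMON | set("ёищъыэ"),
    "uk": COMMON | set("ґєиіїщ"),
    "be": COMMON | set("ёіўыэ"),
}


def guess_language(text):
    """Return 'ru', 'uk' or 'be', or None when the text is ambiguous.

    Hypothetical helper: counts, for each language, how many Cyrillic
    letters in the text fall outside that language's alphabet, and picks
    the language with the fewest such letters.
    """
    # Keep only characters from the Unicode Cyrillic block (U+0400..U+04FF).
    letters = [ch for ch in text.lower() if "\u0400" <= ch <= "\u04ff"]
    if not letters:
        return None
    misses = {
        lang: sum(ch not in alphabet for ch in letters)
        for lang, alphabet in ALPHABETS.items()
    }
    best = min(misses.values())
    winners = [lang for lang, count in misses.items() if count == best]
    # A tie means the text contains no distinguishing letters.
    return winners[0] if len(winners) == 1 else None
```

A heuristic like this only works on reasonably clean, single-language sentences; mixed-language text of the kind the abstract warns about would need a statistical classifier applied to shorter segments.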


Keywords: Speech recognition, Multi-lingual, Cross-lingual, East Slavic languages, Language identification



The research was supported by the Technology Agency of the Czech Republic (project TA04010199) and by the Student Grant Scheme (SGS) at the Technical University of Liberec.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. SpeechLab, Technical University of Liberec, Liberec, Czech Republic
