Unified Approach to Development of ASR Systems for East Slavic Languages

  • Conference paper
Statistical Language and Speech Processing (SLSP 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Abstract

This paper deals with the development of language-specific modules (lexicons, phonetic inventories, LMs and AMs) for Russian, Ukrainian and Belarusian (used by 260M, 45M and 3M native speakers, respectively). Instead of working on each language separately, we adopt a common approach that allows us to share data and tools while still taking each language's unique features into account. We use only freely available text and audio data that can be found on the web pages of major newspaper and broadcast publishers. This must be done with great care, as the three languages are often mixed in spoken and written media. Therefore, one component of the automated training process is a language identification module. The output of the complete process comprises three pronunciation lexicons (each about 300K words), three partly shared phoneme sets, and the corresponding acoustic (DNN) and language (N-gram) models. We employ them in our media monitoring system and report results achieved on a test set made of several complete TV news broadcasts in all three languages. The WER values range from 24% to 36%.
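
The abstract does not say how the language identification module works. As a rough illustration of why written Russian, Ukrainian and Belarusian can be told apart at all, the following minimal sketch counts letters that fall outside each language's standard alphabet and picks the language with the fewest such letters. The function names and the scoring rule are assumptions made for this example, not the authors' method, and a real system would need far more than an alphabet check (for instance, for short or mixed-language segments).

```python
# Illustrative heuristic only; NOT the language identification module from the paper.
# It relies on the fact that the three standard alphabets differ in a few letters
# (e.g. "ў" is Belarusian only, "ї", "є", "ґ" are Ukrainian only, "ъ" is Russian only).

ALPHABETS = {
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
    "uk": set("абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"),
    "be": set("абвгдеёжзійклмнопрстуўфхцчшыьэюя"),
}

# Every Cyrillic letter used by at least one of the three languages.
ALL_LETTERS = set().union(*ALPHABETS.values())


def count_violations(text: str, lang: str) -> int:
    """Count letters in `text` that do not belong to the alphabet of `lang`."""
    alphabet = ALPHABETS[lang]
    return sum(
        1 for ch in text.lower()
        if ch in ALL_LETTERS and ch not in alphabet
    )


def guess_language(text: str) -> str:
    """Return the language code with the fewest out-of-alphabet letters.

    Ties are resolved in favour of the first entry in ALPHABETS ("ru"),
    which is an arbitrary choice made for this sketch.
    """
    scores = {lang: count_violations(text, lang) for lang in ALPHABETS}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for sample in ("Это был хороший день", "Привіт, як справи?", "Ён быў у горадзе"):
        print(guess_language(sample), "<-", sample)
```

Text that contains none of the distinguishing letters scores zero for all three languages and falls back to the tie rule, so the sketch is only meaningful for reasonably long passages.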


Acknowledgments

The research was supported by the Technology Agency of the Czech Republic (project TA04010199) and by the Student Grant Scheme (SGS) at the Technical University of Liberec.

Author information

Corresponding author

Correspondence to Radek Safarik.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Safarik, R., Nouza, J. (2017). Unified Approach to Development of ASR Systems for East Slavic Languages. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_16

  • DOI: https://doi.org/10.1007/978-3-319-68456-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68455-0

  • Online ISBN: 978-3-319-68456-7

  • eBook Packages: Computer Science, Computer Science (R0)
