Automatic Transcription and Subtitling of Slovak Multi-genre Audiovisual Recordings

  • Ján Staš
  • Peter Viszlay
  • Martin Lojka
  • Tomáš Koctúr
  • Daniel Hládek
  • Jozef Juhár
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)


This paper summarizes recent progress in the development of an automatic transcription system for subtitling Slovak multi-genre audiovisual recordings, such as lectures, talks, discussions, broadcast news, and TV or radio shows. The main concept applies current and innovative principles and methods from speech and language processing: automatic speech segmentation, speech recognition, statistical modeling, and adaptation of acoustic and language models to the specific topic, gender, and speaking style of the speaker. We have developed a working prototype of the automatic transcription system for the Slovak language, designed mainly for subtitling various types of single- or multi-channel audiovisual recordings. Preliminary results show a significant relative decrease in word error rate, ranging from 2.40% to 47.10% for an individual speaker, in fully automatic transcription and subtitling of Slovak parliament speech, broadcast news, and TEDx talks.
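To make the reported metric concrete, the following is a minimal sketch of how word error rate (WER) and a relative WER reduction are conventionally computed. It assumes the standard definition (word-level edit distance divided by the number of reference words); the abstract does not describe the scoring procedure actually used, and the function names and sample values here are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which the adapted system lowers the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

Under this definition, a hypothetical baseline WER of 20% that drops to 15% after adaptation is a 25% relative reduction, even though the absolute difference is only 5 percentage points.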


Automatic subtitling, Broadcast news, Lecture speech, Parliament speech, Speech recognition, Speech segmentation, User modeling



The research in this paper was supported by the Faculty of Electrical Engineering and Informatics at the Technical University of Košice under the research project FEI-2015-30, by the Ministry of Education, Science, Research and Sport of the Slovak Republic under the research project VEGA 1/0511/17 and the Slovak Research and Development Agency under the research project APVV-15-0517.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Ján Staš (1)
  • Peter Viszlay (1)
  • Martin Lojka (1)
  • Tomáš Koctúr (1)
  • Daniel Hládek (1)
  • Jozef Juhár (1)
  1. Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Košice, Slovak Republic
