Automatic Transcription and Subtitling of Slovak Multi-genre Audiovisual Recordings

Staš, Ján; Viszlay, Peter; Lojka, Martin; Koctúr, Tomáš; Hládek, Daniel; Juhár, Jozef

doi:10.1007/978-3-319-93782-3_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Included in the following conference series:

Language and Technology Conference

Abstract

This paper summarizes recent progress in the development of an automatic transcription system for subtitling Slovak multi-genre audiovisual recordings, such as lectures, talks, discussions, broadcast news or TV/radio shows. The approach applies current and innovative methods in speech and language processing, automatic speech segmentation, speech recognition, statistical modeling, and adaptation of acoustic and language models to the specific topic, gender and speaking style of the speaker. We have developed a working prototype of an automatic transcription system for the Slovak language, designed mainly for subtitling various types of single- or multi-channel audiovisual recordings. Preliminary results show a significant relative decrease in word error rate, ranging from 2.40% to 47.10% for an individual speaker, in fully automatic transcription and subtitling of Slovak parliament speech, broadcast news or TEDx talks.
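
To make the relative figures in the abstract concrete, the sketch below shows one common way to compute word error rate (WER) and a relative WER reduction. It assumes the standard definition of WER as the word-level edit distance divided by the number of reference words, and a relative reduction defined as the WER difference divided by the baseline WER. The function names and the numbers in the usage lines are illustrative assumptions and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper.
    print(word_error_rate("a b c d", "a x c"))
    print(relative_wer_reduction(20.0, 15.0))
```

Under these definitions, a drop from a baseline WER of 20% to an adapted WER of 15% is an absolute reduction of 5 percentage points but a relative reduction of 25%, which is the kind of figure the abstract reports.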

Acknowledgement

The research in this paper was supported by the Faculty of Electrical Engineering and Informatics at the Technical University of Košice under the research project FEI-2015-30, by the Ministry of Education, Science, Research and Sport of the Slovak Republic under the research project VEGA 1/0511/17, and by the Slovak Research and Development Agency under the research project APVV-15-0517.

Author information

Authors and Affiliations

Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Park Komenského 13, 042 00, Košice, Slovak Republic
Ján Staš, Peter Viszlay, Martin Lojka, Tomáš Koctúr, Daniel Hládek & Jozef Juhár

Corresponding author

Correspondence to Ján Staš.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay Cedex, France
Joseph Mariani
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Staš, J., Viszlay, P., Lojka, M., Koctúr, T., Hládek, D., Juhár, J. (2018). Automatic Transcription and Subtitling of Slovak Multi-genre Audiovisual Recordings. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_4

DOI: https://doi.org/10.1007/978-3-319-93782-3_4
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science, Computer Science (R0)