Unified Simplified Grapheme Acoustic Modeling for Medieval Latin LVCSR

  • Lili Szabó
  • Péter Mihajlik
  • András Balog
  • Tibor Fegyó
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


A large vocabulary continuous speech recognition (LVCSR) system designed for the dictation of medieval Latin documents is introduced. Such a language technology tool can be of great help in preserving Latin charters from this era, as optical character recognition systems are often challenged by these historical materials. As the corresponding historical research focuses on the Visegrad region, our primary aim is to make medieval Latin dictation available for texts and speakers of this region, concentrating on Czech, Hungarian and Polish. The baseline acoustic models we start with are monolingual grapheme-based ones. On the one hand, applying knowledge-based medieval Latin grapheme-to-phoneme (G2P) mapping from the source language to the target language resulted in a significant improvement, reducing the Word Error Rate (WER) by 13.3%. On the other hand, applying a Unified Simplified Grapheme (USG) inventory to the three-language acoustic data set, complemented with Romanian speech data, resulted in a further 0.7% WER reduction, without using any target or source language G2P rules.
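The knowledge-based G2P mapping described above can be pictured as a small set of longest-match-first rewrite rules applied to Latin orthography. The sketch below is purely illustrative: the rules shown (e.g. "c" rendered as /ts/ in the Central European pronunciation tradition) are plausible assumptions, not the paper's actual rule set.

```python
# Minimal sketch of a knowledge-based grapheme-to-phoneme (G2P) mapper for
# medieval Latin. The rules are illustrative assumptions, not the authors'
# actual inventory; a real system would also handle context-dependent rules.

# Longest-match-first: multi-character graphemes are listed before single ones.
G2P_RULES = [
    ("qu", "k v"),   # assumed: "qu" -> /kv/ (Central European tradition)
    ("ae", "e"),     # medieval "ae" merged with /e/
    ("oe", "e"),
    ("c",  "ts"),    # simplification: "c" -> /ts/ regardless of context
]

def g2p(word):
    """Transcribe a lowercase Latin word using longest-match-first rules."""
    phones = []
    i = 0
    while i < len(word):
        for grapheme, phone in G2P_RULES:
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:
            # No rule matched: fall back to the grapheme itself, which is
            # essentially what a pure grapheme-based acoustic model does.
            phones.append(word[i])
            i += 1
    return " ".join(phones)
```

The fallback branch mirrors the paper's baseline: with an empty rule list, the mapper degenerates to plain grapheme modeling, so the rules can be seen as selective corrections on top of it.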


Keywords: G2P · Medieval Latin · Under-resourced speech recognition · Unified simplified grapheme modeling



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Lili Szabó (1)
  • Péter Mihajlik (2, 3)
  • András Balog (2)
  • Tibor Fegyó (1, 3)

  1. SpeechTex Ltd., Budapest, Hungary
  2. THINKTech Research Center, Budapest, Hungary
  3. Budapest University of Technology and Economics, Budapest, Hungary
