Unified Simplified Grapheme Acoustic Modeling for Medieval Latin LVCSR

  • Lili Szabó
  • Péter Mihajlik
  • András Balog
  • Tibor Fegyó
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)


A large vocabulary continuous speech recognition (LVCSR) system designed for the dictation of medieval Latin documents is introduced. Such a language technology tool can be of great help in preserving Latin charters from this era, as optical character recognition systems are often challenged by these historical materials. As the corresponding historical research focuses on the Visegrad region, our primary aim is to make medieval Latin dictation available for texts and speakers of this region, concentrating on Czech, Hungarian and Polish. The baseline acoustic models we start with are monolingual grapheme-based ones. On the one hand, applying knowledge-based medieval Latin grapheme-to-phoneme (G2P) mapping from the source language to the target language resulted in a significant improvement, reducing the Word Error Rate (WER) by 13.3%. On the other hand, applying a Unified Simplified Grapheme (USG) inventory to the three-language acoustic data set, complemented with Romanian speech data, resulted in a further 0.7% WER reduction, without using any target or source language G2P rules.
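The knowledge-based G2P mapping described above can be pictured as a small set of longest-match-first rewrite rules applied to Latin orthography. The sketch below is purely illustrative: the rules shown (e.g. "c" rendered as /ts/ in the Central European pronunciation tradition) are plausible assumptions, not the paper's actual rule set.

```python
# Minimal sketch of a knowledge-based grapheme-to-phoneme (G2P) mapper for
# medieval Latin. The rules are illustrative assumptions, not the authors'
# actual inventory; a real system would also handle context-dependent rules.

# Longest-match-first: multi-character graphemes are listed before single ones.
G2P_RULES = [
    ("qu", "k v"),   # assumed: "qu" -> /kv/ (Central European tradition)
    ("ae", "e"),     # medieval "ae" merged with /e/
    ("oe", "e"),
    ("c",  "ts"),    # simplification: "c" -> /ts/ regardless of context
]

def g2p(word):
    """Transcribe a lowercase Latin word using longest-match-first rules."""
    phones = []
    i = 0
    while i < len(word):
        for grapheme, phone in G2P_RULES:
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:
            # No rule matched: fall back to the grapheme itself, which is
            # essentially what a pure grapheme-based acoustic model does.
            phones.append(word[i])
            i += 1
    return " ".join(phones)
```

The fallback branch mirrors the paper's baseline: with an empty rule list, the mapper degenerates to plain grapheme modeling, so the rules can be seen as selective corrections on top of it.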


Keywords: G2P · Medieval Latin · Under-resourced speech recognition · Unified simplified grapheme modeling



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Lili Szabó (1)
  • Péter Mihajlik (2, 3)
  • András Balog (2)
  • Tibor Fegyó (1, 3)

  1. SpeechTex Ltd., Budapest, Hungary
  2. THINKTech Research Center, Budapest, Hungary
  3. Budapest University of Technology and Economics, Budapest, Hungary
