Abstract
This paper presents a conversational telephone speech recognition system for the low-resourced Lithuanian language, developed in the context of the IARPA Babel program. Phoneme-based and grapheme-based systems are compared to establish whether or not a phonemic lexicon is necessary. We explore the impact of using Web data for language modeling and of using additional untranscribed data for semi-supervised training. Experimental results are reported for two conditions, the Full Language Pack (FLP) and the Very Limited Language Pack (VLLP), for which 40 h and 3 h of transcribed training data are available, respectively. Grapheme-based systems are shown to give results comparable to those of phoneme-based systems. Adding Web texts improves the performance of both the FLP and the VLLP systems. The best VLLP results are achieved using both Web texts and semi-supervised training.
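In a grapheme-based system, the pronunciation lexicon can be generated directly from spelling, with each letter serving as an acoustic modeling unit, so no phonemic lexicon or letter-to-sound rules are needed. The Python sketch below illustrates this idea only and is not the authors' implementation. The function names are hypothetical, and the sketch assumes that every lowercase Lithuanian letter, including diacritic letters such as "č", "ė" and "ū", is its own unit, without merging digraphs such as "ch" or "dž".

```python
import unicodedata


def grapheme_pronunciation(word: str) -> list[str]:
    """Return a grapheme-based 'pronunciation': one unit per letter.

    Hypothetical helper. Assumes each precomposed letter (e.g. 'ą', 'č', 'ė',
    'š', 'ū', 'ž') is a single unit; digraphs such as 'ch' or 'dž' are NOT merged.
    """
    # NFC normalisation composes a base letter plus combining diacritic
    # (e.g. 'c' + U+030C) into one code point ('č') where a precomposed form exists.
    normalized = unicodedata.normalize("NFC", word.lower())
    # Keep only alphabetic characters; apostrophes, hyphens, digits are dropped.
    return [ch for ch in normalized if ch.isalpha()]


def build_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Map each word to its grapheme sequence (hypothetical helper)."""
    return {w: grapheme_pronunciation(w) for w in words}


if __name__ == "__main__":
    lexicon = build_lexicon(["labas", "ačiū", "žmogus"])
    for word, units in lexicon.items():
        print(word, " ".join(units))
```

Because the lexicon is derived mechanically from the word list, any new word, for example one that appears only in Web texts, receives an entry without manual phonetic transcription.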
Notes
1. In order to abide by the evaluation rules, BUT provided different features for the FLP and VLLP conditions; the features for the VLLP condition were trained on multilingual data.
Acknowledgments
We would like to thank our IARPA-Babel partners for sharing resources (BUT for the bottle-neck features and BBN for the web data), and Grégory Gelly for providing the VADs.
This research was in part supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense US Army Research Laboratory contract number W911NF-12-C-0013. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lileikyte, R., Lamel, L., Gauvain, JL. (2015). Conversational Telephone Speech Recognition for Lithuanian. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_16
DOI: https://doi.org/10.1007/978-3-319-25789-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)