Conversational Telephone Speech Recognition for Lithuanian

Lileikyte, Rasa; Lamel, Lori; Gauvain, Jean-Luc

doi:10.1007/978-3-319-25789-1_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

This paper presents a conversational telephone speech recognition system for the low-resourced Lithuanian language, developed in the context of the IARPA Babel program. Phoneme-based and grapheme-based systems are compared to establish whether or not a phonemic lexicon is necessary. We explore the impact of using Web data for language modeling and of using additional untranscribed data for semi-supervised training. Experimental results are reported for two conditions, the Full Language Pack (FLP) and the Very Limited Language Pack (VLLP), for which 40 hours and 3 hours of transcribed training data are available, respectively. Grapheme-based systems are shown to give results comparable to those of phoneme-based ones. Adding Web texts improves the performance of both the FLP and VLLP systems. The best VLLP results are achieved using both Web texts and semi-supervised training.
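The idea behind a grapheme-based system, which avoids the need for a phonemic lexicon, can be illustrated with a minimal sketch. It assumes that each lowercase Unicode letter is one acoustic unit. The function names and the example vocabulary are hypothetical, and this is not the authors' lexicon-building procedure. The point is that a word's "pronunciation" is simply its letter sequence, so no grapheme-to-phoneme rules are required.

```python
import unicodedata


def normalize_word(word: str) -> str:
    """Lowercase a word and compose diacritics (NFC), e.g. 'e' + U+0307 -> 'ė'."""
    return unicodedata.normalize("NFC", word.lower())


def grapheme_pronunciation(word: str) -> list[str]:
    """Return the word's 'pronunciation' as its sequence of letters.

    In a grapheme-based system each letter is treated as an acoustic unit,
    so no grapheme-to-phoneme rules or phonemic lexicon are needed.
    """
    # Keep only alphabetic characters; apostrophes, hyphens and digits are dropped.
    return [ch for ch in normalize_word(word) if ch.isalpha()]


def build_grapheme_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Map each distinct (normalized) word to its grapheme sequence."""
    lexicon: dict[str, list[str]] = {}
    for word in words:
        key = normalize_word(word)
        lexicon.setdefault(key, grapheme_pronunciation(word))
    return lexicon


if __name__ == "__main__":
    # Hypothetical example vocabulary, not taken from the Babel data.
    vocabulary = ["labas", "Žodis", "Lietuvą"]
    for word, units in build_grapheme_lexicon(vocabulary).items():
        print(word, " ".join(units))
```

NFC normalization is used so that Lithuanian letters such as "ė", "š" or "ą" become single units regardless of whether the text encodes them as precomposed characters or as a base letter plus a combining diacritic. Otherwise, the same letter could be split into two different units depending on how the source text was encoded.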

Notes

1. To comply with the evaluation rules, BUT provided different features for the FLP and VLLP conditions; for the VLLP condition, the features were trained on multilingual data.

Acknowledgments

We would like to thank our IARPA Babel partners for sharing resources (BUT for the bottleneck features and BBN for the Web data), and Grégory Gelly for providing the voice activity detectors (VADs).

This research was in part supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense US Army Research Laboratory contract number W911NF-12-C-0013. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government.

Author information

Authors and Affiliations

CNRS/LIMSI, Spoken Language Processing Group, 91405, Orsay Cedex, France
Rasa Lileikyte, Lori Lamel & Jean-Luc Gauvain

Corresponding author

Correspondence to Rasa Lileikyte.

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
Klára Vicsi

About this paper

Cite this paper

Lileikyte, R., Lamel, L., Gauvain, JL. (2015). Conversational Telephone Speech Recognition for Lithuanian. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_16

DOI: https://doi.org/10.1007/978-3-319-25789-1_16
Published: 17 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science (R0)