Conversational Telephone Speech Recognition for Lithuanian

  • Conference paper
Statistical Language and Speech Processing (SLSP 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Abstract

This paper presents a conversational telephone speech recognition system for the low-resourced Lithuanian language, developed in the context of the IARPA-Babel program. Phoneme-based and grapheme-based systems are compared to establish whether a phonemic lexicon is necessary. We explore the impact of using Web data for language modeling and of using additional untranscribed data for semi-supervised training. Experimental results are reported for two conditions, Full Language Pack (FLP) and Very Limited Language Pack (VLLP), for which 40 h and 3 h of transcribed training data are available, respectively. Grapheme-based systems are shown to give results comparable to phoneme-based ones. Adding Web texts improves the performance of both the FLP and VLLP systems. The best VLLP results are achieved using both Web texts and semi-supervised training.
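To illustrate the general idea behind a grapheme-based lexicon, the minimal Python sketch below derives each word's "pronunciation" directly from its spelling, so no hand-built phonemic dictionary is required. It is an illustration only and not the system described in the paper: the function name `grapheme_lexicon`, the lowercasing step, and the choice to drop non-letter characters are all assumptions made for this example.

```python
import unicodedata


def grapheme_lexicon(words):
    """Build a toy grapheme lexicon: each word maps to its sequence of letters.

    Hypothetical helper for illustration; real systems may add further
    normalization or treat some letter combinations as single units.
    """
    lexicon = {}
    for word in words:
        # Compose characters so that letters such as "č" or "ū" are single units.
        normalized = unicodedata.normalize("NFC", word).lower()
        # Keep letters only; digits and punctuation are dropped (an assumption).
        units = [ch for ch in normalized if ch.isalpha()]
        if units:
            lexicon[word] = units
    return lexicon


lex = grapheme_lexicon(["labas", "Ačiū"])
for word, units in lex.items():
    # Print one lexicon entry per line: the word, then its grapheme units.
    print(word, " ".join(units))
```

In such a lexicon the acoustic units are the letters themselves, which is why comparing it against a phonemic lexicon indicates whether the extra cost of writing pronunciations pays off.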


Notes

  1. In order to abide by the evaluation rules, different features were provided by BUT for the FLP and VLLP conditions, in the latter case being trained on multilingual data.

Acknowledgments

We would like to thank our IARPA-Babel partners for sharing resources (BUT for the bottle-neck features and BBN for the web data), and Grégory Gelly for providing the VADs.

This research was in part supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense US Army Research Laboratory contract number W911NF-12-C-0013. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government.

Author information

Correspondence to Rasa Lileikyte.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lileikyte, R., Lamel, L., Gauvain, JL. (2015). Conversational Telephone Speech Recognition for Lithuanian. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_16

  • DOI: https://doi.org/10.1007/978-3-319-25789-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)
